WorldWideScience

Sample records for whole-genome gene sets

  1. Expansion and diversification of the SET domain gene family following whole-genome duplications in Populus trichocarpa

    Science.gov (United States)

    2012-01-01

    Background: Histone lysine methylation modifies chromatin structure and regulates eukaryotic gene transcription as well as a variety of developmental and physiological processes. SET domain proteins are lysine methyltransferases containing the evolutionarily conserved SET domain, which is known to be the catalytic domain. Results: We identified 59 SET genes in the Populus genome. Phylogenetic analyses of 106 SET genes from Populus and Arabidopsis supported the clustering of SET genes into six distinct subfamilies and identified 19 duplicated gene pairs in Populus. The chromosome locations of these gene pairs and the distribution of synonymous substitution rates showed that the expansion of the SET gene family might be caused by large-scale duplications in Populus. Comparison of gene structures and domain architectures of each duplicate pair indicated that divergence took place at the 3'- and 5'-terminal transcribed regions and at the N- and C-termini of the predicted proteins, respectively. Expression profile analysis suggested that most Populus SET genes were widely expressed, many most highly in young leaves. In particular, the expression profiles of 12 of the 19 duplicated gene pairs fell into two expression patterns. Conclusions: The 19 duplicated SET genes could have originated from whole genome duplication events. The differences in SET gene structure, domain architecture, and expression profiles in various tissues of Populus suggest that members of the SET gene family have a variety of developmental and physiological functions. Our study provides clues about the evolution of epigenetic regulation of chromatin structure and gene expression. PMID:22497662
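
    Synonymous substitution rates (Ks) between paralogs are commonly converted into approximate duplication ages with the molecular-clock relation below. It is shown only as a general illustration: the clock rate λ is an assumed, lineage-specific parameter that this record does not report, and the relation presumes synonymous sites evolve roughly neutrally and at a constant rate.

```latex
T \approx \frac{K_s}{2\lambda}
```

    Here K_s is the number of synonymous substitutions per synonymous site between the two copies, λ is the synonymous substitution rate per site per year, and the factor 2 reflects that both copies accumulate substitutions independently after the duplication.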

  2. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    ... and might play some important roles in drought tolerance in sesame. Our results provided genomic resources for further functional analysis and genetic engineering towards drought tolerance improvement in sesame. Keywords: Sesamum indicum, candidate genes, drought tolerance, orthologous gene, whole genome ...

  3. GPCR genes are preferentially retained after whole genome duplication.

    Directory of Open Access Journals (Sweden)

    Jenia Semyonov

    One of the most interesting questions in biology is whether certain pathways have been favored during evolution, and if so, what properties could cause such a preference. Due to the lack of experimental evidence, it remains unclear whether select gene families have been preferentially retained over time after duplication in metazoan organisms. Here, by syntenic mapping of nonchemosensory G protein-coupled receptor genes (nGPCRs), which represent half the receptome for transmembrane signaling in vertebrate genomes, we found that, as opposed to the 8-15% retention rate for whole genome duplication (WGD)-derived gene duplicates across the entire genome of the pufferfish, more than 27.8% of WGD-derived nGPCRs that interact with a nonpeptide ligand were retained after WGD in the pufferfish Tetraodon nigroviridis. In addition, we show that concurrent duplication of cognate ligand genes by WGD could impose selection on nGPCRs that interact with a polypeptide ligand. Against a probability of less than 2.25% for parallel retention of a pair of WGD-derived ligands and a pair of cognate receptor duplicates, we found more than 8.9% retention of WGD-derived ligand-nGPCR pairs, threefold greater than one would expect. These results demonstrate that gene retention is not uniform after WGD in vertebrates, and suggest Darwinian selection of GPCR-mediated intercellular communication in metazoan organisms.
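
    The 2.25% figure follows from multiplying two retention rates: if a ligand duplicate pair and its receptor duplicate pair are each retained with probability at most 15% (the upper end of the genome-wide range above), and the two events are assumed independent, both survive with probability at most 0.15 × 0.15. The sketch below only reproduces that arithmetic; the independence assumption and the function names are illustrative, not the authors' statistical method.

```python
"""Expected vs. observed parallel retention of WGD-derived ligand-receptor pairs.

Illustrative sketch: assumes ligand and receptor duplicate pairs are retained
independently, each at the genome-wide upper bound of ~15%.
"""


def expected_parallel_retention(ligand_rate: float, receptor_rate: float) -> float:
    """Probability that a ligand pair AND its receptor pair are both retained,
    assuming (hypothetically) that the two retention events are independent."""
    return ligand_rate * receptor_rate


def enrichment(observed: float, expected: float) -> float:
    """Ratio of the observed parallel-retention rate to the expected one."""
    return observed / expected


if __name__ == "__main__":
    expected = expected_parallel_retention(0.15, 0.15)
    observed = 0.089  # the >8.9% observed retention quoted in the abstract
    print(f"expected parallel retention at most {expected:.2%}")  # 2.25%
    print(f"observed / expected = {enrichment(observed, expected):.2f}")
```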

  4. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    ... 1,075 from potato and 270 from tomato, comparative analysis against sesame genome led to the identification of a set of 75 candidate genes (42, 22 and 11 from Arabidopsis, potato and tomato, respectively). Mapping results .... applying drought stress by withholding water for 5 days. At this stage, all plants ...

  5. Whole-Genome Analysis of Gene Conversion Events

    Science.gov (United States)

    Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross; Miller, Webb

    Gene conversion events are often overlooked in analyses of genome evolution. In a conversion event, an interval of DNA sequence (not necessarily containing a gene) overwrites a highly similar sequence. The event creates relationships among genomic intervals that can confound attempts to identify orthologs and to transfer functional annotation between genomes. Here we examine 1,112,202 paralogous pairs of human genomic intervals, and detect conversion events in about 13.5% of them. Properties of the putative gene conversions are analyzed, such as the lengths of the paralogous pairs and the spacing between their sources and targets. Our approach is illustrated using conversion events in the beta-globin gene cluster.

  6. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter

    Directory of Open Access Journals (Sweden)

    Samuel K. Sheppard

    2012-04-01

    Campylobacteriosis remains a major public health problem worldwide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, especially the characterization of Campylobacter genotypes isolated from human infections, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infections in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying the genes or allelic variants responsible for host association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website.
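
    In a gene-by-gene approach, each isolate is summarised as an allelic profile (an allele number per locus), and isolates are compared locus by locus rather than by aligning whole genomes. The sketch below counts differing loci between profiles; the locus names, allele numbers and isolate labels are hypothetical, skipping loci missing from either profile is an assumed design choice, and none of this is the BIGSdb implementation.

```python
"""Gene-by-gene comparison of isolates via allelic profiles (illustrative sketch)."""
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Dict[str, int]  # locus name -> allele number

# Hypothetical isolates typed at four hypothetical loci.
EXAMPLE_ISOLATES: Dict[str, Profile] = {
    "human_1": {"locus_A": 1, "locus_B": 4, "locus_C": 2, "locus_D": 7},
    "chicken_1": {"locus_A": 1, "locus_B": 4, "locus_C": 3, "locus_D": 7},
    "cattle_1": {"locus_A": 2, "locus_B": 5, "locus_C": 3},  # locus_D not called
}


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of loci present in both profiles whose allele numbers differ.

    Loci absent from either profile are ignored rather than counted as differences.
    """
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


def pairwise_distances(isolates: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Allelic distance for every unordered pair of isolates, in name order."""
    return [
        (x, y, allelic_distance(isolates[x], isolates[y]))
        for x, y in combinations(sorted(isolates), 2)
    ]


if __name__ == "__main__":
    for x, y, d in pairwise_distances(EXAMPLE_ISOLATES):
        print(f"{x} vs {y}: {d} differing loci")
```

    Counting loci rather than nucleotide differences means that a single recombination event importing a divergent allele counts once, which is one reason gene-by-gene schemes are used for highly recombining bacteria.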

  7. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    2009-12-01

    We have developed a gene expression assay (Whole-Genome DASL) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens. We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of those in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3- to 1.5-fold and 1.5- to 2-fold changes in intact and FFPE samples, respectively. The dynamic range of the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R² ≈ 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R² ≈ 0.86 and R² ≈ 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R² > 0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R² ≈ 0.92), with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R² ≈ 0.80, while still maintaining a correlation of R² ≈ 0.75 with standard FFPE inputs (200 ng). Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling of archived materials. It is also useful in clinical settings where only limited quantities of sample may be available (e.g., microdissected material) or when minimally invasive procedures are performed (e.g., biopsied specimens).

  8. Whole genome analysis of p38 SAPK-mediated gene expression upon stress

    Directory of Open Access Journals (Sweden)

    Lopez-Bigas Nuria

    2010-03-01

    Background: Cells can respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression, little is known about the molecular mechanisms this SAPK uses to regulate stress-responsive genes, or about the overall set of genes regulated by p38 in response to different stimuli. Results: Here, we report a whole-genome expression analysis of mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK-activating stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We found that the activation kinetics of p38α SAPK differ in response to these insults and lead to a complex, stress-specific gene expression response with a restricted set of overlapping genes. In addition, we analysed the contribution of p38α, the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α-deficient (p38α-/-) MEFs. We show that p38 SAPK dependency ranged between 60% and 88% depending on the treatment, and that there is very good overlap between the inhibitor treatment and the knockout cells. Furthermore, we found that the dependency on SAPK varies with the time the cells are subjected to osmostress. Conclusions: Our genome-wide transcriptional analysis shows a selective response to specific stimuli and a restricted common response of up to 20% of the stress-up-regulated early genes, which involves an important set of transcription factors that might be critical for either cell adaptation or preparation for continuous extracellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell

  9. Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics: e1003740

    National Research Council Canada - National Science Library

    Salim Akhter Chowdhury; Stanley E Shackney; Kerstin Heselmeyer-Haddad; Thomas Ried; Alejandro A Schäffer; Russell Schwartz

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  10. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics

    National Research Council Canada - National Science Library

    Chowdhury, Salim Akhter; Shackney, Stanley E; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schäffer, Alejandro A; Schwartz, Russell

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  11. Expansion by whole genome duplication and evolution of the sox gene family in teleost fish.

    Science.gov (United States)

    Voldoire, Emilien; Brunet, Frédéric; Naville, Magali; Volff, Jean-Nicolas; Galiana, Delphine

    2017-01-01

    It is now recognized that several rounds of whole genome duplication (WGD) have occurred during the evolution of vertebrates, but the link between WGDs and phenotypic diversification remains unresolved. In this study, we investigated the impact of the teleost-specific WGD on the evolution of the sox gene family in teleost fishes. The sox gene family, which encodes transcription factors, has essential roles in the morphology, physiology and behavior of vertebrates, including teleosts, currently the largest group of vertebrates. We first reconstructed the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach that included phylogenetic and synteny analyses. Compared with tetrapods, we observed a substantial expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes except the sox17 paralogs are derived from the teleost-specific WGD. Then, focusing on five sox genes and analyzing the evolution of their coding and non-coding sequences as well as their expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleost species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts.

  12. A Plasmodium Whole-Genome Synteny Map: Indels and Synteny Breakpoints as Foci for Species-Specific Genes.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Whole-genome comparisons are highly informative regarding genome evolution and can reveal the conservation of genome organization and gene content, gene regulatory elements, and the presence of species-specific genes. Initial comparative genome analyses of the human malaria parasite Plasmodium falciparum and rodent malaria parasites (RMPs) revealed a core set of 4,500 Plasmodium orthologs located in the highly syntenic central regions of the chromosomes that sharply defined the boundaries of the variable subtelomeric regions. We used composite RMP contigs, based on partial DNA sequences of three RMPs, to generate a whole-genome synteny map of P. falciparum and the RMPs. The core regions of the 14 chromosomes of P. falciparum and the RMPs are organized in 36 synteny blocks, representing groups of genes that have been stably inherited since these malaria species diverged, but whose relative organization has been altered by a predicted minimum of 15 recombination events. P. falciparum-specific genes and gene families are found in the variable subtelomeric regions (575 genes), at synteny breakpoints (42 genes), and as intrasyntenic indels (126 genes). Of the 168 non-subtelomeric P. falciparum genes, including two newly discovered gene families, 68% are predicted to be exported to the surface of the blood-stage parasite or infected erythrocyte. Chromosomal rearrangements are implicated in the generation and dispersal of P. falciparum-specific gene families, including one encoding receptor-associated protein kinases. The data show that both synteny breakpoints and intrasyntenic indels can be foci for species-specific genes with a predicted role in host-parasite interactions, and suggest that, besides rearrangements in the subtelomeric regions, chromosomal rearrangements may also be involved in the generation of species-specific gene families. A majority of these genes are expressed in blood stages, suggesting that the vertebrate host exerts a greater

  13. A Plasmodium whole-genome synteny map: indels and synteny breakpoints as foci for species-specific genes.

    Directory of Open Access Journals (Sweden)

    Taco W A Kooij

    2005-12-01

    Whole-genome comparisons are highly informative regarding genome evolution and can reveal the conservation of genome organization and gene content, gene regulatory elements, and the presence of species-specific genes. Initial comparative genome analyses of the human malaria parasite Plasmodium falciparum and rodent malaria parasites (RMPs) revealed a core set of 4,500 Plasmodium orthologs located in the highly syntenic central regions of the chromosomes that sharply defined the boundaries of the variable subtelomeric regions. We used composite RMP contigs, based on partial DNA sequences of three RMPs, to generate a whole-genome synteny map of P. falciparum and the RMPs. The core regions of the 14 chromosomes of P. falciparum and the RMPs are organized in 36 synteny blocks, representing groups of genes that have been stably inherited since these malaria species diverged, but whose relative organization has been altered by a predicted minimum of 15 recombination events. P. falciparum-specific genes and gene families are found in the variable subtelomeric regions (575 genes), at synteny breakpoints (42 genes), and as intrasyntenic indels (126 genes). Of the 168 non-subtelomeric P. falciparum genes, including two newly discovered gene families, 68% are predicted to be exported to the surface of the blood-stage parasite or infected erythrocyte. Chromosomal rearrangements are implicated in the generation and dispersal of P. falciparum-specific gene families, including one encoding receptor-associated protein kinases. The data show that both synteny breakpoints and intrasyntenic indels can be foci for species-specific genes with a predicted role in host-parasite interactions, and suggest that, besides rearrangements in the subtelomeric regions, chromosomal rearrangements may also be involved in the generation of species-specific gene families. A majority of these genes are expressed in blood stages, suggesting that the vertebrate host exerts a greater

  14. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs with opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample ranged from 39.29 to 49.36 million. Approximately 188 genes were differentially expressed in adipose tissue; these were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, and the metabolism of fatty acids, retinol, caffeine and arachidonic acid, as well as for immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  15. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with the potential to be misused in bioterrorism. DNA sequence-based molecular typing methods such as canSNP typing or MLVA have become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data from 15 sequenced F. tularensis ssp. holarctica strains, and apply it to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Owing to the high resolution of MLST+, we were able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  16. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes of the same species or of closely related organisms being sequenced. Thus, there is a need for visualization methods that allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool, useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and a Clostridium botulinum genome compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context...

  17. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Science.gov (United States)

    Antwerpen, Markus H; Prior, Karola; Mellmann, Alexander; Höppner, Sebastian; Splettstoesser, Wolf D; Harmsen, Dag

    2015-01-01

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with the potential to be misused in bioterrorism. DNA sequence-based molecular typing methods such as canSNP typing or MLVA have become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data from 15 sequenced F. tularensis ssp. holarctica strains, and apply it to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Owing to the high resolution of MLST+, we were able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  18. Resolving the question of trypanosome monophyly: a comparative genomics approach using whole genome data sets with low taxon sampling.

    Science.gov (United States)

    Leonard, Guy; Soanes, Darren M; Stevens, Jamie R

    2011-07-01

    Since the first attempts to classify the evolutionary history of trypanosomes, there have been conflicting reports regarding their true phylogenetic relationships, in particular their relationships with other vertebrate trypanosomatids, e.g. Leishmania sp., as well as with the many insect-parasitising trypanosomatids. Perhaps the most debated issue is the monophyly (or otherwise) of the genus Trypanosoma; even with the advent of molecular methods, the findings of numerous studies have varied significantly depending on the gene sequences analysed, the number of taxa included, the choice of outgroup and the phylogenetic methodology. While arguably of limited applied importance, resolving whether or not trypanosomes are monophyletic is critical to accurately evaluating competing, mutually exclusive evolutionary scenarios for these parasites, namely the 'vertebrate-first' and 'insect-first' hypotheses. Therefore, a new approach that could overcome previous limitations was needed. At its simplest, the problem can be defined within the framework of a trifurcated tree with three hypothetical positions at which the root can be placed. Using BLASTp and whole-genome gene-by-gene phylogenetic analyses of Trypanosoma brucei, Trypanosoma cruzi, Leishmania major and Naegleria gruberi, we identified 599 gene markers (putative homologues) that were shared between the genomes of these four taxa. Of these, 75 homologous gene families that demonstrate monophyly of the kinetoplastids were identified. We then used these data sets in combination with an additional outgroup, Euglena gracilis, coupled with large-scale gene concatenation and diverse phylogenetic techniques, to investigate the relative branching order of T. brucei, T. cruzi and L. major. Our findings confirm the monophyly of the genus Trypanosoma and demonstrate that <1% of the analysed gene markers shared between the genomes of T. brucei, T. cruzi and L. major reject
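
    The "trifurcated tree" framing can be made concrete. With three ingroup taxa, the unrooted tree is a star with three branches, and placing the root on each branch gives one of three rooted hypotheses. Only rooting on the Leishmania branch makes T. brucei and T. cruzi sister taxa, i.e. Trypanosoma monophyletic. The sketch below merely enumerates these hypotheses in Newick notation; the taxon labels are shorthand and no phylogenetic inference is performed.

```python
"""Enumerate the three rooted hypotheses of a three-taxon (trifurcated) tree."""
from typing import FrozenSet, Iterator, Tuple

TAXA: Tuple[str, str, str] = ("T_brucei", "T_cruzi", "L_major")


def rooted_topologies(
    taxa: Tuple[str, str, str],
) -> Iterator[Tuple[str, FrozenSet[str], str]]:
    """Yield (basal taxon, sister pair, Newick string) for each root position.

    Rooting on the branch leading to `basal` makes the other two taxa sisters.
    """
    for basal in taxa:
        a, b = (t for t in taxa if t != basal)
        yield basal, frozenset((a, b)), f"(({a},{b}),{basal});"


def supports_trypanosoma_monophyly(sister_pair: FrozenSet[str]) -> bool:
    """Trypanosoma is monophyletic only if the two Trypanosoma species are sisters."""
    return sister_pair == frozenset({"T_brucei", "T_cruzi"})


if __name__ == "__main__":
    for basal, pair, newick in rooted_topologies(TAXA):
        print(newick, "monophyletic" if supports_trypanosoma_monophyly(pair) else "paraphyletic")
```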

  19. Whole Genome Selection

    Science.gov (United States)

    Whole genome selection (WGS) is an approach to using DNA markers that are distributed throughout the entire genome. Genes affecting most economically important traits are distributed throughout the genome, and relatively few of them have large effects, with many more genes with progressively sm...
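
    Because many markers are each expected to have small effects, genome-wide marker effects are often estimated jointly with a shrinkage method. The sketch below uses ridge regression (similar in spirit to RR-BLUP) on genotypes coded 0/1/2, solved with plain Gaussian elimination so it has no dependencies. The coding, the shrinkage parameter `lam` and all function names are assumptions for illustration, not the method described in this record.

```python
"""Ridge-regression sketch of genome-wide marker-effect estimation (illustrative)."""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Model = Tuple[float, List[float], List[float]]  # (trait mean, marker means, effects)


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_marker_effects(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> Model:
    """Estimate all marker effects at once: beta = (Xc'Xc + lam*I)^-1 Xc'yc.

    X holds one row per individual and one 0/1/2 genotype column per marker;
    columns and phenotypes are centred so no separate intercept is needed.
    A positive `lam` shrinks every effect toward zero.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    xc = [[X[i][j] - col_means[j] for j in range(p)] for i in range(n)]
    yc = [v - y_mean for v in y]
    xtx = [
        [sum(xc[k][i] * xc[k][j] for k in range(n)) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(xc[k][i] * yc[k] for k in range(n)) for i in range(p)]
    return y_mean, col_means, solve(xtx, xty)


def predict(model: Model, genotypes: Sequence[float]) -> float:
    """Predicted (genomic) value for a new individual's genotype vector."""
    y_mean, col_means, beta = model
    return y_mean + sum(b * (g - m) for b, g, m in zip(beta, genotypes, col_means))


if __name__ == "__main__":
    # Hypothetical data: 6 individuals, 2 markers; the trait equals 2 x marker 0.
    X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
    y = [0, 2, 4, 2, 0, 4]
    model = ridge_marker_effects(X, y, lam=1.0)
    print("marker effects:", model[2])
    print("prediction for [2, 0]:", predict(model, [2, 0]))
```

    Increasing `lam` pulls all effects toward zero, which mirrors the many-small-effects assumption; in practice `lam` would be tuned, for example by cross-validation.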

  20. Diversity through duplication: whole-genome sequencing reveals novel gene retrocopies in the human population.

    Science.gov (United States)

    Richardson, Sandra R; Salvador-Palomeque, Carmen; Faulkner, Geoffrey J

    2014-05-01

    Gene retrocopies are generated by reverse transcription and genomic integration of mRNA. As such, retrocopies present an important exception to the central dogma of molecular biology, and have substantially impacted the functional landscape of the metazoan genome. While an estimated 8,000-17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms of retrocopy content has remained largely unexplored. Three recent studies by Abyzov et al., Ewing et al. and Schrider et al. have exploited 1,000 Genomes Project Consortium data, as well as other sources of whole-genome sequencing data, to uncover novel gene retrocopies. Here, we compare the methods and results of these three studies, highlight the impact of retrocopies in human diversity and genome evolution, and speculate on the potential for somatic gene retrocopies to impact cancer etiology and genetic diversity among individual neurons in the mammalian brain. © 2014 The Authors. Bioessays published by WILEY Periodicals, Inc.

  1. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    Science.gov (United States)

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p=6×10−4). In 294 of 2,620 (11.2%) ASD cases, a molecular basis could be determined, and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  2. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder.

    Science.gov (United States)

    C Yuen, Ryan K; Merico, Daniele; Bookman, Matt; L Howe, Jennifer; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-04-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible on a cloud platform and through a controlled-access internet portal. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions and deletions or copy number variations per ASD subject. We identified 18 new candidate ASD-risk genes and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (P = 6 × 10⁻⁴). In 294 of 2,620 (11.2%) ASD cases, a molecular basis could be determined, and 7.2% of these carried copy number variations and/or chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.

  3. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling.

    Science.gov (United States)

    Inoue, Jun; Sato, Yukuto; Sinclair, Robert; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-12-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or to acquire new functions. However, the rates of gene loss, and thus the temporal process of genome reshaping after WGD, remain unclear. The WGD shared by all teleost fish, which make up about one-half of all jawed vertebrate species, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to the analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the evolutionary histories of 6,892 protein-coding genes after the teleost-specific WGD, from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with more than 70-80% of duplicated genes lost, producing similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that this rapid gene loss occurred mainly through events involving the simultaneous loss of multiple genes. The subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Our comparative genome analysis therefore suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and it provides a useful set of marker genes for future phylogenetic analysis.

  4. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A

    2016-01-01

    In East Greenland, a dramatic increase in tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as the cause; however, the precise transmission patterns are unclear. We performed whole genome sequencing (WGS) of Mtb isolates from 98% of culture-positive TB cases over 21 years (n = 182), which revealed four genomic clusters of the Euro-American lineage (mainly sub-lineage 4.8 (n = 134)). The time to the most recent common ancestor of lineage 4.8 strains was found to be 100 years. This sub... and the uniformity of circulating Mtb strains indicated that the majority of East Greenlandic TB cases originated from one or a few strains introduced within the last century. The study thereby shows the consequences of even short interruptions in TB control efforts in areas that previously had a high TB incidence...

  5. Insight into Shiga toxin genes encoded by Escherichia coli O157 from whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Philip M. Ashton

    2015-02-01

    The ability of Shiga toxin-producing Escherichia coli (STEC) to cause severe illness in humans is determined by multiple host factors and bacterial characteristics, including Shiga toxin (Stx) subtype. Given the link between the Stx2a subtype and disease severity, we sought to identify the stx subtypes present in whole genome sequences (WGS) of 444 isolates of STEC O157. Difficulties in assembling the stx genes in some strains were overcome by using two complementary bioinformatics methods: mapping and de novo assembly. We compared the WGS analysis with the results obtained using a PCR approach and investigated the diversity within and between the subtypes. All strains of STEC O157 in this study had stx1a, stx2a or stx2c, or a combination of these three genes. There was over 99% (442/444) concordance between PCR and WGS. When common source strains were excluded, 236/349 strains of STEC O157 had multiple copies of different Stx subtypes and 54 had multiple copies of the same Stx subtype. Of those strains harbouring multiple copies of the same Stx subtype, 33 had variants between the alleles while 21 had identical copies. Strains harbouring Stx2a only were most commonly found to have multiple alleles of the same subtype (42%). Both the PCR and WGS approaches to stx subtyping provided a good level of sensitivity and specificity. In addition, the WGS data showed that a significant proportion of strains associated with clinical disease in England harboured multiple alleles of the same Stx subtype.

  6. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Directory of Open Access Journals (Sweden)

    Younhee Shin

    2016-09-01

    Hanwoo, a Korean native cattle breed (Bos taurus coreana), has great economic value due to its high meat quality. The breed also has genetic variations that are associated with production traits such as health, disease resistance, reproduction and growth, as well as carcass quality. In this study, next-generation sequencing technologies and the availability of an appropriate reference genome were applied to discover a large number of single nucleotide polymorphisms (SNPs) in ten Hanwoo bulls. Whole-genome resequencing generated a total of 26.5 Gb of data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered, of which 1,095,922 (44.3%) and 982,674 (40.9%) were novel relative to UMD 3.1 and Btau 4.6.1, respectively. Among these, the 46,301 (UMD 3.1) and 28,613 (Btau 4.6.1) SNPs identified as Hanwoo-specific were located in functional genes that may be involved in milk production, in the tenderness, juiciness and marbling of Hanwoo beef, and in yellow hair. Most of the Hanwoo-specific SNPs were identified in promoter regions, suggesting that they influence differential expression of the regulated genes relevant to these traits. In particular, the non-synonymous (ns) SNPs found in CORIN, a negative regulator of Agouti, might be causal variants for the yellow hair of Hanwoo. Our results will provide abundant genetic sources of variation to characterize Hanwoo genetics and for subsequent breeding.

  7. Distinct functions of two olfactory marker protein genes derived from teleost-specific whole genome duplication.

    Science.gov (United States)

    Suzuki, Hikoyu; Nikaido, Masato; Hagino-Yamagishi, Kimiko; Okada, Norihiro

    2015-11-10

    Whole genome duplications (WGDs) have been proposed to have had a significant impact on vertebrate evolution. Two rounds of WGD (1R and 2R) occurred in the common ancestor of Gnathostomata and Cyclostomata, followed by a third-round WGD (3R) in a common ancestor of all modern teleosts. The 3R-derived paralogs, which have the potential to facilitate phenotypic diversification, are good models for understanding the evolution of genes after WGD. However, recent studies of 3R-derived paralogs tend to be based on in silico analyses. Here we analyzed the paralogs encoding teleost olfactory marker protein (OMP), which has been shown to be specifically expressed in mature olfactory sensory neurons and is expected to be involved in olfactory transduction. Our genome database search identified two OMPs (OMP1 and OMP2) in teleosts, whereas only one was present in other vertebrates. Phylogenetic and synteny analyses suggested that OMP1 and OMP2 were derived from 3R. The two OMPs showed distinct expression patterns in zebrafish: OMP1 was expressed in the deep layer of the olfactory epithelium (OE), which is consistent with previous studies of mice and zebrafish, whereas OMP2 was sporadically expressed in the superficial layer. Interestingly, OMP2 was expressed in a very restricted region of the retina as well as in the OE. In addition, analysis of transcriptome data from spotted gar, a non-teleost fish, revealed that its single OMP gene was expressed in the eyes. We thus found distinct expression patterns of zebrafish OMP1 and OMP2 at the tissue and cellular levels. These differences in expression patterns may be explained by the subfunctionalization model of molecular evolution. Namely, a single OMP gene is speculated to have been originally expressed in the OE and the eyes in the common ancestor of all Osteichthyes (bony fish, including tetrapods). Then, the two OMP paralogs derived from the 3R WGD reduced and specialized their expression patterns. This study provides a good example for analyzing a

  8. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  9. Whole Genome Sequencing

    Science.gov (United States)

    What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  10. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, testing SNP-SNP pairs individually, as in a conventional genome-wide association study (GWAS), makes it difficult to set an overall p-value threshold because of the complicated correlation structure; this multiple testing problem causes unacceptable numbers of false negative results. A number of SNP-SNP pairs far larger than the sample size, the so-called large p, small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is therefore needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to handle the very large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure based on the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case Control Consortium) data. Conclusions Based on machine-learning principles, the proposed method enables a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
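    The screening idea behind SIS can be illustrated with a short sketch. This is not EPISIS and not the authors' dummy-coding scheme: it assumes additive genotype codes (0/1/2), uses the product of two SNP codes as a hypothetical interaction feature, and ranks pairs by absolute marginal Pearson correlation with a 0/1 phenotype, the simplest form of the marginal utility that SIS screens on. Function names and toy data are illustrative only.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_interactions(genotypes, phenotype, keep):
    """Screening step only: score every SNP pair by |marginal correlation| of its
    interaction feature with the phenotype and return the `keep` best pairs."""
    n_snps = len(genotypes[0])
    scored = []
    for i, j in itertools.combinations(range(n_snps), 2):
        # Hypothetical interaction feature: product of additive codes.
        feature = [row[i] * row[j] for row in genotypes]
        scored.append((abs(pearson(feature, phenotype)), (i, j)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:keep]]


# Toy data: 8 subjects x 3 SNPs; cases are those carrying alleles at both SNP 0 and SNP 1.
genotypes = [[0, 0, 1], [1, 1, 0], [2, 1, 2], [0, 2, 1],
             [1, 0, 2], [2, 2, 0], [1, 2, 1], [0, 1, 0]]
phenotype = [1 if g[0] * g[1] > 0 else 0 for g in genotypes]
top = sis_interactions(genotypes, phenotype, keep=1)
print(top)  # [(0, 1)]
```

    Only the retained pairs would then enter a joint model such as a penalized logistic regression; screening first is what keeps that step tractable when the number of pairs far exceeds the sample size.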

  11. Evaluating the performance of commercial whole-genome marker sets for capturing common genetic variation

    Science.gov (United States)

    Mägi, Reedik; Pfeufer, Arne; Nelis, Mari; Montpetit, Alexandre; Metspalu, Andres; Remm, Maido

    2007-01-01

    Background New technologies have enabled genome-wide association studies to be conducted with hundreds of thousands of genotyped SNPs. Several different first-generation genome-wide panels of SNPs have been commercialized. The total amount of common genetic variation is still unknown; however, the coverage of commercial panels can be evaluated against reference population samples genotyped by the International HapMap Project. Less information is available about coverage in samples from other populations. Results In this study we compare four commercial panels: the HumanHap 300 and HumanHap 550 Array Sets from the Illumina Infinium series and the Mapping 100 K and Mapping 500 K Array Sets from the Affymetrix GeneChip series. Tagging performance is compared among HapMap CEPH (CEU), Asian (JPT, CHB) and Yoruba (YRI) population samples. It is also evaluated in an Estonian population sample of more than 1000 individuals genotyped in two 500-kbp ENCODE regions of chromosome 2: ENr112 on 2p16.3 and ENr131 on 2q37.1. Conclusion We found that in a non-reference Caucasian population, commercial SNP panels provide levels of coverage similar to those in the HapMap CEPH population sample. We present the proportions of universal and population-specific SNPs in all the commercial platforms studied. PMID:17562002
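    Tagging coverage of a panel is commonly summarized as the fraction of common SNPs whose maximum pairwise r² with any panel SNP reaches a cutoff. The sketch below assumes the pairwise r² values are already computed and uses r² ≥ 0.8, a widely used convention that the abstract itself does not state; the function name and SNP identifiers are hypothetical.

```python
def tagging_coverage(r2, snps, panel, threshold=0.8):
    """Return the fraction of `snps` tagged by `panel` at r^2 >= threshold.

    r2 maps frozenset({snp_a, snp_b}) to their pairwise r^2; missing pairs
    are treated as r^2 = 0. A SNP that is itself on the panel counts as tagged.
    """
    panel = set(panel)
    if not snps:
        return 0.0
    tagged = 0
    for snp in snps:
        if snp in panel:
            tagged += 1
            continue
        best = max((r2.get(frozenset((snp, tag)), 0.0) for tag in panel), default=0.0)
        if best >= threshold:
            tagged += 1
    return tagged / len(snps)


# Toy example: four common SNPs and a one-SNP panel.
r2 = {frozenset(("rs1", "rs2")): 0.9, frozenset(("rs1", "rs3")): 0.5}
coverage = tagging_coverage(r2, ["rs1", "rs2", "rs3", "rs4"], ["rs1"])
print(coverage)  # 0.5: rs1 (on the panel) and rs2 are tagged
```

    Because r² between SNPs differs between populations, the same panel evaluated with CEU-derived versus YRI-derived r² values will generally give different coverage, which is the comparison the study makes.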

  12. A whole genome SNP genotyping by DNA microarray and candidate gene association study for kidney stone disease

    Science.gov (United States)

    2014-01-01

    Background Kidney stone disease (KSD) is a complex disorder of unknown etiology in the majority of patients. Genetic and environmental factors may cause the disease. In the present study, we used a DNA microarray to genotype single nucleotide polymorphisms (SNPs) and performed candidate gene association analysis to determine genetic variations associated with the disease. Methods Whole genome SNP genotyping by DNA microarray was initially conducted in 101 patients and 105 control subjects. A set of 104 candidate genes reported to be involved in KSD, gathered from public databases and candidate gene association study databases, was evaluated for variations associated with KSD. Results Altogether, 82 SNPs distributed within 22 candidate gene regions showed significant differences in SNP allele frequencies between the patient and control groups (P < …). Four genes, BGLAP, AHSG, CD44, and HAO1, encoding osteocalcin, fetuin-A, CD44 molecule and glycolate oxidase 1, respectively, were further assessed for their associations with the disease because they carried a high proportion of SNPs with statistically different allele frequencies between the patient and control groups. A total of 26 SNPs showed significant differences in allele frequencies between the patient and control groups, and haplotypes associated with disease risk were identified. The SNP rs759330, located 144 bp downstream of BGLAP at a predicted microRNA binding site in the 3′UTR of PAQR6 (a gene encoding progestin and adipoQ receptor family member VI), was genotyped in 216 patients and 216 control subjects and found to have significant differences in its genotype and allele frequencies (P = 0.0007, OR 2.02 and P = 0.0001, OR 2.02, respectively). Conclusions Our results suggest that these candidate genes are associated with KSD, and PAQR6 emerges as the most promising candidate, since the associated SNP rs759330 is located in a miRNA binding site and may affect its mRNA expression level.
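    Allelic odds ratios of the kind reported for rs759330 come from comparing allele counts between patients and controls in a 2×2 table. A minimal sketch of that calculation, with a Woolf (log-scale) 95% confidence interval, is shown below; the counts are hypothetical and are not the study's data, and the study may have used a different interval method.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for the risk allele from a 2x2 table of allele counts,
    with a Woolf confidence interval computed on the log scale."""
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = math.sqrt(sum(1.0 / c for c in counts))  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts (each subject contributes two alleles).
or_, lo, hi = allelic_odds_ratio(case_risk=30, case_other=70, ctrl_risk=20, ctrl_other=80)
```

    An interval that excludes 1 corresponds to a significant allelic association at roughly the 5% level.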

  13. Quantitative trait locus mapping and candidate gene analysis for plant architecture traits using whole genome re-sequencing in rice.

    Science.gov (United States)

    Lim, Jung-Hyun; Yang, Hyun-Jung; Jung, Ki-Hong; Yoo, Soo-Cheul; Paek, Nam-Chon

    2014-02-01

    Plant breeders have focused on improving plant architecture as an effective means to increase crop yield. Here, we identify the main-effect quantitative trait loci (QTLs) for plant shape-related traits in rice (Oryza sativa) and find candidate genes by applying whole genome re-sequencing of two parental cultivars using next-generation sequencing. To identify QTLs influencing plant shape, we analyzed six traits: plant height, tiller number, panicle diameter, panicle length, flag leaf length, and flag leaf width. We performed QTL analysis with 178 F7 recombinant inbred lines (RILs) from a cross between japonica rice line 'SNUSG1' and indica rice line 'Milyang23'. Using 131 molecular markers, including 28 insertion/deletion markers, we identified 11 main- and 16 minor-effect QTLs for the six traits with a LOD threshold > 2.8. Our sequence analysis identified 54 candidate genes for the main-effect QTLs. By further comparing coding sequences and meta-expression profiles between japonica and indica rice varieties, we finally chose 15 strong candidate genes for the 11 main-effect QTLs. Our study shows that the whole-genome sequence data substantially enhanced the efficiency of polymorphic marker development for QTL fine-mapping and the identification of possible candidate genes. This yields useful genetic resources for breeding high-yielding rice cultivars with improved plant architecture.
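    The LOD threshold refers to a log10 likelihood-ratio score for a QTL. For a single marker in RILs, where each line is homozygous for one parental allele, a normal-model LOD can be computed from residual sums of squares as LOD = (n/2)·log10(RSS0/RSS1). The sketch below is single-marker analysis only, a simplification of the interval-mapping methods usually used for QTL scans, and is not necessarily the procedure the authors used; genotype codes and trait values are hypothetical.

```python
import math


def single_marker_lod(genotypes, trait):
    """LOD for one marker: a model with one mean per genotype class versus a
    model with a single overall mean, assuming normally distributed errors."""
    n = len(trait)
    overall = sum(trait) / n
    rss0 = sum((y - overall) ** 2 for y in trait)  # null model residuals
    rss1 = 0.0
    for cls in set(genotypes):
        values = [y for g, y in zip(genotypes, trait) if g == cls]
        mean = sum(values) / len(values)
        rss1 += sum((y - mean) ** 2 for y in values)  # marker model residuals
    if rss0 == 0:
        return 0.0  # no trait variation to explain
    if rss1 == 0:
        return math.inf  # marker explains all of the variation
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical RIL data: 'J' = japonica-type allele, 'I' = indica-type allele.
genotypes = ["J", "J", "J", "I", "I", "I"]
plant_height = [100.0, 102.0, 98.0, 110.0, 112.0, 108.0]
lod = single_marker_lod(genotypes, plant_height)
print(lod > 2.8)  # True for this toy data (LOD is about 3.05)
```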

  14. Patterns of evolutionary conservation of ascorbic acid-related genes following whole-genome triplication in Brassica rapa.

    Science.gov (United States)

    Duan, Weike; Song, Xiaoming; Liu, Tongkun; Huang, Zhinan; Ren, Jun; Hou, Xilin; Du, Jianchang; Li, Ying

    2014-12-31

    Ascorbic acid (AsA) is an important antioxidant in plants and an essential vitamin for humans. Extending the study of AsA-related genes from Arabidopsis thaliana to Brassica rapa could shed light on the evolution of AsA in plants and inform crop breeding. In this study, we conducted whole-genome annotation, molecular-evolution and gene-expression analyses of all known AsA-related genes in B. rapa. The nucleobase-ascorbate transporter (NAT) gene family and AsA L-galactose pathway genes were also compared among plant species. Four important insights were gained: 1) 102 AsA-related genes were identified in B. rapa, and they diverged mainly 12-18 Ma, accompanying the Brassica-specific genome triplication event; 2) during their evolution, these AsA-related genes were preferentially retained, consistent with the gene dosage hypothesis; 3) the putative proteins were highly conserved, but their expression patterns varied; and 4) although the number of AsA-related genes is higher in B. rapa than in A. thaliana, the AsA contents and the numbers of expressed genes in leaves of both species are similar, suggesting that the genes that are not generally expressed may serve as substitutes during emergencies. In summary, this study provides genome-wide insights into the evolutionary history and mechanisms of AsA-related genes following whole-genome triplication in B. rapa. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family has been studied predominantly in Arabidopsis, and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspects of the SWEET gene family in diverse plant species, from primitive single-cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real-time PCR (qRT-PCR) showed that the expression of a majority of the GmSWEET genes was confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotypes) were identified in the GmSWEET genes through analysis of whole genome re-sequencing data from 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation used comparative genomics, transcriptome profiling and whole genome re-sequencing approaches, provided a systematic description of soybean SWEET genes, and identified putative candidates with probable roles in reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will serve as an important resource for the soybean research

  16. Whole-genome resequencing and transcriptomic analysis to identify genes involved in leaf-color diversity in ornamental rice plants.

    Directory of Open Access Journals (Sweden)

    Chang-Kug Kim

    Rice field art is a large-scale art form in which people design rice fields using various kinds of ornamental rice plants with different leaf colors. Leaf color-related genes play an important role in the study of chlorophyll biosynthesis, chloroplast structure and function, and anthocyanin biosynthesis. Despite the role of different metabolites in the traditional relationship between leaf and color, comprehensive color-specific metabolite studies of ornamental rice have been limited. We performed whole-genome resequencing and transcriptomic analysis of regulatory patterns and genetic diversity among different rice cultivars to discover new genetic mechanisms that promote enhanced levels of various leaf colors. We resequenced the genomes of 10 rice leaf-color accessions to an average read depth of 40× with >95% coverage and performed 30 RNA-seq experiments using the 10 rice accessions sampled at three developmental stages. The sequencing results yielded a total of 1,814 × 10^6 reads and identified an average of 713,114 SNPs per rice accession. Based on our analysis of the DNA variation and gene expression, we selected 47 candidate genes. We used an integrated analysis of the whole-genome resequencing data and the RNA-seq data to divide the candidate genes into two groups: genes related to macronutrient (i.e., magnesium and sulfur) transport and genes related to flavonoid pathways, including anthocyanidin biosynthesis. We verified the candidate genes with quantitative real-time PCR using transgenic T-DNA insertion mutants. Our study demonstrates the potential of integrated screening methods combined with genetic-variation and transcriptomic data to isolate genes involved in complex biosynthetic networks and pathways.

  17. Diversification and evolution of the SDG gene family in Brassica rapa after the whole genome triplication.

    Science.gov (United States)

    Dong, Heng; Liu, Dandan; Han, Tianyu; Zhao, Yuxue; Sun, Ji; Lin, Sue; Cao, Jiashu; Chen, Zhong-Hua; Huang, Li

    2015-11-24

    Histone lysine methylation, controlled by the SET Domain Group (SDG) gene family, is part of the histone code that regulates chromatin function and epigenetic control of gene expression. Analyzing the SDG gene family in Brassica rapa for gene structure, domain architecture, subcellular localization, rate of molecular evolution and gene expression pattern revealed common occurrences of subfunctionalization and neofunctionalization in BrSDGs. In comparison with Arabidopsis thaliana, the BrSDG gene family was found to be more divergent than AtSDGs, which might partly explain the rich variety of morphotypes in B. rapa. In addition, a new evolutionary pattern of the four main groups of SDGs was presented, in which the Trx group and the SUVR subgroup evolved faster than the E(z) and Ash groups and the SUVH subgroup. These differences in evolutionary rate among the four main groups of SDGs are perhaps due to the complexity and variability of the regions that bind with biomacromolecules, which guide SDGs to their target loci.

  18. Whole genome expression array profiling highlights differences in mucosal defense genes in Barrett's esophagus and esophageal adenocarcinoma.

    Directory of Open Access Journals (Sweden)

    Derek J Nancarrow

    Esophageal adenocarcinoma (EAC) has become a major concern in Western countries due to rapid rises in incidence coupled with very poor survival rates. One of the key risk factors for the development of this cancer is the presence of Barrett's esophagus (BE), which is believed to form in response to repeated gastro-esophageal reflux. In this study we performed comparative, genome-wide expression profiling (using Illumina whole-genome BeadArrays) on total RNA extracted from esophageal biopsy tissues from individuals with EAC, with BE (in the absence of EAC), and with normal squamous epithelium. We combined these data with publicly accessible raw data from three similar studies to investigate key gene and ontology differences between these three tissue states. The results support the deduction that BE is a tissue with enhanced glycoprotein synthesis machinery (DPP4, ATP2A3, AGR2) designed to provide strong mucosal defenses aimed at resisting gastro-esophageal reflux. EAC exhibits the enhanced extracellular matrix remodeling (collagens, IGFBP7, PLAU) expected in an aggressive form of cancer, as well as evidence of reduced expression of genes associated with mucosal (MUC6, CA2, TFF1) and xenobiotic (AKR1C2, AKR1B10) defenses. When our results are compared to previous whole-genome expression profiling studies, keratin, mucin, annexin and trefoil factor gene groups are the most frequently represented differentially expressed gene families. Eleven genes identified here are also represented in at least three other profiling studies. We used these genes to discriminate between squamous epithelium, BE and EAC within the two largest cohorts using a support vector machine with leave-one-out cross-validation (LOOCV). While this method was satisfactory for discriminating squamous epithelium and BE, it demonstrates the need for more detailed investigations into profiling changes between BE and EAC.
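    Leave-one-out cross-validation holds out each sample once, trains on all the others, and scores the prediction for the held-out sample. The study used a support vector machine; to keep this sketch dependency-free it swaps in a nearest-centroid classifier, so it illustrates the LOOCV loop rather than the authors' classifier. Feature values and labels are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest (squared Euclidean) to x."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(X, y):
    """Leave-one-out: train on all samples but one, then predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-gene expression values for two tissue classes.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["squamous", "squamous", "squamous", "BE", "BE", "BE"]
accuracy = loocv_accuracy(X, y)
print(accuracy)  # 1.0 for these well-separated toy classes
```

    LOOCV uses every sample for testing exactly once, which suits small cohorts; the price is one model fit per sample.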

  19. Whole-genome analysis of pseudorabies virus gene expression by real-time quantitative RT-PCR assay

    Directory of Open Access Journals (Sweden)

    Petrovszki Pál

    2009-10-01

    Background Pseudorabies virus (PRV), a neurotropic herpesvirus of pigs, serves as an excellent model system with which to investigate the herpesvirus life cycle both in cultured cells and in vivo. Real-time RT-PCR is a very sensitive, accurate and reproducible technique that can be used to detect very small amounts of RNA molecules, and it can therefore be applied to analyze the expression of herpesvirus genes from the very early period of infection. Results In this study, we developed and applied a quantitative reverse transcriptase-based real-time PCR technique to profile transcription from the whole genome of PRV after lytic infection in porcine kidney cells. We calculated the relative expression ratios in a novel way, which allowed us to compare different PRV genes with respect to their expression dynamics and to divide the PRV genes into distinct kinetic classes. This is the first publication on the whole-genome analysis of the gene expression of an alpha-herpesvirus by qRT2-PCR. We additionally established the kinetic properties of uncharacterized PRV genes and revised or confirmed data on PRV genes examined earlier by traditional methods such as Northern blot analysis. Our investigations revealed that genes with the same expression properties form clusters on the PRV genome: nested overlapping genes belong to the same kinetic class, while most convergent genes belong to different kinetic classes. Further, we detected inverse relationships between the expression of EP0 and IE180 mRNAs and that of their antisense partners. Conclusion Most (if not all) PRV genes begin to be expressed from the onset of viral expression. No sharp boundary was found between the groups of early and late genes classified on the basis of their requirement for viral DNA synthesis.
    The expression of the PRV genes was analyzed, categorized and compared by qRT2-PCR assay, with the average of the minimum cycle threshold used as a control for

  20. Differential Gene Expression Analysis of Placentas with Increased Vascular Resistance and Pre-Eclampsia Using Whole-Genome Microarrays

    Directory of Open Access Journals (Sweden)

    M. Centlow

    2011-01-01

    Pre-eclampsia is a pregnancy complication characterized by hypertension and proteinuria. Several factors are associated with an increased risk of developing pre-eclampsia, one of which is increased uterine artery resistance, referred to as “notching”. However, some women with notching do not progress to pre-eclampsia, whereas others may be at higher risk of doing so. The placenta, central in pre-eclampsia pathology, may express genes associated with either protection against or progression to pre-eclampsia. To search for such genes, whole-genome profiling was performed. Placental tissue from 15 controls, 10 women with pre-eclampsia, 5 with pre-eclampsia and notching, and 5 with notching only was analyzed using gene expression microarrays, and antibody microarrays were used to study some of the same gene products and functionally related ones. The microarray analysis showed 148 genes to be significantly altered between the four groups. In the pre-eclamptic group compared to the notch-only group, there was increased expression of genes related to chemotaxis and the NF-kappa B pathway and decreased expression of genes related to antigen processing and presentation, such as human leukocyte antigen B. Our results indicate that progression from notching to pre-eclampsia may involve the development of inflammation. Increased expression of antigen-presenting genes, as seen in the notch-only placenta, may prevent this inflammatory response and thereby protect the patient from developing pre-eclampsia.

  1. Characterization of a novel blaIMP gene, blaIMP-58, using whole genome sequencing in a Pseudomonas putida isolate detected in Denmark

    DEFF Research Database (Denmark)

    Holmgaard, Dennis Back; Hansen, Frank; Hasman, Henrik

    2017-01-01

    A multidrug-resistant strain of Pseudomonas putida was isolated from the urine of a 65-year-old woman hospitalized for serious clinical conditions. Using whole-genome sequencing, a novel blaIMP gene, blaIMP-58, was discovered and characterized.

  2. Comparative transcriptome analysis reveals whole-genome duplications and gene selection patterns in cultivated and wild Chrysanthemum species.

    Science.gov (United States)

    Won, So Youn; Kwon, Soo-Jin; Lee, Tae-Ho; Jung, Jae-A; Kim, Jung Sun; Kang, Sang-Ho; Sohn, Seong-Han

    2017-11-01

    Comparative transcriptome analysis of wild and cultivated chrysanthemums provides valuable genomic resources and helps uncover common and divergent patterns of genome and gene evolution in these species. Plants are unique in that they employ polyploidy (or whole-genome duplication, WGD) as a key process for speciation and evolution. The Chrysanthemum genus is closely associated with hybridization and polyploidization, with Chrysanthemum species exhibiting diverse ploidy levels. The commercially important species, C. morifolium is an allohexaploid plant that is thought to have originated via the hybridization of several Chrysanthemum species, but the genomic and molecular evolutionary mechanisms remain poorly understood. In the present study, we sequenced and compared the transcriptomes of C. morifolium and the wild Korean diploid species, C. boreale. De novo transcriptome assembly revealed 11,318 genes in C. morifolium and 10,961 genes in C. boreale, whose functions were annotated by homology searches. An analysis of synonymous substitution rates (Ks) of paralogous and orthologous genes suggested that the two Chrysanthemum species commonly experienced the Asteraceae paleopolyploidization and recent genome duplication or triplication before the divergence of these species. Intriguingly, C. boreale probably underwent rapid diploidization, with a reduction in chromosome number, whereas C. morifolium maintained the original chromosome number. Analysis of the ratios of non-synonymous to synonymous nucleotide substitutions (Ka/Ks) between orthologous gene pairs indicated that 107 genes experienced positive selection, which may have been crucial for the adaptation, domestication, and speciation of Chrysanthemum.

  3. Expression divergence of cellulose synthase (CesA) genes after a recent whole genome duplication event in Populus.

    Science.gov (United States)

    Takata, Naoki; Taniguchi, Toru

    2015-01-01

    Gene duplication is an important mechanism for functional divergence of genes. Secondary cell wall-associated cellulose synthase genes (CesA4, CesA7 and CesA8) are duplicated in Populus plants due to a recent whole genome duplication event. Here, we demonstrate that duplicate CesA genes show tissue-dependent expression divergence in Populus plants. Real-time PCR analysis of Populus CesA genes suggested that Pt × tCesA8-B was more highly expressed than Pt × tCesA8-A in phloem and secondary xylem tissue of mature stem. Histochemical and histological analyses of transformants expressing a GFP-GUS fusion gene driven by Populus CesA promoters revealed that the duplicate CesA genes showed different expression patterns in phloem fibers, secondary xylem, root cap and leaf trichomes. We predicted putative cis-regulatory motifs that regulate expression of secondary cell wall-associated CesA genes, and identified 19 motifs that are highly conserved in the CesA gene family of eudicotyledonous plants. Furthermore, a transient transactivation assay identified candidate transcription factors that affect levels and patterns of expression of Populus CesA genes. The present study reveals that secondary cell wall-associated CesA genes in Populus have undergone a functional differentiation in expression pattern that may be attributable to evolutionary alteration of regulatory modules.

  4. Repeated Whole-Genome Duplication, Karyotype Reshuffling, and Biased Retention of Stress-Responding Genes in Buckler Mustard.

    Science.gov (United States)

    Geiser, Céline; Mandáková, Terezie; Arrigo, Nils; Lysak, Martin A; Parisod, Christian

    2016-01-01

    Whole-genome duplication (WGD) is usually followed by gene loss and karyotype repatterning. Despite evidence of new adaptive traits associated with WGD, the underpinnings and evolutionary significance of such genome fractionation remain elusive. Here, we use Buckler mustard (Biscutella laevigata) to infer processes that have driven the retention of duplicated genes after recurrent WGDs. In addition to the β- and α-WGD events shared by all Brassicaceae, cytogenetic and transcriptome analyses revealed two younger WGD events that occurred at times of environmental changes in the clade of Buckler mustard (Biscutelleae): a mesopolyploidy event from the late Miocene that was followed by considerable karyotype reshuffling and chromosome number reduction, and a neopolyploidy event during the Pleistocene. Although a considerable number of the older duplicates presented signatures of retention under positive selection, the majority of retained duplicates arising from the younger mesopolyploidy WGD event matched predictions of the gene balance hypothesis and showed evidence of strong purifying selection as well as enrichment in gene categories responding to abiotic stressors. Retention of large stretches of chromosomes for both genomic copies supported the hypothesis that cycles of WGD and biased fractionation shaped the genome of this stress-tolerant polyploid, promoting the adaptive recruitment of stress-responding genes in the face of environmental challenges. © 2016 American Society of Plant Biologists. All rights reserved.

  5. Whole genome sequencing reveals a novel deletion variant in the KIT gene in horses with white spotted coat colour phenotypes.

    Science.gov (United States)

    Dürig, N; Jude, R; Holl, H; Brooks, S A; Lafayette, C; Jagannathan, V; Leeb, T

    2017-08-01

    White spotting phenotypes in horses can range in severity from the common white markings up to completely white horses. EDNRB, KIT, MITF, PAX3 and TRPM1 represent known candidate genes for such phenotypes in horses. For the present study, we re-investigated a large horse family segregating a variable white spotting phenotype, for which conventional Sanger sequencing of the candidate genes' individual exons had failed to reveal the causative variant. We obtained whole genome sequence data from an affected horse and specifically searched for structural variants in the known candidate genes. This analysis revealed a heterozygous ~1.9-kb deletion spanning exons 10-13 of the KIT gene (chr3:77,740,239_77,742,136del1898insTATAT). In continuity with previously named equine KIT variants we propose to designate the newly identified deletion variant W22. We had access to 21 horses carrying the W22 allele. Four of them were compound heterozygous W20/W22 and had a completely white phenotype. Our data suggest that W22 represents a true null allele of the KIT gene, whereas the previously identified W20 leads to a partial loss of function. These findings will enable more precise genetic testing for depigmentation phenotypes in horses. © 2017 Stichting International Foundation for Animal Genetics.

  6. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes, enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...

  7. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    NARCIS (Netherlands)

    Herniou, E.A.; Luque, T.; Chen, X.; Vlak, J.M.; Winstanley, D.; Cory, J.S.; O'Reilly, D.R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of

  9. Whole-genome analysis of herbicide-tolerant mutant rice generated by Agrobacterium-mediated gene targeting.

    Science.gov (United States)

    Endo, Masaki; Kumagai, Masahiko; Motoyama, Ritsuko; Sasaki-Yamagata, Harumi; Mori-Hosokawa, Satomi; Hamada, Masao; Kanamori, Hiroyuki; Nagamura, Yoshiaki; Katayose, Yuichi; Itoh, Takeshi; Toki, Seiichi

    2015-01-01

    Gene targeting (GT) is a technique used to modify endogenous genes in target genomes precisely via homologous recombination (HR). Although GT plants are produced using genetic transformation techniques, if the difference between the endogenous and the modified gene is limited to point mutations, GT crops can be considered equivalent to non-genetically modified mutant crops generated by conventional mutagenesis techniques. However, it is difficult to guarantee the non-incorporation of DNA fragments from Agrobacterium in GT plants created by Agrobacterium-mediated GT despite screening with conventional Southern blot and/or PCR techniques. Here, we report a comprehensive analysis of herbicide-tolerant rice plants generated by inducing point mutations in the rice ALS gene via Agrobacterium-mediated GT. We performed comparative genomic hybridization (CGH) array analysis and whole-genome sequencing to evaluate the molecular composition of GT rice plants. Thus far, no integration of Agrobacterium-derived DNA fragments has been detected in GT rice plants. However, >1,000 single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) were found in GT plants. Among these mutations, 20-100 variants might have some effect on expression levels and/or protein function. Information about additive mutations should be useful in clearing out unwanted mutations by backcrossing. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  10. Copy Number Variation of UGT 2B Genes in Indian Families Using Whole Genome Scans

    Directory of Open Access Journals (Sweden)

    Avinash M. Veerappa

    2016-01-01

    Background and Objectives. Uridine diphospho-glucuronosyltransferase 2B (UGT2B) is a family of genes involved in metabolizing steroid hormones and several other xenobiotics. These UGT2B genes are highly polymorphic and have distinct polymorphisms associated with specific regions around the globe. The copy number variation (CNV) status of UGT2B17 in the Indian population is not known, and the disease associations of these CNVs have been inconclusive. It was therefore of interest to investigate the CNV profile of UGT2B genes. Methods. We investigated the presence of CNVs in UGT2B genes in 31 members from eight Indian families using the Affymetrix Genome-Wide Human SNP Array 6.0 chip. Results. Our data revealed that >50% of the study members carried CNVs in UGT2B genes, of which 76% showed deletion polymorphism. CNVs were observed more often in UGT2B17 (76.4%) than in UGT2B15 (17.6%). Molecular network and pathway analysis found enrichment related to steroid metabolic process, carboxylesterase activity, and sequence-specific DNA binding. Interpretation and Conclusion. We report the presence of UGT2B gene deletion and duplication polymorphisms in Indian families. Network analysis indicates a substitutive role of other possible genes in UGT activity. The CNVs of UGT2B genes are very common in individuals, indicating that their effect is neutral in causing any suspected diseases.

  11. The roles of whole-genome and small-scale duplications in the functional specialization of Saccharomyces cerevisiae genes.

    Directory of Open Access Journals (Sweden)

    Mario A Fares

    Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data show that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations, we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicate copies share more interaction partners than WGD-duplicate copies; (c) WGD-duplicate interaction partners are more functionally related than SSD-duplicate partners; (d) SSD-duplicate gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicate copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.

  12. Gene expression profiling to characterize sediment toxicity – a pilot study using Caenorhabditis elegans whole genome microarrays

    Directory of Open Access Journals (Sweden)

    Reifferscheid Georg

    2009-04-01

    Background: Traditionally, toxicity of river sediments is assessed using whole sediment tests with benthic organisms. The challenge, however, is the differentiation between multiple effects caused by complex contaminant mixtures and the unspecific toxicity endpoints such as survival, growth or reproduction. The use of gene expression profiling facilitates the identification of transcriptional changes at the molecular level that are specific to the bio-available fraction of pollutants. Results: In this pilot study, we exposed the nematode Caenorhabditis elegans to three sediments of German rivers with varying (low, medium and high) levels of heavy metal and organic contamination. Besides chemical analysis, three standard bioassays were performed: reproduction of C. elegans, genotoxicity (Comet assay) and endocrine disruption (YES test). Gene expression was profiled using a whole-genome DNA microarray approach to identify overrepresented functional gene categories and derived cellular processes. Disaccharide and glycogen metabolism were found to be affected, whereas further functional pathways, such as oxidative phosphorylation, ribosome biogenesis, metabolism of xenobiotics, aging and several developmental processes, were found to be differentially regulated only in response to the most contaminated sediment. Conclusion: This study demonstrates how ecotoxicogenomics can identify transcriptional responses in complex mixture scenarios to distinguish different samples of river sediments.

  13. Whole-genome resequencing of Bacillus cereus and expression of genes functioning in sodium chloride stress.

    Science.gov (United States)

    Xu, Zhenbo; Xie, Jinhong; Liu, Junyan; Ji, Lili; Soteyome, Thanapop; Peters, Brian M; Chen, Dingqiang; Li, Bing; Li, Lin; Shirtliff, Mark E

    2017-03-01

    Bacillus cereus is one of the most common opportunistic pathogens responsible for various foodborne diseases. To investigate the regulatory mechanism of B. cereus under high osmotic pressure, two B. cereus strains, B25 and B26, were isolated from industrial soy sauce residue containing a high salt concentration. Resequencing was performed on the Illumina/Solexa platform, and 13,646 SNPs and 434 InDels were identified as common variants between B25 and B26 against the reference genome, followed by COG, GO, and KEGG enrichment analysis. Furthermore, 49 key genes involved in Na(+)/H(+),K(+) transport, dipeptide or tripeptide transport, and stress response were selected and classified into 27 groups. Further validation was performed by qRT-PCR, and 4 candidate genes were found to be most associated with osmotic response. Gene expression of the 4 candidate genes was then analyzed, and downregulation was observed for genes BC0669 and BC0754, associated with the K(+) transport system. However, dramatic upregulation was detected for gene BC2114, associated with glutathione peroxidase, indicating the activation of antioxidant responses by osmotic stress via genetic regulation. In conclusion, the bioinformatic analysis and gene expression profile provide the basis for further investigation of the genetic and regulatory mechanisms of bacterial salt tolerance. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Effects of a diet high in monounsaturated fat and a full Mediterranean diet on PBMC whole genome gene expression and plasma proteins

    NARCIS (Netherlands)

    Dijk, van Susan; Feskens, Edith; Bos, M.B.; Groot, de Lisette; Vries, de Jeanne; Muller, Michael; Afman, Lydia

    2012-01-01

    This study aimed to identify the effects of replacement of saturated fat (SFA) by monounsaturated fat (MUFA) in a western-type diet and the effects of a full Mediterranean (MED) diet on whole genome PBMC gene expression and plasma protein profiles. Abdominally overweight subjects were randomized to a

  15. Prediction of Genes Related to Positive Selection Using Whole-Genome Resequencing in Three Commercial Pig Breeds

    Directory of Open Access Journals (Sweden)

    HyoYoung Kim

    2015-12-01

    Selective sweeps can cause genetic differentiation across populations, which allows for the identification of possible causative regions/genes underlying important traits. The pig has experienced a long history of allele frequency changes through artificial selection in the domestication process. We obtained an average of 329,482,871 sequence reads for 24 pigs from three pig breeds: Yorkshire (n = 5), Landrace (n = 13), and Duroc (n = 6). An average read depth of 11.7 was obtained using whole-genome resequencing on an Illumina HiSeq2000 platform. In this study, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio tests were implemented to detect genes experiencing positive selection from the genome-wide resequencing data generated from three commercial pig breeds. In our results, 26, 7, and 14 genes from Yorkshire, Landrace, and Duroc, respectively, were detected by the two statistical tests. Significant evidence for positive selection was identified on genes ST6GALNAC2 and EPHX1 in Yorkshire, PARK2 in Landrace, and BMP6, SLA-DQA1, and PRKG1 in Duroc. These genes are reportedly relevant to lactation, reproduction, meat quality, and growth traits. To understand how these single nucleotide polymorphisms (SNPs) related to positive selection affect protein function, we analyzed the effect of non-synonymous SNPs. Three SNPs (rs324509622, rs80931851, and rs80937718) in the SLA-DQA1 gene were significant in the enrichment tests, indicating strong evidence for positive selection in Duroc. Our analyses identified genes under positive selection for lactation, reproduction, and meat-quality and growth traits in Yorkshire, Landrace, and Duroc, respectively.

  16. A microarray whole-genome gene expression dataset in a rat model of inflammatory corneal angiogenesis.

    Science.gov (United States)

    Mukwaya, Anthony; Lindvall, Jessica M; Xeroudaki, Maria; Peebo, Beatrice; Ali, Zaheer; Lennikov, Anton; Jensen, Lasse Dahl Ejby; Lagali, Neil

    2016-11-22

    In angiogenesis with concurrent inflammation, many pathways are activated, some linked to VEGF and others largely VEGF-independent. Pathways involving inflammatory mediators, chemokines, and micro-RNAs may play important roles in maintaining a pro-angiogenic environment or mediating angiogenic regression. Here, we describe a gene expression dataset to facilitate exploration of pro-angiogenic, pro-inflammatory, and remodelling/normalization-associated genes during both an active capillary sprouting phase, and in the restoration of an avascular phenotype. The dataset was generated by microarray analysis of the whole transcriptome in a rat model of suture-induced inflammatory corneal neovascularisation. Regions of active capillary sprout growth or regression in the cornea were harvested and total RNA extracted from four biological replicates per group. High quality RNA was obtained for gene expression analysis using microarrays. Fold change of selected genes was validated by qPCR, and protein expression was evaluated by immunohistochemistry. We provide a gene expression dataset that may be re-used to investigate corneal neovascularisation, and may also have implications in other contexts of inflammation-mediated angiogenesis.

  17. Benchmarking of methods for identification of antimicrobial resistance genes in bacterial whole genome data

    DEFF Research Database (Denmark)

    Clausen, Philip T. L. C.; Zankari, Ea; Aarestrup, Frank Møller

    2016-01-01

    ...to two different methods in current use for identification of antibiotic resistance genes in bacterial WGS data. A novel method, KmerResistance, which examines the co-occurrence of k-mers between the WGS data and a database of resistance genes, was developed. The performance of this method was compared with two previously described methods, ResFinder and SRST2, which use an assembly/BLAST method and BWA, respectively, using two datasets with a total of 339 isolates, covering five species, originating from the Oxford University Hospitals NHS Trust and Danish pig farms. The predicted resistance was compared with the observed phenotypes for all isolates. To challenge further the sensitivity of the in silico methods, the datasets were also down-sampled to 1% of the reads and reanalysed. The best results were obtained by identification of resistance genes by mapping directly against the raw reads...
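    As a rough illustration of the k-mer co-occurrence idea described above, the Python sketch below scores each reference resistance gene by the fraction of its k-mers that also appear in a set of raw reads. It is a hypothetical simplification, not the published KmerResistance implementation: the function names, the default k = 16, exact matching on the forward strand only, and the absence of any decision threshold are all assumptions.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gene_kmer_coverage(reads: list[str], genes: dict[str, str], k: int = 16) -> dict[str, float]:
    """Hypothetical scoring: for each reference gene, the fraction of its
    k-mers that also occur in at least one read.

    Simplifications (assumptions, not the published method): exact matches
    only, forward strand only (no reverse complements), no error handling.
    """
    # Pool every k-mer seen in the raw reads into one lookup set.
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    coverage: dict[str, float] = {}
    for name, seq in genes.items():
        gene_kmers = kmers(seq, k)
        if not gene_kmers:
            coverage[name] = 0.0  # gene shorter than k: nothing to match
            continue
        shared = len(gene_kmers & read_kmers)
        coverage[name] = shared / len(gene_kmers)
    return coverage


if __name__ == "__main__":
    reads = ["ACGTACGTTT", "TTGACCA"]
    genes = {"geneA": "ACGTACGTTTGACCA", "geneB": "GGGGCCCCAAAATTTT"}
    print(gene_kmer_coverage(reads, genes, k=5))
```

    In the toy example, geneA has 9 of its 11 5-mers present in the reads, while geneB has none. A practical tool would also have to tolerate sequencing errors, search both strands and choose a cut-off for calling a gene present.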

  18. Clinical whole-genome sequencing in severe early-onset epilepsy reveals new genes and improves molecular diagnosis.

    Science.gov (United States)

    Martin, Hilary C; Kim, Grace E; Pagnamenta, Alistair T; Murakami, Yoshiko; Carvill, Gemma L; Meyer, Esther; Copley, Richard R; Rimmer, Andrew; Barcia, Giulia; Fleming, Matthew R; Kronengold, Jack; Brown, Maile R; Hudspith, Karl A; Broxholme, John; Kanapin, Alexander; Cazier, Jean-Baptiste; Kinoshita, Taroh; Nabbout, Rima; Bentley, David; McVean, Gil; Heavin, Sinéad; Zaiwalla, Zenobia; McShane, Tony; Mefford, Heather C; Shears, Deborah; Stewart, Helen; Kurian, Manju A; Scheffer, Ingrid E; Blair, Edward; Donnelly, Peter; Kaczmarek, Leonard K; Taylor, Jenny C

    2014-06-15

    In severe early-onset epilepsy, precise clinical and molecular genetic diagnosis is complex, as many metabolic and electrophysiological processes have been implicated in disease causation. The clinical phenotypes share many features such as complex seizure types and developmental delay. Molecular diagnosis has historically been confined to sequential testing of candidate genes known to be associated with specific sub-phenotypes, but the diagnostic yield of this approach can be low. We conducted whole-genome sequencing (WGS) on six patients with severe early-onset epilepsy who had previously been refractory to molecular diagnosis, and their parents. Four of these patients had a clinical diagnosis of Ohtahara Syndrome (OS) and two patients had severe non-syndromic early-onset epilepsy (NSEOE). In two OS cases, we found de novo non-synonymous mutations in the genes KCNQ2 and SCN2A. In a third OS case, WGS revealed paternal isodisomy for chromosome 9, leading to identification of the causal homozygous missense variant in KCNT1, which produced a substantial increase in potassium channel current. The fourth OS patient had a recessive mutation in PIGQ that led to exon skipping and defective glycosylphosphatidylinositol biosynthesis. The two patients with NSEOE had likely pathogenic de novo mutations in CBL and CSNK1G1, respectively. Mutations in these genes were not found among 500 additional individuals with epilepsy. This work reveals two novel genes for OS, KCNT1 and PIGQ. It also uncovers unexpected genetic mechanisms and emphasizes the power of WGS as a clinical tool for making molecular diagnoses, particularly for highly heterogeneous disorders. © The Author 2014. Published by Oxford University Press.

  19. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics.

    Directory of Open Access Journals (Sweden)

    Salim Akhter Chowdhury

    2014-07-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.

  20. Whole-genome survey of the putative ATP-binding cassette transporter family genes in Vitis vinifera.

    Science.gov (United States)

    Çakır, Birsen; Kılıçkaya, Ozan

    2013-01-01

    The ATP-binding cassette (ABC) protein superfamily constitutes one of the largest protein families known in plants. In this report, we performed a complete inventory of ABC protein genes in Vitis vinifera, the whole genome of which has been sequenced. By comparison with ABC protein members of Arabidopsis thaliana, we identified 135 putative ABC proteins with 1 or 2 nucleotide-binding domains (NBDs) in V. vinifera. Of these, 120 encode intrinsic membrane proteins, and 15 encode proteins missing transmembrane domains (TMDs). V. vinifera ABC proteins can be divided into 13 subfamilies with 79 "full-size," 41 "half-size," and 15 "soluble" putative ABC proteins. The main feature of the Vitis ABC superfamily is the presence of 2 large subfamilies, ABCG (pleiotropic drug resistance and white-brown complex homolog) and ABCC (multidrug resistance-associated protein). We identified orthologs of V. vinifera putative ABC transporters in different species. This work represents the first complete inventory of ABC transporters in V. vinifera. The identification of Vitis ABC transporters and their comparative analysis with the Arabidopsis counterparts revealed a strong conservation between the 2 species. This inventory could help elucidate the biological and physiological functions of these transporters in V. vinifera.

  1. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics.

    Science.gov (United States)

    Chowdhury, Salim Akhter; Shackney, Stanley E; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schäffer, Alejandro A; Schwartz, Russell

    2014-07-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.

  2. Whole genome population genetics analysis of Sudanese goats identifies regions harboring genes associated with major traits.

    Science.gov (United States)

    Rahmatalla, Siham A; Arends, Danny; Reissmann, Monika; Said Ahmed, Ammar; Wimmers, Klaus; Reyer, Henry; Brockmann, Gudrun A

    2017-10-23

    Sudan is endowed with a variety of indigenous goat breeds which are used for meat and milk production and which are well adapted to the local environment. The aim of the present study was to determine the genetic diversity and relationship within and between the four main Sudanese breeds of Nubian, Desert, Taggar and Nilotic goats. Using the 50K SNP chip, 24 animals of each breed were genotyped. More than 96% of high-quality SNPs were polymorphic, with an average minor allele frequency of 0.3. In all breeds, no significant difference between observed (0.4) and expected (0.4) heterozygosity was found, and the inbreeding coefficients (FIS) did not differ from zero. Fst coefficients for the genetic distance between breeds also did not significantly deviate from zero. In addition, the analysis of molecular variance revealed that 93% of the total variance in the examined population can be explained by differences among individuals, while only 7% results from differences between the breeds. These findings provide evidence for high genetic diversity and little inbreeding within breeds on the one hand, and low diversity between breeds on the other. Further examinations using Nei's genetic distance and STRUCTURE analysis clustered Taggar goats distinct from the other breeds. In a principal component (PC) analysis, PC1 could separate Taggar, Nilotic and a mix of Nubian and Desert goats into three groups. The SNPs that contributed strongly to PC1 showed high Fst values in Taggar goat versus the other goat breeds. PCA allowed us to identify target genomic regions which contain genes known to influence growth, development, bone formation and the immune system. The information on the genetic variability and diversity in this study confirmed that Taggar goat is genetically different from the other goat breeds in Sudan. The SNPs identified by the first principal components show high Fst values in Taggar goat and allowed the identification of candidate genes which can be used in the
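    The inbreeding coefficient reported above relates observed heterozygosity (Ho) to expected heterozygosity (He) through the standard relation FIS = 1 - Ho/He, where, for a biallelic SNP with allele frequency p, He = 2p(1 - p). (With the reported average minor allele frequency of 0.3, a single locus would give He = 2 × 0.3 × 0.7 = 0.42.) The Python sketch below illustrates the calculation. It assumes genotypes coded as 0/1/2 allele counts with no missing calls and averages Ho and He across loci before taking the ratio; it is an illustrative simplification, not the software or estimator used in the study.

```python
from __future__ import annotations


def heterozygosity_and_fis(genotypes: list[list[int]]) -> tuple[float, float, float]:
    """Illustrative (hypothetical) helper.

    genotypes[i][j] is the count (0, 1 or 2) of the alternative allele carried
    by individual i at biallelic SNP j; missing calls are not supported.
    Returns (mean observed heterozygosity Ho, mean expected heterozygosity
    He = 2pq, and FIS = 1 - Ho/He computed from the two means).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    ho_sum = 0.0
    he_sum = 0.0
    for j in range(n_snp):
        column = [genotypes[i][j] for i in range(n_ind)]
        p = sum(column) / (2 * n_ind)  # alternative allele frequency at SNP j
        ho_sum += sum(1 for g in column if g == 1) / n_ind  # share of heterozygotes
        he_sum += 2 * p * (1 - p)  # Hardy-Weinberg expected heterozygosity
    ho = ho_sum / n_snp
    he = he_sum / n_snp
    if he == 0:
        raise ValueError("all SNPs are monomorphic; FIS is undefined")
    fis = 1 - ho / he
    return ho, he, fis


if __name__ == "__main__":
    # Four individuals, two SNPs: heterozygote frequency matches expectation.
    print(heterozygosity_and_fis([[0, 1], [1, 1], [2, 0], [1, 2]]))
    # Only homozygotes at a p = 0.5 locus: complete heterozygote deficit.
    print(heterozygosity_and_fis([[0], [0], [2], [2]]))
```

    Values of FIS near zero, as reported for all four breeds, indicate that observed heterozygosity matches Hardy-Weinberg expectation; positive values indicate a heterozygote deficit such as that produced by inbreeding.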

  3. Whole-Genome Analysis of Diversity and SNP-Major Gene Association in Peach Germplasm.

    Directory of Open Access Journals (Sweden)

    Diego Micheletti

    Peach was domesticated in China more than four millennia ago and from there spread worldwide. Since the middle of the last century, peach breeding programs have been very dynamic, generating hundreds of new commercial varieties; however, in most cases such varieties derive from a limited collection of parental lines (founders). This is one reason for the observed low levels of variability in the commercial gene pool, implying that knowledge of the extent and distribution of genetic variability in peach is critical for choosing adequate parents to confer enhanced productivity, adaptation and quality on improved varieties. With this aim, we genotyped 1,580 peach accessions (including a few closely related Prunus species) maintained and phenotyped in five germplasm collections (four European and one Chinese) with the International Peach SNP Consortium 9K SNP peach array. The study of population structure revealed the subdivision of the panel into three main populations: one mainly made up of Occidental varieties from breeding programs (POP1OCB), one of Occidental landraces (POP2OCT) and the third of Oriental accessions (POP3OR). Analysis of linkage disequilibrium (LD) identified differential patterns of genome-wide LD blocks in each of the populations. Phenotypic data for seven monogenic traits were integrated in a genome-wide association study (GWAS). The significantly associated SNPs were always in the regions predicted by linkage analysis, forming haplotypes of markers. These diagnostic haplotypes could be used for marker-assisted selection (MAS) in modern breeding programs.
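    Pairwise linkage disequilibrium between two SNPs is commonly summarized as r² = D² / [pA(1 - pA)pB(1 - pB)], where D = pAB - pA·pB. The Python sketch below computes it from phased two-locus haplotypes. Phased input is an assumption made for illustration: SNP arrays such as the 9K peach array yield unphased genotypes that would need phasing first, and the abstract does not state which LD statistic or block definition the study used. The haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_at_A, allele_at_B) tuples, alleles coded 0/1.
    Returns 0.0 if either SNP is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical haplotypes: first the alleles always co-occur, then they are independent.
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
```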

  4. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also called 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to a better understanding of the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome, or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined significance criteria at http://ohnologs.curie.fr/. In light of the importance of WGD for the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.

  5. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Directory of Open Access Journals (Sweden)

    Param Priya Singh

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also called 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to a better understanding of the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome, or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined significance criteria at http://ohnologs.curie.fr/. In light of the importance of WGD for the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.

  6. Whole genome gene expression meta-analysis of inflammatory bowel disease colon mucosa demonstrates lack of major differences between Crohn's disease and ulcerative colitis.

    Directory of Open Access Journals (Sweden)

    Atle van Beelen Granlund

    BACKGROUND: In inflammatory bowel disease (IBD), genetic susceptibility together with environmental factors disturbs gut homeostasis, producing chronic inflammation. The two main IBD subtypes are ulcerative colitis (UC) and Crohn's disease (CD). We present the largest microarray gene expression study on IBD to date, encompassing both inflamed and un-inflamed colonic tissue. A meta-analysis including all available, comparable data was used to explore important aspects of IBD inflammation, thereby validating consistent gene expression patterns. METHODS: Colon pinch biopsies from IBD patients were analysed using Illumina whole genome gene expression technology. Differential expression (DE) was identified using the LIMMA linear model in the R statistical computing environment. Results were enriched for gene ontology (GO) categories. Sets of genes encoding antimicrobial proteins (AMPs) and proteins involved in T helper (Th) cell differentiation were used in the interpretation of the results. All available data sets were analysed using the same methods, and results were compared on a global and focused level as t-scores. RESULTS: Gene expression in inflamed mucosa from UC and CD is remarkably similar, and the meta-analysis confirmed this. The patterns of AMP and Th cell-related gene expression were also very similar, except for IL23A, which was consistently more highly expressed in UC than in CD. Un-inflamed tissue from patients showed minimal differences from healthy controls. CONCLUSIONS: There is no difference in Th subgroup involvement between UC and CD. Th1/Th17-related expression, with little Th2 differentiation, dominated both diseases. The differing IL23A expression between UC and CD suggests an IBD subtype-specific role. AMPs, previously little studied, are strongly overexpressed in IBD. The presented meta-analysis provides a sound background for further research on IBD pathobiology.
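    LIMMA reports moderated t-statistics, in which each gene's variance estimate is shrunk towards a common value by empirical Bayes before the t-score is formed. As a simpler point of comparison, the sketch below computes an ordinary Welch two-sample t-statistic for a single gene. It is not LIMMA's method, and the expression values are hypothetical log-scale intensities.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Ordinary (unmoderated) Welch t-statistic for one gene's mean expression."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical log2 expression of one gene in inflamed vs. control biopsies.
inflamed = [5.0, 6.0, 7.0]
control = [1.0, 2.0, 3.0]
t = welch_t(inflamed, control)
print(f"t = {t:.3f}")
```

    With only a few samples per group, the per-gene variance in this ordinary statistic is noisy. That noise is what the moderation step in LIMMA is designed to reduce.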

  7. Implications of using whole genome sequencing to test unselected populations for high risk breast cancer genes: a modelling study.

    Science.gov (United States)

    Warren-Gash, Charlotte; Kroese, Mark; Burton, Hilary; Pharoah, Paul

    2016-01-01

    The decision to test for high-risk breast cancer gene mutations is traditionally based on risk scores derived from age, family and personal cancer history. Next-generation sequencing technologies such as whole genome sequencing (WGS) make wider population testing more feasible. In the UK's 100,000 Genomes Project, mutations in 16 genes including BRCA1 and BRCA2 are to be actively sought regardless of clinical presentation. The implications of deploying this approach at scale for patients and clinical services are unclear. In this study we aimed to model the effect of using WGS to test an unselected UK population for high-risk BRCA1 and BRCA2 gene variants, to inform the debate around approaches to secondary genomic findings. We modelled the test performance of WGS for identifying pathogenic BRCA1 and BRCA2 mutations in an unselected hypothetical population of 100,000 UK women, using published literature to derive model input parameters. We calculated analytic and clinical validity, described potential health outcomes and highlighted current areas of uncertainty. We also performed a sensitivity analysis in which we re-ran the model 100,000 times to investigate the effect of varying input parameters. In our models, WGS was predicted to correctly identify 93 pathogenic BRCA1 mutations and 151 BRCA2 mutations in 120 and 200 women respectively, resulting in an analytic sensitivity of 75.5-77.5%. Of 244 women with identified pathogenic mutations, we estimated that 132 (range 121-198) would develop breast cancer, so could potentially be helped by intervention. We also predicted that breast cancer would occur in 41 women (range 36-62) incorrectly identified as having no pathogenic mutations and in 12,460 women without BRCA1 or BRCA2 mutations.
There was considerable uncertainty about the penetrance of mutations in people without a family history of disease and the appropriate threshold of absolute disease risk for clinical action, which impacts on judgements about the clinical
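    The reported analytic sensitivity range follows directly from the modelled counts: sensitivity is the number of correctly identified carriers divided by the total number of carriers. A short Python check using only the figures quoted in the abstract:

```python
def analytic_sensitivity(detected, carriers):
    """Fraction of true carriers whose pathogenic variant is detected."""
    return detected / carriers


brca1 = analytic_sensitivity(93, 120)   # 93 of 120 BRCA1 carriers detected
brca2 = analytic_sensitivity(151, 200)  # 151 of 200 BRCA2 carriers detected
print(f"BRCA1: {brca1:.1%}, BRCA2: {brca2:.1%}")
```

    This gives 77.5% for BRCA1 and 75.5% for BRCA2, the two ends of the reported range. The detected carriers (93 + 151) also sum to the 244 women with identified pathogenic mutations.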

  8. Whole-genome sequencing identifies a novel ABCB7 gene mutation for X-linked congenital cerebellar ataxia in a large family of Mongolian ancestry

    OpenAIRE

    Protasova, Maria S; Grigorenko, Anastasia P; Tyazhelova, Tatiana V; Andreeva, Tatiana V; Reshetov, Denis A; Gusev, Fedor E; Laptenko, Alexander E; Kuznetsova, Irina L; Goltsov, Andrey Y; Klyushnikov, Sergey A; Illarioshkin, Sergey N; Rogaev, Evgeny I

    2015-01-01

    X-linked congenital cerebellar ataxia is a heterogeneous nonprogressive neurodevelopmental disorder with onset in early childhood. We searched for a genetic cause of this condition, previously reported in a Buryat pedigree of Mongolian ancestry from southeastern Russia. Using whole-genome sequencing on the Illumina HiSeq 2000 platform, we found a missense mutation in the ABCB7 (ATP-binding cassette transporter B7) gene, encoding a mitochondrial transporter involved in heme synthesis and previous...

  9. Whole-Genome Sequence Analysis of Antimicrobial Resistance Genes in Streptococcus uberis and Streptococcus dysgalactiae Isolates from Canadian Dairy Herds

    Directory of Open Access Journals (Sweden)

    Julián Reyes Vélez

    2017-05-01

    The objectives of this study are to determine the occurrence of antimicrobial resistance (AMR) genes using whole-genome sequences (WGS) of Streptococcus uberis (S. uberis) and Streptococcus dysgalactiae (S. dysgalactiae) isolates recovered from dairy cows in the Canadian Maritime Provinces. A secondary objective was to explore the association between phenotypic AMR and genomic characteristics (genome size, guanine–cytosine content, and occurrence of unique gene sequences). Initially, 91 isolates were sequenced, of which 89 were assembled. A further 16 isolates were excluded due to larger than expected genome sizes (>2.3 bp × 1,000 bp). The final analysis used 73 isolates with complete WGS and minimum inhibitory concentration records, which were part of a previous phenotypic AMR study, representing 18 dairy herds from the Maritime region of Canada (1). A total of 23 unique AMR gene sequences were found in the bacterial genomes, with a mean of 8.1 (minimum: 5; maximum: 13) per genome. Overall, 10 AMR genes [ANT(6), TEM-127, TEM-163, TEM-89, TEM-95, linB, lnuB, ermB, ermC, and tetS] were present only in S. uberis genomes and 2 genes (EF-TU and TEM-71) were unique to S. dysgalactiae genomes; 11 AMR genes [APH(3′), TEM-1, TEM-136, TEM-157, TEM-47, tetM, bl2b, gyrA, parE, phoP, and rpoB] were found in both bacterial species. Two-way tabulations showed an association between phenotypic susceptibility to lincosamides and the presence of the linB (P = 0.002) and lnuB (P < 0.001) genes, and between the presence of the tetM (P = 0.015) and tetS (P = 0.064) genes and phenotypic resistance to tetracyclines, only for the S. uberis isolates. The logistic model showed that the odds of resistance (to any of the phenotypically tested antimicrobials) were 4.35 times higher when there were >11 AMR genes present in the genome, compared with <7 AMR genes (P < 0.001). The odds of resistance was lower for S
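    For a single binary predictor, the odds ratio estimated by logistic regression equals the cross-product ratio of the corresponding 2×2 table. The sketch below computes that ratio with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical, since the abstract reports only the model estimate of 4.35, and the study's model may have included other terms.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for the 2x2 table
    [[a, b], [c, d]] = [[exposed resistant, exposed susceptible],
                        [unexposed resistant, unexposed susceptible]].
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: isolates with >11 AMR genes vs. isolates with <7 AMR genes.
or_, lower, upper = odds_ratio_ci(a=10, b=4, c=5, d=10)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```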

  10. Implementation of High Resolution Whole Genome Array CGH in the Prenatal Clinical Setting: Advantages, Challenges, and Review of the Literature

    Directory of Open Access Journals (Sweden)

    Paola Evangelidou

    2013-01-01

    Array comparative genomic hybridization (array CGH) analysis is replacing postnatal chromosomal analysis in cases of intellectual disability, and it has been postulated that it might also become the first-tier test in prenatal diagnosis. In this study, array CGH was applied to 64 prenatal samples with whole genome oligonucleotide arrays (BlueGnome, Ltd.) on DNA extracted from chorionic villi, amniotic fluid, foetal blood, and skin samples. Results were confirmed with fluorescence in situ hybridization or real-time PCR. Fifty-three cases had a normal karyotype and abnormal ultrasound findings, and seven samples had balanced rearrangements, five of which also had ultrasound findings. The value of array CGH in the characterization of previously known aberrations in five samples is also presented. Seventeen of the 64 samples carried copy number alterations, giving a detection rate of 26.5%. Ten of these represent benign variants or variants of unknown significance, giving the method a diagnostic capacity of 10.9%. If karyotyping is also performed, the additional diagnostic capacity of the method is 5.1% (3/59). This study indicates the ability of array CGH to identify chromosomal abnormalities that cannot be detected during routine prenatal cytogenetic analysis, thereby increasing the overall detection rate. In addition, a thorough review of the literature is presented.
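    The percentages above are simple proportions of the sample counts, and the Python check below reproduces them from the figures in the abstract. Note that 17/64 is 26.56%, so the reported 26.5% appears to be truncated rather than rounded.

```python
total_samples = 64
with_cnv = 17       # samples carrying copy number alterations
benign_or_vus = 10  # benign variants or variants of unknown significance

detection_rate = with_cnv / total_samples
diagnostic_capacity = (with_cnv - benign_or_vus) / total_samples  # 7 of 64
additional_over_karyotype = 3 / 59  # as quoted in the abstract

for label, value in [("detection rate", detection_rate),
                     ("diagnostic capacity", diagnostic_capacity),
                     ("additional capacity over karyotyping", additional_over_karyotype)]:
    print(f"{label}: {value:.2%}")
```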

  11. Whole-genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection.

    Science.gov (United States)

    Hsieh, PingHsun; Veeramah, Krishna R; Lachance, Joseph; Tishkoff, Sarah A; Wall, Jeffrey D; Hammer, Michael F; Gutenkunst, Ryan N

    2016-03-01

    African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically divergent from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations, which includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 yr ago. We also find that bidirectional asymmetric gene flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors. © 2016 Hsieh et al.; Published by Cold Spring Harbor Laboratory Press.
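    The joint allele frequency spectrum used for demographic model fitting is a table that counts, for each pair (i, j), how many sites carry i derived alleles in the first population sample and j in the second. A minimal Python sketch follows. It assumes biallelic sites already polarized by an ancestral allele and no missing data. The function name and counts are hypothetical, and fitting demographic models to the spectrum is beyond this sketch.

```python
def joint_sfs(derived1, derived2, n1, n2):
    """Unfolded joint site frequency spectrum for two population samples.

    derived1[s] / derived2[s]: derived-allele count at site s among the
    n1 / n2 sampled chromosomes. Returns an (n1 + 1) x (n2 + 1) table of
    site counts as nested lists.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for k1, k2 in zip(derived1, derived2):
        if not (0 <= k1 <= n1 and 0 <= k2 <= n2):
            raise ValueError("derived-allele count outside sample size")
        sfs[k1][k2] += 1
    return sfs


# Hypothetical counts at four sites: 4 diploid genomes (8 chromosomes) in
# population 1 and 9 diploid genomes (18 chromosomes) in population 2.
spectrum = joint_sfs([0, 1, 8, 3], [2, 0, 18, 3], n1=8, n2=18)
print(spectrum[3][3], sum(map(sum, spectrum)))
```

    The corner cells [0][0] (ancestral in both samples) and [n1][n2] (fixed derived in both) carry no information about divergence between the samples and are usually masked before fitting.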

  12. A Case of HDR Syndrome and Ichthyosis: Dual Diagnosis by Whole-Genome Sequencing of Novel Mutations in GATA3 and STS Genes.

    Science.gov (United States)

    Goodwin, Gregory; Hawley, Pamela P; Miller, David T

    2016-03-01

    Atypical presentations of complex multisystem disorders may elude diagnosis based on clinical findings alone. Appropriate diagnostic tests may not be available, or available tests may not provide adequate coverage of relevant genomic regions for patients with complex phenotypes. Clinical whole-exome/-genome sequencing is often considered for complex patients lacking a definitive diagnosis. A boy who is now 7 years old presented as a newborn with congenital ichthyosis. At 6 weeks of age, he presented with failure to thrive and hypoparathyroidism. At 4 years of age, he was diagnosed with sensorineural hearing loss. Whole-genome sequencing identified novel mutations in GATA3, which causes HDR syndrome (hypoparathyroidism, deafness, and renal dysplasia), and in STS, which causes X-linked congenital ichthyosis. Whole-genome sequencing led to a definitive clinical diagnosis in a case where no other clinical test was available for GATA3, and no sequencing panel would have included both genes because they have disparate phenotypes. This case demonstrates the power of whole-genome (or exome) sequencing for patients with complex clinical presentations involving endocrine abnormalities.

  13. Prevalent Role of Gene Features in Determining Evolutionary Fates of Whole-Genome Duplication Duplicated Genes in Flowering Plants

    Science.gov (United States)

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-01-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs. PMID:23396833
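    GC3, one of the three major contributors identified above, is the fraction of guanine or cytosine at the third position of each codon. A minimal Python sketch follows, assuming an in-frame coding sequence. Some definitions also exclude start and stop codons or single-codon amino acids, which this version does not, and the example sequence is hypothetical.

```python
def gc3_content(cds):
    """Fraction of G/C at third codon positions of an in-frame coding sequence.

    Any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = [cds[i + 2] for i in range(0, usable, 3)]
    if not third_positions:
        return 0.0
    return sum(1 for base in third_positions if base in "GC") / len(third_positions)


# Hypothetical five-codon sequence: ATG GCC AAA TTT GGG -> third bases G, C, A, T, G.
print(gc3_content("ATGGCCAAATTTGGG"))
```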

  14. Global analysis of human duplicated genes reveals the relative importance of whole-genome duplicates originated in the early vertebrate evolution.

    Science.gov (United States)

    Acharya, Debarun; Ghosh, Tapash C

    2016-01-22

    Gene duplication is a genetic mutation that creates functionally redundant gene copies that are initially relieved from selective pressures and may adapt to new functions with time. The levels of gene duplication may vary from small-scale duplication (SSD) to whole genome duplication (WGD). Studies in yeast revealed ample differences between these duplicates: yeast WGD pairs were functionally more similar, less divergent in subcellular localization and contained a smaller proportion of essential genes. In this study, we explored the differences in evolutionary genomic properties of human SSD and WGD genes, with the identifiable human WGD duplicates coming from the two rounds of whole genome duplication that occurred early in vertebrate evolution. We observed that these two groups of duplicates were also dissimilar in terms of their evolutionary and genomic properties. Interestingly, however, the differences were not the same as those observed in yeast. The human WGDs were found to be functionally less similar, to diverge more at the subcellular level and to contain a higher proportion of essential genes than the SSDs, all of which is the opposite of yeast. Additionally, we found that human WGDs were more divergent in their gene expression profiles, had higher multifunctionality, were more often associated with disease, and were evolutionarily more conserved than human SSDs. Our study suggests that human WGD duplicates are more divergent, which entails the adaptation of WGDs to novel and important functions that consequently led to their conservation over the course of evolution.

  15. Comparative Genomics of an IncA/C Multidrug Resistance Plasmid from Escherichia coli and Klebsiella Isolates from Intensive Care Unit Patients and the Utility of Whole-Genome Sequencing in Health Care Settings

    Science.gov (United States)

    Hazen, Tracy H.; Zhao, LiCheng; Boutin, Mallory A.; Stancil, Angela; Robinson, Gwen; Harris, Anthony D.; Rasko, David A.

    2014-01-01

    The IncA/C plasmids have been implicated in the dissemination of β-lactamases, including gene variants that confer resistance to expanded-spectrum cephalosporins, which are often the treatment of last resort against multidrug-resistant, hospital-associated pathogens. A blaFOX-5 gene was detected in 14 Escherichia coli and 16 Klebsiella isolates that were cultured from perianal swabs of patients admitted to an intensive care unit (ICU) of the University of Maryland Medical Center (UMMC) in Baltimore, MD, over a span of 3 years. Four of the FOX-encoding isolates were obtained from subsequent samples of patients who were initially negative for an AmpC β-lactamase upon admission to the ICU, suggesting that the AmpC β-lactamase-encoding plasmid was acquired while the patient was in the ICU. The genomes of five E. coli isolates and six Klebsiella isolates containing blaFOX-5 were selected for sequencing based on their plasmid profiles. An ∼167-kb IncA/C plasmid encoding the FOX-5 β-lactamase, a CARB-2 β-lactamase, additional antimicrobial resistance genes, and heavy metal resistance genes was identified. Another FOX-5-encoding IncA/C plasmid that was nearly identical except for a variable region associated with the resistance genes was also identified. To our knowledge, these plasmids represent the first FOX-5-encoding plasmids sequenced. We used comparative genomics to describe the genetic diversity of a plasmid encoding a FOX-5 β-lactamase relative to the whole-genome diversity of 11 E. coli and Klebsiella isolates that carry this plasmid. Our findings demonstrate the utility of whole-genome sequencing for tracking of plasmid and antibiotic resistance gene distribution in health care settings. PMID:24914121

  16. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    Science.gov (United States)

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because the introduction of a Rotavirus A vaccine into Vietnam is anticipated to be imminent, baseline information on the whole genomes of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in the country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1-like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains together with >184 G2P[4] strains retrieved from the GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including the Chinese rotavirus vaccine strain LLR, with which they share 95% nucleotide identity; the time of their most recent common ancestor was estimated to be 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to the ever-diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution. © 2015 The Societies and Wiley Publishing Asia Pty Ltd.
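    Nucleotide identity figures such as the >99.9% and 95% quoted above are computed from aligned sequences. The sketch below computes percent identity over aligned columns and skips any column with a gap. That gap handling is an assumption: conventions differ between tools, and the abstract does not specify one. The sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments differing at one of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))
```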

  17. Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica.

    Science.gov (United States)

    Schatz, Michael C; Maron, Lyza G; Stein, Joshua C; Hernandez Wences, Alejandro; Gurtowski, James; Biggers, Eric; Lee, Hayan; Kramer, Melissa; Antoniou, Eric; Ghiban, Elena; Wright, Mark H; Chia, Jer-ming; Ware, Doreen; McCouch, Susan R; McCombie, W Richard

    2014-01-01

    The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

  18. Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases

    Directory of Open Access Journals (Sweden)

    Pandey Sona

    2010-11-01

    Background Cytochrome P450 monooxygenases (P450s) catalyze the oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze putative soybean P450s at a genome scale. Co-expression analysis using available soybean microarray and Illumina sequencing data provides clues for the functional annotation of these enzymes. This approach is based on the assumption that genes with similar expression patterns across a set of conditions may be functionally related. Results We identified a total of 332 full-length P450 genes and 378 pseudogenes in the soybean genome. Of the full-length sequences, 195 genes belong to the A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A-type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress-responsive soybean P450s were retrieved from 99 soybean microarray libraries. We also used Illumina transcriptome sequencing technology to analyze the expression of all 332 soybean P450 genes. This dataset contains total RNA isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, and mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed, and the expression of a representative set of genes was confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an
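    The co-expression assumption described above is often put into practice as a correlation between expression profiles across samples. The Python sketch below ranks genes by Pearson correlation with a target gene. The gene names, profiles and the 0.9 cut-off are hypothetical, and the abstract does not state which co-expression measure the study used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant profile has no defined correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpressed_with(target, profiles, threshold=0.9):
    """Return (gene, r) pairs whose correlation with target is >= threshold."""
    hits = []
    for gene, profile in profiles.items():
        if gene == target:
            continue
        r = pearson(profiles[target], profile)
        if r >= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical expression across nodule, root, leaf and flower samples.
profiles = {
    "geneA": [10.0, 2.0, 1.0, 1.0],
    "geneB": [20.0, 4.0, 2.0, 2.0],
    "geneC": [1.0, 1.0, 2.0, 10.0],
}
print(coexpressed_with("geneA", profiles))
```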

  19. Mapping and Identifying a Candidate Gene (Bnmfs) for Female-Male Sterility through Whole-Genome Resequencing and RNA-Seq in Rapeseed (Brassica napus L.)

    Directory of Open Access Journals (Sweden)

    Changcai Teng

    2017-12-01

    In oilseed crops, carpel and stamen development play vital roles in pollination and rapeseed yield, but the genetic mechanisms underlying carpel and stamen development remain unclear. Herein, a male- and female-sterile mutant was obtained in the offspring of a (Brassica napus cv. Qingyou 14) × (Qingyou 14 × B. rapa landrace Dahuang) cross. Subsequently, F2–F9 populations were generated through selfing of the heterozygous plants among the progeny of each generation. The male and female sterility exhibited stable inheritance in successive generations and was controlled by a recessive gene. The mutant kept the same chromosome number (2n = 38) as the B. napus parent but showed abnormal male and female meiosis. One candidate gene for the sterility was identified using simple sequence repeat (SSR) and insertion-deletion length polymorphism (InDel) markers in F7–F9 plants, together with whole-genome resequencing of F8 pools and RNA sequencing of F9 pools. Whole-genome resequencing found three candidate intervals (35.40–35.68, 35.74–35.75, and 45.34–46.45 Mb) on chromosome C3 in B. napus, and the candidate region for Bnmfs was narrowed to approximately 1.11 Mb (45.34–46.45 Mb) by combining SSR and InDel marker analyses with whole-genome resequencing. Transcriptome profiling of 0–2 mm buds detected all of the genes in the candidate interval, but revealed only two genes with significant differences (BnaC03g56670D and BnaC03g56870D). BnaC03g56870D was a candidate gene sharing homology with the CYP86C4 gene of Arabidopsis thaliana. Quantitative reverse transcription PCR (qRT-PCR) analysis showed that Bnmfs primarily functioned in flower buds. Thus, sequencing and expression analyses provided evidence that BnaC03g56870D was the candidate gene for male and female sterility in the B. napus mutant.

  20. Whole-genome gene expression profiling revealed genes and pathways potentially involved in regulating interactions of soybean with cyst nematode (Heterodera glycines Ichinohe).

    Science.gov (United States)

    Wan, Jinrong; Vuong, Tri; Jiao, Yongqing; Joshi, Trupti; Zhang, Hongxin; Xu, Dong; Nguyen, Henry T

    2015-03-04

    Soybean cyst nematode (SCN, Heterodera glycines Ichinohe) is the most devastating pathogen of soybean. Many gene expression profiling studies have investigated the responses of soybean to infection by this pathogen, primarily using the first-generation soybean genome array, which covered approximately 37,500 soybean transcripts. However, no study has yet been reported using the second-generation Affymetrix soybean whole-genome transcript array (Soybean WT array), which represents approximately 66,000 predicted soybean transcripts. In the present work, the gene expression profiles of two soybean plant introductions (PIs), PI 437654 and PI 567516C (both resistant to multiple SCN HG Types), and cultivar Magellan (susceptible to SCN) were compared in the presence or absence of the SCN inoculum at 3 and 8 days post-inoculation using the Soybean WT array. Data analysis revealed that the two resistant soybean lines showed gene expression profiles distinct from each other and from Magellan, not only in response to SCN inoculation but also in the absence of SCN. Overall, 1,413 genes and many pathways were found to be differentially regulated. Among them, 297 genes were constitutively regulated in the two resistant lines (compared with Magellan) and 1,146 genes were responsive to SCN inoculation in the three lines, with 30 genes regulated both constitutively and by SCN. In addition to findings similar to those in published work, many genes involved in ethylene, protein degradation, and phenylpropanoid pathways were also found to be differentially regulated in the present study. GC-rich elements (e.g., GCATGC) were found to be over-represented in the promoter regions of certain groups of genes. These elements have not been observed before and could be new defense-responsive regulatory elements. Different soybean lines showed different gene expression profiles in the presence and absence of the SCN inoculum. Both inducible and constitutive gene expression

  1. Practices and views of neurologists regarding the use of whole-genome sequencing in clinical settings: a web-based survey.

    Science.gov (United States)

    Jaitovich Groisman, Iris; Hurlimann, Thierry; Shoham, Amir; Godard, Béatrice

    2017-06-01

    The use of whole-genome sequencing (WGS) in clinical settings has raised a number of controversial scientific and ethical issues. The application of WGS is of particular relevance in neurology, as many conditions are difficult to diagnose. We conducted a worldwide, web-based survey to explore neurologists' views on the benefits of, and concerns regarding, the clinical use of WGS, as well as the resources necessary to implement it. Of the 204 neurologists in the study, almost half treated mostly adult patients (48%), while the rest treated mainly children (37.3%) or both (14.7%). Epilepsy (73%) and headaches (57.8%) were the predominant conditions treated. Factor analysis revealed two profiles: neurologists who would offer WGS to their patients, and those who would not or were unsure in which circumstances it should be offered. Neurologists considering the use of WGS as bringing more benefits than drawbacks currently used targeted genetic testing (PWGS' benefits were directed towards the patients, while its risks were of a financial and legal nature. Furthermore, there was a correlation between respondents' current use of genetic tests and an anticipation of increased use in the future (PWGS in their practice (53.5%). Our results highlight gaps in education, organization, and funding to support the use of WGS in neurology, and draw attention to the need for resources that could strongly contribute to more straightforward diagnoses and possibly better treatment of neurological conditions.

  2. pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data.

    Science.gov (United States)

    Cao, Huojun; Amendt, Brad A

    2016-11-01

    Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding the molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A python package (pySAPC) implementing the sparse affinity propagation clustering algorithm for large datasets was developed. Genome-wide pairwise similarity was calculated from expression pattern similarity across 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved in the same genetic pathways as genes already shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This python package will be useful for many applications where datasets are too large for a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.

  3. Whole-genome comparative analysis of virulence genes unveils similarities and differences between endophytes and other symbiotic bacteria

    Directory of Open Access Journals (Sweden)

    Sebastian Lòpez-Fernàndez

    2015-05-01

    Plant pathogens and endophytes co-exist and often interact with the host plant and within its microbial community. The outcome of these interactions may lead to healthy plants through beneficial interactions, or to disease through the inducible production of molecules known as virulence factors. Unravelling the role of virulence in endophytes may crucially improve our understanding of host-associated microbial communities and their correlation with host health. Virulence is the outcome of a complex network of interactions, and drawing the line between pathogens and endophytes has proven contentious, as strain-level differences in niche overlap, ecological interactions, the state of the host's immune system, and environmental factors are seldom taken into account. Defining genomic differences between endophytes and plant pathogens is crucial for understanding the boundaries between these two groups. Here we describe the major differences at the genomic level between seven grapevine endophytic test bacteria and twelve reference strains. We describe the virulence factors detected in the genomes of the test group, as compared with endophytic and non-endophytic references, to better understand the distribution of these traits in endophytic genomes. To do this, we adopted a comparative whole-genome approach, encompassing BLAST-based searches through the GUI-based tools Mauve and BRIG, as well as calculating the core and accessory genomes of three genera of enterobacteria. We outline divergences in the metabolic pathways of these endophytes and reference strains with the aid of the online platform RAST. We present a summary of the major differences that help draw the boundaries between harmless and harmful bacteria, in the spirit of contributing to a microbiological definition of endophyte.

  4. Expansion of banana (Musa acuminata) gene families involved in ethylene biosynthesis and signalling after lineage-specific whole-genome duplications.

    Science.gov (United States)

    Jourda, Cyril; Cardi, Céline; Mbéguié-A-Mbéguié, Didier; Bocs, Stéphanie; Garsmeur, Olivier; D'Hont, Angélique; Yahiaoui, Nabila

    2014-05-01

    Whole-genome duplications (WGDs) are widespread in plants, and three lineage-specific WGDs occurred in the banana (Musa acuminata) genome. Here, we analysed the impact of WGDs on the evolution of banana gene families involved in ethylene biosynthesis and signalling, a key pathway for banana fruit ripening. Banana ethylene pathway genes were identified using comparative genomics approaches and their duplication modes and expression profiles were analysed. Seven out of 10 banana ethylene gene families evolved through WGD and four of them (1-aminocyclopropane-1-carboxylate synthase (ACS), ethylene-insensitive 3-like (EIL), ethylene-insensitive 3-binding F-box (EBF) and ethylene response factor (ERF)) were preferentially retained. Banana orthologues of AtEIN3 and AtEIL1, two major genes for ethylene signalling in Arabidopsis, were particularly expanded. This expansion was paralleled by that of EBF genes which are responsible for control of EIL protein levels. Gene expression profiles in banana fruits suggested functional redundancy for several MaEBF and MaEIL genes derived from WGD and subfunctionalization for some of them. We propose that EIL and EBF genes were co-retained after WGD in banana to maintain balanced control of EIL protein levels and thus avoid detrimental effects of constitutive ethylene signalling. In the course of evolution, subfunctionalization was favoured to promote finer control of ethylene signalling. © 2014 CIRAD New Phytologist © 2014 New Phytologist Trust.

  5. Mining whole genomes and transcriptomes of Jatropha (Jatropha curcas) and Castor bean (Ricinus communis) for NBS-LRR genes and defense response associated transcription factors.

    Science.gov (United States)

    Sood, Archit; Jaiswal, Varun; Chanumolu, Sree Krishna; Malhotra, Nikhil; Pal, Tarun; Chauhan, Rajinder Singh

    2014-11-01

    Jatropha (Jatropha curcas L.) and Castor bean (Ricinus communis) are oilseed crops of the family Euphorbiaceae with the potential to produce high-quality biodiesel and with industrial value. Both bioenergy plants are becoming susceptible to various biotic stresses that directly affect oil quality and content. To date, no report exists on the analysis of the Nucleotide Binding Site-Leucine Rich Repeat (NBS-LRR) gene repertoire and defense response transcription factors in either plant species. In silico analysis of whole genomes and transcriptomes identified 47 new NBS-LRR genes in both species, and 122 and 318 defense response related transcription factors in Jatropha and Castor bean, respectively. The identified NBS-LRR genes and defense response transcription factors were mapped onto the respective genomes. Common and unique NBS-LRR genes and defense related transcription factors were identified in both plant species. All NBS-LRR genes in both species were classified as Toll/interleukin-1 receptor NBS-LRRs (TNLs) or coiled-coil NBS-LRRs (CNLs) and characterized by position on contigs, gene clusters, and motif and domain distribution. Transcript abundance or expression values were measured for all NBS-LRR genes and defense response transcription factors, suggesting their functional role. The current study provides a repertoire of NBS-LRR genes and transcription factors that can be used not only in dissecting the molecular basis of the disease resistance phenotype but also in developing disease-resistant genotypes in Jatropha and Castor bean through transgenic or molecular breeding approaches.

  6. Multi-platform whole-genome microarray analyses refine the epigenetic signature of breast cancer metastasis with gene expression and copy number.

    Directory of Open Access Journals (Sweden)

    Joseph Andrews

    BACKGROUND: We have previously identified genome-wide DNA methylation changes in a cell line model of breast cancer metastasis. These complex epigenetic changes, along with concurrent karyotype analyses, have led us to hypothesize that complex genomic alterations in cancer cells (deletions, translocations and ploidy) are superimposed over promoter-specific methylation events that are responsible for gene-specific expression changes observed in breast cancer metastasis. METHODOLOGY/PRINCIPAL FINDINGS: We undertook simultaneous high-resolution, whole-genome analyses of MDA-MB-468GFP and MDA-MB-468GFP-LN human breast cancer cell lines (an isogenic, paired lymphatic metastasis cell line model) using Affymetrix gene expression (U133), promoter (1.0R), and SNP/CNV (SNP 6.0) microarray platforms to correlate data from gene expression, epigenetic (DNA methylation), and combination copy number variant/single nucleotide polymorphism microarrays. Using Partek Software and Ingenuity Pathway Analysis, we integrated datasets from these three platforms and detected multiple hypomethylation and hypermethylation events. Many of these epigenetic alterations correlated with gene expression changes. In addition, gene dosage events correlated with the karyotypic differences observed between the cell lines and were reflected in specific promoter methylation patterns. Gene subsets were identified that correlated hyper- (and hypo-) methylation with the loss (or gain) of gene expression and, in parallel, with gene dosage losses and gains, respectively. Individual gene targets from these subsets were also validated for their methylation, expression and copy number status, and susceptible gene pathways were identified that may indicate how selective advantage drives the processes of tumourigenesis and metastasis. CONCLUSIONS/SIGNIFICANCE: Our approach allows more precise profiling of functionally relevant epigenetic signatures that are associated with cancer

  7. Whole-genome sequencing of gentamicin-resistant Campylobacter coli isolated from U.S. retail meats reveals novel plasmid-mediated aminoglycoside resistance genes.

    Science.gov (United States)

    Chen, Yuansha; Mukherjee, Sampa; Hoffmann, Maria; Kotewicz, Michael L; Young, Shenia; Abbott, Jason; Luo, Yan; Davidson, Maureen K; Allard, Marc; McDermott, Patrick; Zhao, Shaohua

    2013-11-01

    Aminoglycoside resistance in Campylobacter has been routinely monitored in the United States in clinical isolates since 1996 and in retail meats since 2002. Gentamicin resistance first appeared in a single human isolate of Campylobacter coli in 2000 and in a single chicken meat isolate in 2007, after which it increased rapidly to account for 11.3% of human isolates and 12.5% of retail isolates in 2010. Pulsed-field gel electrophoresis analysis indicated that gentamicin-resistant C. coli isolates from retail meat were clonal. We sequenced the genomes of two strains of this clone using a next-generation sequencing technique in order to investigate the genetic basis for the resistance. The gaps of one strain were closed using optical mapping and Sanger sequencing, and this is the first completed genome of C. coli. The two genomes are highly similar to each other. A self-transmissible plasmid carrying multiple antibiotic resistance genes was revealed within both genomes, carrying genes encoding resistance to gentamicin, kanamycin, streptomycin, streptothricin, and tetracycline. Bioinformatics analysis and experimental results showed that gentamicin resistance was due to a phosphotransferase gene, aph(2")-Ig, not described previously. The phylogenetic relationship of this newly emerged clone to other Campylobacter spp. was determined by whole-genome single nucleotide polymorphisms (SNPs), which showed that it clustered with the other poultry isolates and was separated from isolates from livestock.

  8. Whole-Genome Analysis Revealed the Positively Selected Genes during the Differentiation of indica and Temperate japonica Rice

    Science.gov (United States)

    Sun, Xinli; Jia, Qi; Guo, Yuchun; Zheng, Xiujuan; Liang, Kangjing

    2015-01-01

    To investigate the selective pressures acting on protein-coding genes during the differentiation of indica and japonica, all possible orthologous genes between the Nipponbare and 93–11 genomes were identified and compared with each other. Among these genes, 8,530 pairs had identical sequences, and 27,384 pairs shared more than 90% sequence identity. Only 2,678 gene pairs displayed a Ka/Ks ratio significantly greater than one, and most of these genes contained only nonsynonymous sites. The genes without synonymous sites were further analyzed with the SNP data of 1529 O. sativa and O. rufipogon accessions, and 1068 genes were identified as being under positive selection during the differentiation of indica and temperate japonica. The positively selected genes (PSGs) are unevenly distributed across the 12 chromosomes; the proteins they encode predominantly have binding, transferase and hydrolase activities and are especially enriched in plant responses to stimuli, biological regulation, and transport processes. Moreover, most PSGs with known function and/or expression were involved in the regulation of biotic/abiotic stresses. The evidence of pervasive positive selection suggests that many factors drove the differentiation of indica and japonica, which had already started in wild rice, though at a much lower level than in cultivated rice. The lower differentiation and fewer PSGs detected between the Or-It and Or-IIIt wild rice groups implied that artificial selection contributed more to the differentiation than natural selection. In addition, the phylogenetic tree constructed with positively selected sites showed that the japonica varieties exhibited more diversity in differentiation than indica, and Or-III of O. rufipogon exhibited more than Or-I. PMID:25774680

  9. Whole-genome profiling and shotgun sequencing delivers an anchored, gene-decorated, physical map assembly of bread wheat chromosome 6A.

    Science.gov (United States)

    Poursarebani, Naser; Nussbaumer, Thomas; Simková, Hana; Safář, Jan; Witsenboer, Hanneke; van Oeveren, Jan; Doležel, Jaroslav; Mayer, Klaus F X; Stein, Nils; Schnurbusch, Thorsten

    2014-07-01

    Bread wheat (Triticum aestivum L.) is the most important staple food crop for 35% of the world's population. International efforts are underway to facilitate an increase in wheat production, in which the International Wheat Genome Sequencing Consortium (IWGSC) plays an important role. As part of this effort, we have developed a sequence-based physical map of wheat chromosome 6A using whole-genome profiling (WGP™). The bacterial artificial chromosome (BAC) contig assembly tools fingerprinted contig (fpc) and linear topological contig (ltc) were used and their contig assemblies were compared. A detailed investigation of contig structure revealed that ltc created a highly robust assembly compared with those formed by fpc. The ltc assemblies contained 1217 contigs for the short arm and 1113 contigs for the long arm, with an L50 of 1 Mb. To facilitate in silico anchoring, WGP™ tags underlying BAC contigs were extended by wheat and wheat progenitor genome sequence information. Sequence data were used for in silico anchoring against genetic markers with known sequences, allowing almost 79% of the physical map to be anchored. Moreover, the assigned sequence information led to the 'decoration' of the respective physical map with 3359 anchored genes. Thus, this robust and genetically anchored physical map will serve as a framework for the sequencing of wheat chromosome 6A, and is of immediate use for map-based isolation of agronomically important genes/quantitative trait loci located on this chromosome. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.

  10. Characterization of translocation of silver nanoparticles and effects on whole-genome gene expression using an in vitro intestinal epithelium coculture model.

    Science.gov (United States)

    Bouwmeester, Hans; Poortman, Jenneke; Peters, Ruud J; Wijma, Elly; Kramer, Evelien; Makama, Sunday; Puspitaninganindita, Kinarsashanti; Marvin, Hans J P; Peijnenburg, Ad A C M; Hendriksen, Peter J M

    2011-05-24

    Applications of nanoparticles in the food sector are eminent. Silver nanoparticles are among the most frequently used, making consumer exposure to silver nanoparticles inevitable. Information about uptake through the intestines and possible toxic effects of silver nanoparticles is therefore very important but still lacking. In the present study, we used an in vitro model of the human intestinal epithelium consisting of Caco-2 and M-cells to study the passage of silver nanoparticles and their ionic equivalents and to assess their effects on whole-genome mRNA expression. This in vitro intestine model was exposed to four sizes of silver nanoparticles for 4 h. Exposure to silver ions was included as a control, since 6–17% of the silver nanoparticles were found to have dissociated into silver ions. The amount of silver ions that passed the Caco-2 cell barrier was equal for the silver ion and nanoparticle exposures. The nanoparticles induced clear changes in gene expression across a range of stress responses, including oxidative stress, endoplasmic reticulum stress response, and apoptosis. The gene expression response to silver nanoparticles, however, was very similar to that of AgNO3. Therefore, the observed effects of the silver nanoparticles are likely exerted by the silver ions that are released from the nanoparticles.

  11. Use of whole genome deep sequencing to define emerging minority variants in virus envelope genes in herpesvirus treated with novel antimicrobial K21.

    Science.gov (United States)

    Tweedy, Joshua G; Prusty, Bhupesh K; Gompels, Ursula A

    2017-10-01

    New antivirals are required to counter the rising antimicrobial resistance to replication inhibitors. The aim of this study was to analyse the range of emerging mutations in herpesvirus by whole-genome deep sequencing. We tested treatment of human herpesvirus 6 with the novel antiviral K21, where evidence indicated distinct effects on virus envelope proteins. We treated BACmid-cloned virus in order to analyse mechanisms and candidate targets for resistance. Illumina-based next-generation sequencing enabled analyses of mutations in 85 genes to depths of 10,000 per base detecting low prevalent minority variants (genes including two envelope glycoproteins. Strikingly, treatment with K21 did not accumulate the passage mutations; instead, a high-frequency mutation was selected in envelope protein gQ2, part of the gH/gL complex essential for herpesvirus infection. This mutation introduced a stop codon, producing a truncation previously observed in association with increased virion production. There was reduced detection of the glycoprotein complex in infected cells. This supports a novel pathway for K21 targeting virion envelopes, distinct from replication inhibition. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Whole Genome and Global Gene Expression Analyses of the Model Mushroom Flammulina velutipes Reveal a High Capacity for Lignocellulose Degradation

    Science.gov (United States)

    Park, Young-Jin; Baek, Jeong Hun; Lee, Seonwook; Kim, Changhoon; Rhee, Hwanseok; Kim, Hyungtae; Seo, Jeong-Sun; Park, Hae-Ran; Yoon, Dae-Eun; Nam, Jae-Young; Kim, Hong-Il; Kim, Jong-Guk; Yoon, Hyeokjun; Kang, Hee-Wan; Cho, Jae-Yong; Song, Eun-Sung; Sung, Gi-Ho; Yoo, Young-Bok; Lee, Chang-Soo; Lee, Byoung-Moo; Kong, Won-Sik

    2014-01-01

    Flammulina velutipes is a fungus with health and medicinal benefits that has been consumed and cultivated in East Asia. F. velutipes is also known to degrade lignocellulose and produce ethanol. The overlapping interests of mushroom production and wood bioconversion make F. velutipes an attractive new model for fungal wood-related studies. Here, we present the complete sequence of the F. velutipes genome. This is the first sequenced genome for a commercially produced edible mushroom that also degrades wood. The 35.6-Mb genome contained 12,218 predicted protein-encoding genes and 287 tRNA genes assembled into 11 scaffolds corresponding to the 11 chromosomes of strain KACC42780. The 88.4-kb mitochondrial genome contained 35 genes. Well-developed wood-degrading machinery with strong potential for lignin degradation (69 auxiliary activities, formerly FOLymes) and carbohydrate degradation (392 CAZymes), along with 58 alcohol dehydrogenase genes, were highly expressed in the mycelium, demonstrating the potential application of this organism to bioethanol production. Thus, the newly uncovered wood-degrading capacity and the sequential nature of this process in F. velutipes offer interesting possibilities for more detailed studies on either lignin or (hemi-)cellulose degradation in complex wood substrates. The mutual interest in wood degradation of the mushroom industry and (ligno-)cellulose biomass related industries further increases the significance of F. velutipes as a new model. PMID:24714189

  13. Whole-Genome Sequence Analysis and Genome-Wide Virulence Gene Identification of Riemerella anatipestifer Strain Yb2.

    Science.gov (United States)

    Wang, Xiaolan; Ding, Chan; Wang, Shaohui; Han, Xiangan; Yu, Shengqing

    2015-08-01

    Riemerella anatipestifer is a well-described pathogen of waterfowl and other avian species that can cause septicemic and exudative diseases. In this study, we sequenced the complete genome of R. anatipestifer strain Yb2 and analyzed it against the published genomic sequences of R. anatipestifer strains DSM15868, RA-GD, RA-CH-1, and RA-CH-2. The Yb2 genome contains one circular chromosome of 2,184,066 bp with a 35.73% GC content and no plasmid. The genome has 2,021 open reading frames that occupy 90.88% of the genome. A comparative genomic analysis revealed that genome organization is highly conserved among R. anatipestifer strains, except for four inversions of a sequence segment in Yb2. A phylogenetic analysis found that the closest neighbor of Yb2 is RA-GD. Furthermore, we constructed a library of 3,175 mutants by random transposon mutagenesis, and 100 mutants exhibiting more than 100-fold-attenuated virulence were obtained by animal screening experiments. Southern blot analysis and genetic characterization of the mutants led to the identification of 49 virulence genes. Of these, 25 encode cytoplasmic proteins, 6 encode cytoplasmic membrane proteins, 4 encode outer membrane proteins, and the subcellular localization of the remaining 14 gene products is unknown. The functional classification of orthologous-group clusters revealed that 16 genes are associated with metabolism, 6 are associated with cellular processing and signaling, and 4 are associated with information storage and processing. The functions of the other 23 genes are poorly characterized or unknown. This genome-wide study identified genes important to the virulence of R. anatipestifer. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  14. Whole-genome sequencing suggests a chemokine gene cluster that modifies age at onset in familial Alzheimer's disease.

    Science.gov (United States)

    Lalli, M A; Bettcher, B M; Arcila, M L; Garcia, G; Guzman, C; Madrigal, L; Ramirez, L; Acosta-Uribe, J; Baena, A; Wojta, K J; Coppola, G; Fitch, R; de Both, M D; Huentelman, M J; Reiman, E M; Brunkow, M E; Glusman, G; Roach, J C; Kao, A W; Lopera, F; Kosik, K S

    2015-11-01

    We have sequenced the complete genomes of 72 individuals affected with early-onset familial Alzheimer's disease caused by an autosomal dominant, highly penetrant mutation in the presenilin-1 (PSEN1) gene, and performed genome-wide association testing to identify variants that modify age at onset (AAO) of Alzheimer's disease. Our analysis identified a haplotype of single-nucleotide polymorphisms (SNPs) on chromosome 17 within a chemokine gene cluster associated with delayed onset of mild-cognitive impairment and dementia. Individuals carrying this haplotype had a mean AAO of mild-cognitive impairment at 51.0 ± 5.2 years compared with 41.1 ± 7.4 years for those without these SNPs. This haplotype thus appears to modify Alzheimer's AAO, conferring a large (~10 years) protective effect. The associated locus harbors several chemokines including eotaxin-1 encoded by CCL11, and the haplotype includes a missense polymorphism in this gene. Validating this association, we found plasma eotaxin-1 levels were correlated with disease AAO in an independent cohort from the University of California San Francisco Memory and Aging Center. In this second cohort, the associated haplotype disrupted the typical age-associated increase of eotaxin-1 levels, suggesting a complex regulatory role for this haplotype in the general population. Altogether, these results suggest eotaxin-1 as a novel modifier of Alzheimer's disease AAO and open potential avenues for therapy.

  15. Whole-genome sequencing identifies a novel ABCB7 gene mutation for X-linked congenital cerebellar ataxia in a large family of Mongolian ancestry.

    Science.gov (United States)

    Protasova, Maria S; Grigorenko, Anastasia P; Tyazhelova, Tatiana V; Andreeva, Tatiana V; Reshetov, Denis A; Gusev, Fedor E; Laptenko, Alexander E; Kuznetsova, Irina L; Goltsov, Andrey Y; Klyushnikov, Sergey A; Illarioshkin, Sergey N; Rogaev, Evgeny I

    2016-04-01

    X-linked congenital cerebellar ataxia is a heterogeneous nonprogressive neurodevelopmental disorder with onset in early childhood. We searched for a genetic cause of this condition, previously reported in a Buryat pedigree of Mongolian ancestry from southeastern Russia. Using whole-genome sequencing on the Illumina HiSeq 2000 platform, we found a missense mutation in the ABCB7 (ATP-binding cassette transporter B7) gene, which encodes a mitochondrial transporter involved in heme synthesis and was previously associated with sideroblastic anemia and ataxia. The mutation, which substitutes serine for a highly conserved glycine at position 682, is apparently a major causative factor of the cerebellar hypoplasia/atrophy found in affected individuals of a Buryat family who had no evidence of sideroblastic anemia. Moreover, in these affected men we also found genetic defects in two other genes closely linked to ABCB7 on chromosome X: a deletion of a genomic region harboring the second exon of the copper-transporter gene ATP7A, and a complete deletion of the PGAM4 (phosphoglycerate mutase family member 4) retrogene located in an intronic region of the ATP7A gene. Despite the deletion, which eliminates the first of six metal-binding domains in ATP7A, no signs of Menkes disease or occipital horn syndrome associated with ATP7A mutations were found in male carriers. The PGAM4 gene has previously been implicated in human reproduction, but our data indicate that its complete loss does not disrupt male fertility. Our finding links cerebellar pathology to the genetic defect in ABCB7 and an ATP7A structural variant inherited as an X-linked trait, and further reveals the genetic heterogeneity of X-linked cerebellar disorders.

  16. Identification of antimicrobial resistance genes in multidrug-resistant clinical Bacteroides fragilis isolates by whole genome shotgun sequencing

    DEFF Research Database (Denmark)

    Sydenham, Thomas Vognbjerg; Sóki, József; Hasman, Henrik

    2015-01-01

    ://cge.cbs.dtu.dk/services/ResFinder/) and a custom BLAST database. Combinations of cfxA, cepA, cfiA, nimA, nimD, nimE, nimJ, tetQ, ermB, ermF, bexB, linAn2 and mefEn2 genes were identified in the six isolates. blaOXA-347, an open reading frame predicted to be a β-lactamase (Cheng et al., 2012), was identified in one strain. Full length IS elements...

  17. Recent advances in candidate-gene and whole-genome approaches to the discovery of anthelmintic resistance markers and the description of drug/receptor interactions

    Directory of Open Access Journals (Sweden)

    Andrew C. Kotze

    2014-12-01

    Anthelmintic resistance has a great impact on livestock production systems worldwide, is an emerging concern in companion animal medicine, and represents a threat to our ongoing ability to control human soil-transmitted helminths. The Consortium for Anthelmintic Resistance and Susceptibility (CARS) provides a forum for scientists to meet and discuss the latest developments in the search for molecular markers of anthelmintic resistance. Such markers are important for detecting drug-resistant worm populations and for indicating the likely impact of resistance on drug efficacy. The molecular basis of resistance is also important for understanding how anthelmintics work and how drug-resistant populations arise. Changes to target receptors, drug efflux, and other biological processes can be involved. This paper reports on the CARS group meeting held in August 2013 in Perth, Australia. The latest knowledge on the development of molecular markers for resistance to each of the principal classes of anthelmintics is reviewed. The molecular basis of resistance is best understood for the benzimidazole group of compounds, and we examine recent work to translate this knowledge into useful diagnostics for field use. We examine recent candidate-gene and whole-genome approaches to understanding anthelmintic resistance and identifying markers. We also consider drug transporters both as useful markers for resistance and as targets whose inhibition may offer opportunities to overcome resistance. Finally, we describe the tools available for applying the newest high-throughput sequencing technologies to the study of anthelmintic resistance.

  18. Whole genome expression profiling and screening for differentially expressed cytokine genes in human bone marrow endothelial cells treated with humoral inhibitors in liver cirrhosis.

    Science.gov (United States)

    Gao, Bo; Sun, Wang; Wang, Xianqi; Jia, Xu; Ma, Biao; Chang, Yu; Zhang, Weihui; Xue, Dongbo

    2013-11-01

    Bone marrow endothelial cells (BMECs) are important components of the hematopoietic microenvironment in bone marrow and can secrete several types of cytokines that regulate the functions of hematopoietic stem/progenitor cells. To date, it is unknown whether BMECs undergo functional changes that lead to hematopoietic abnormalities in liver cirrhosis (LC). In the present study, whole-genome microarray analysis was carried out to detect differentially expressed genes in human BMECs treated for 48 h with medium supplemented with 20% pooled sera from either 26 patients with LC or 10 healthy volunteers (control group). A total of 1,106 upregulated and 766 downregulated genes were identified. Gene Ontology analysis identified the most significantly enriched categories: many of the upregulated genes were involved in processes such as cell-cell adhesion, apoptosis, and cellular response to stimuli, whereas the downregulated genes were involved in the negative regulation of secretion, angiogenesis, blood vessel development, and cell growth. Pathway analysis revealed that the upregulated genes were either cell adhesion molecules or components of the apoptotic signaling pathway, whereas the downregulated genes were involved in the Wnt and MAPK signaling pathways; these were the pathways with the highest enrichment scores. Apoptosis assays showed that the humoral inhibitors in the sera of patients with LC induced apoptosis of BMECs, which confirmed the bioinformatic analysis. 
Moreover, we screened and verified 21 differentially expressed cytokine genes [transforming growth factor (TGF)B1, tumor necrosis factor (TNF)B, TNF receptor superfamily, member 11b (TNFRSF11B), TNF (ligand) superfamily, member 13b (TNFSF13B), interleukin (IL)1A, IL6, IL11, IL17C, IL24, family with sequence similarity 3, member B (FAM3B), Fas ligand (FASLG), matrix metallopeptidase (MMP)3, MMP15, vitronectin (VTN), insulin-like growth factor

  19. Whole Genome Epigenetics

    National Research Council Canada - National Science Library

    Carmell, Michelle A; Hannon, Gregory J

    2005-01-01

    .... However, this is only part of the picture. Increasingly, we are learning that epigenetic changes, that is, changes in chromatin structure, are critically important in regulating cellular gene expression...

  20. Whole Genome Epigenetics

    National Research Council Canada - National Science Library

    Carmell, Michelle

    2003-01-01

    .... However, this is only part of the picture. Increasingly, we are learning that epigenetic changes, that is, changes in chromatin structure, are critically important in regulating cellular gene expression...

  1. Whole Genome Epigenetics

    National Research Council Canada - National Science Library

    Carmell, Michelle A; Hannon, Gregory J

    2004-01-01

    .... However, this is only part of the picture. Increasingly, we are learning that epigenetic changes, that is, changes in chromatin structure, are critically important in regulating cellular gene expression...

  2. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    Energy Technology Data Exchange (ETDEWEB)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out as a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited catfish EST resource was available for SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35 percent of the unique sequences had significant similarity to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs were identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and with the minor allele present in at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide a powerful means of evaluating ancient and recent gene duplications and of developing high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the full catfish EST dataset will greatly benefit the catfish introgression breeding program and whole-genome association studies.

  3. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies.

    Science.gov (United States)

    Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-01-22

    Through the Community Sequencing Program, a catfish EST sequencing project was carried out as a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited catfish EST resource was available for SNP identification. A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarity to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs were identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and with the minor allele present in at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide a powerful means of evaluating ancient and recent gene duplications and of developing high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the full catfish EST dataset will greatly benefit the catfish introgression breeding program and whole-genome association studies.
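
    The high-quality SNP rule in this abstract (at least four sequences in the contig and the minor allele present in at least two of them) can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the abstract, that the number of ESTs covering the variable site stands in for the contig's sequence count; the function name `classify_snp` and its input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_snp(bases: Iterable[str]) -> Optional[str]:
    """Classify one variable column of a (hypothetical) EST contig alignment.

    bases: one base per EST covering the site; gaps and ambiguity codes are ignored.
    Returns None for a monomorphic site, "high-quality" when at least four
    sequences cover the site and the minor allele occurs in at least two of
    them, and "putative" otherwise.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in {"A", "C", "G", "T"})
    if len(counts) < 2:
        return None  # no variation, so not a SNP candidate
    depth = sum(counts.values())
    # The second most common allele is treated as the minor allele.
    minor_count = counts.most_common(2)[1][1]
    if depth >= 4 and minor_count >= 2:
        return "high-quality"
    return "putative"


print(classify_snp("AAGG"))  # -> high-quality
print(classify_snp("AAAG"))  # -> putative (minor allele seen only once)
```

    Treating the second most frequent base as the minor allele keeps the rule simple for the common biallelic case; a production pipeline would also weigh base quality and read origin.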

  4. Whole-genome association analysis of pork meat pH revealed three significant regions and several potential genes in Finnish Yorkshire pigs.

    Science.gov (United States)

    Verardo, Lucas L; Sevón-Aimonen, Marja-Liisa; Serenius, Timo; Hietakangas, Ville; Uimari, Pekka

    2017-02-13

    One of the most commonly used quality measurements of pork is pH measured 24 h after slaughter. The most probable mode of inheritance for this trait is oligogenic with several known major genes, such as PRKAG3. In this study, we used whole-genome SNP genotypes of over 700 AI boars; after a quality check, 42,385 SNPs remained for association analysis. All the boars were purebred Finnish Yorkshire. To account for relatedness of the animals, a pedigree-based relationship matrix was used in a mixed linear model to test the effect of SNPs on pH measured from loin. A bioinformatics analysis was performed to identify the most promising genes in the significant regions related to meat quality. Genome-wide association study (GWAS) revealed three significant chromosomal regions: one on chromosome 3 (39.9 Mb-40.1 Mb) and two on chromosome 15 (58.5 Mb-60.5 Mb and 132 Mb-135 Mb including PRKAG3). A conditional analysis with a significant SNP in the PRKAG3 region, MARC0083357, as a covariate in the model retained the significant SNPs on chromosome 3. Even though linkage disequilibrium was relatively high over a long distance between MARC0083357 and other significant SNPs on chromosome 15, some SNPs retained their significance in the conditional analysis, even in the vicinity of PRKAG3. The significant regions harbored several genes, including two genes involved in cyclic AMP (cAMP) signaling: ADCY9 and CREBBP. Based on functional and transcription factor-gene networks, the most promising candidate genes for meat pH are ADCY9, CREBBP, TRAP1, NRG1, PRKAG3, VIL1, TNS1, and IGFBP5, and the key transcription factors related to these genes are HNF4A, PPARG, and Nkx2-5. Based on SNP association, pathway, and transcription factor analysis, we were able to identify several genes with potential to control muscle cell homeostasis and meat quality. The associated SNPs can be used in selection for better pork. We also showed that post-GWAS analysis reveals important information about the

  5. Pathogenic Mutations in Cancer-Predisposing Genes: A Survey of 300 Patients with Whole-Genome Sequencing and Lifetime Electronic Health Records.

    Directory of Open Access Journals (Sweden)

    Karen Y He

    It is unclear whether and how whole-genome sequencing (WGS) data can be used to implement genomic medicine. Our objective was to retrospectively evaluate whether WGS can help improve prevention and care for patients susceptible to cancer syndromes. We analyzed genetic mutations in 60 autosomal dominant cancer-predisposition genes in 300 deceased patients with WGS data and nearly complete long-term (over 30 years) medical records. To derive biological insights from large amounts of WGS data and comprehensive clinical data in a short time, we developed an in-house analysis pipeline within the SeqHBase software framework to quickly identify pathogenic or likely pathogenic variants. The clinical data of patients who carried pathogenic and/or likely pathogenic variants were then reviewed to assess their clinical conditions using their lifetime electronic health records (EHRs). Among the 300 participants, 5 (1.7%) carried pathogenic or likely pathogenic variants in 5 cancer-predisposing genes: one each in APC, BRCA1, BRCA2, NF1, and TP53. On review of the clinical data, each of the 5 patients had one or more types of cancer, fully consistent with their genetic profiles. Of these 5 patients, 2 died of cancer, while the others developed multiple disorders later in life; these patients might have benefited from earlier diagnosis and treatment had they undergone genetic testing earlier in life. This case study demonstrates that the discovery of pathogenic or likely pathogenic germline mutations from population-wide WGS correlates with clinical outcome. WGS may therefore have clinical impact by improving healthcare delivery.

  6. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin; Tao, Meifeng

    2016-10-01

    Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites through approaches known as genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5 and screening for antimicrobial activity. In addition to successfully capturing the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can easily be applied to other streptomycetes, and it is well suited to large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate these cryptic metabolites and bring them to biological testing for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins

  7. Development and validation of concurrent preimplantation genetic diagnosis for single gene disorders and comprehensive chromosomal aneuploidy screening without whole genome amplification.

    Science.gov (United States)

    Zimmerman, Rebekah S; Jalas, Chaim; Tao, Xin; Fedick, Anastasia M; Kim, Julia G; Pepe, Russell J; Northrop, Lesley E; Scott, Richard T; Treff, Nathan R

    2016-02-01

    Objective: To develop a novel and robust protocol for multifactorial preimplantation genetic testing of trophectoderm biopsies using quantitative polymerase chain reaction (qPCR). Design: Prospective and blinded. Setting: Not applicable. Patient(s): Couples indicated for preimplantation genetic diagnosis (PGD). Intervention(s): None. Main Outcome Measure(s): Allele dropout (ADO) and failed amplification rate, genotyping consistency, chromosome screening success rate, and clinical outcomes of qPCR-based screening. Result(s): The ADO frequency in single cells from a fibroblast cell line was 1.64% (18/1,096). When two or more cells were tested, the ADO frequency dropped to 0.02% (1/4,426). The overall rate of amplification failure was 1.38% (55/4,000): 2.5% (20/800) for single cells and 1.09% (35/3,200) for samples with two or more cells. Among 152 embryos tested in 17 cases by qPCR-based PGD and comprehensive chromosome screening (CCS), 100% received a diagnosis, with 0% ADO or amplification failure. Genotyping consistency with reference laboratory results was >99%. Another 304 embryos from 43 cases were included in the clinical application of qPCR-based PGD and CCS; 99.7% (303/304) of these embryos received a definitive diagnosis, and only 0.3% (1/304) had an inconclusive result, owing to recombination. In patients who received a transfer with follow-up, the pregnancy rate was 82% (27/33). Conclusion(s): This study demonstrates that qPCR-based PGD testing delivers consistent results that are more reliable than those of existing methods, and that single-gene-disorder PGD can be run concurrently with CCS without the need for additional embryo biopsy or whole-genome amplification. Copyright © 2016 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
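
    The percentages in this abstract follow directly from the reported counts. The short Python check below recomputes them exactly with `fractions.Fraction` and `decimal.Decimal`; it is a sketch that assumes conventional round-half-up reporting (the abstract does not state its rounding rule), and the helper name `percent` is hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal
from fractions import Fraction


def percent(numerator: int, denominator: int, places: int = 2) -> Decimal:
    """Return numerator/denominator as a percentage, rounded half-up."""
    exact = Fraction(numerator, denominator) * 100
    value = Decimal(exact.numerator) / Decimal(exact.denominator)
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


# Counts quoted in the abstract, with the number of decimals it reports.
reported = {
    "ADO, single cell": (18, 1096, 2),
    "ADO, two or more cells": (1, 4426, 2),
    "amplification failure, overall": (55, 4000, 2),
    "amplification failure, single cells": (20, 800, 1),
    "amplification failure, two or more cells": (35, 3200, 2),
    "definitive diagnosis": (303, 304, 1),
    "pregnancy rate": (27, 33, 0),
}

# The overall failure rate pools the single-cell and multi-cell samples.
assert (20 + 35, 800 + 3200) == (55, 4000)

for label, (num, den, places) in reported.items():
    print(f"{label}: {percent(num, den, places)}% ({num}/{den})")
```

    Exact rational arithmetic avoids binary floating-point surprises at ties such as 55/4,000 = 1.375%, which rounds half-up to the reported 1.38%.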

  8. Spiked GBS: a unified, open platform for single marker genotyping and whole-genome profiling.

    Science.gov (United States)

    Rife, Trevor W; Wu, Shuangye; Bowden, Robert L; Poland, Jesse A

    2015-03-28

    In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single-marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of these objectives. We have developed spiked genotyping-by-sequencing (sGBS), which combines targeted amplicon sequencing with reduced-representation genotyping-by-sequencing. To minimize the cost of targeted assays, we use a small percentage of the sequencing capacity available in runs of GBS libraries to "spike" in amplified targets of a priori known alleles, tagged with a separate set of unique barcodes. This open platform allows multiple single-target loci to be assayed while simultaneously generating a whole-genome profile. This dual-genotyping approach allows different sets of samples to be evaluated for single markers or for whole-genome profiling. Here, we report the application of sGBS to a winter wheat panel that was screened for converted KASP markers and newly designed markers targeting known polymorphisms in the leaf rust resistance gene Lr34. The flexibility and low cost of sGBS will enable a range of applications across genetics research. Specifically, in breeding applications, the sGBS approach will allow breeders to obtain a whole-genome profile of important individuals while simultaneously targeting specific genes for a range of selection strategies across the breeding program.
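
    Because sGBS tags the spiked amplicons with a separate set of unique barcodes, reads from one sequencing run can be routed to the GBS pool or the amplicon pool by barcode. A minimal Python sketch of that routing, assuming exact-match barcodes at the start of each read with no mismatch tolerance and no restriction-site check (which real pipelines may add); `demultiplex` and the barcode tables are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, Optional[str]]


def demultiplex(
    reads: Iterable[str],
    gbs_barcodes: Dict[str, str],
    spike_barcodes: Dict[str, str],
) -> Dict[Key, List[str]]:
    """Route reads to (pool, sample) by an exact-match barcode at the read start.

    Barcode tables map barcode sequence -> sample name. Reads that match no
    barcode are collected under ("unassigned", None).
    """
    lookup: Dict[str, Key] = {}
    for pool, table in (("gbs", gbs_barcodes), ("spike", spike_barcodes)):
        for barcode, sample in table.items():
            if barcode in lookup:
                raise ValueError(f"barcode {barcode} appears in more than one set")
            lookup[barcode] = (pool, sample)

    # Try longer barcodes first so a short barcode that is a prefix of a
    # longer one does not capture the longer barcode's reads.
    lengths = sorted({len(b) for b in lookup}, reverse=True)
    pools: Dict[Key, List[str]] = defaultdict(list)
    for read in reads:
        for n in lengths:
            key = lookup.get(read[:n])
            if key is not None:
                pools[key].append(read[n:])  # keep the read with its barcode trimmed
                break
        else:
            pools[("unassigned", None)].append(read)
    return dict(pools)


reads = ["ACGTTGCAGAAA", "TTAACCGG", "GGGGAAAA", "ACGTATT"]
result = demultiplex(reads, {"ACGT": "line1"}, {"TTAA": "line1"})
print(result)
```

    Rejecting a barcode that appears in both tables mirrors the requirement that the spike set be distinct from the GBS set; otherwise a read could not be assigned to a single pool.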

  9. Whole genome sequencing of field isolates reveals a common duplication of the Duffy binding protein gene in Malagasy Plasmodium vivax strains.

    Directory of Open Access Journals (Sweden)

    Didier Menard

    2013-11-01

    Plasmodium vivax is the most prevalent human malaria parasite, causing serious public health problems in malaria-endemic countries. Until recently, the Duffy-negative blood group phenotype was considered to confer resistance to vivax malaria in most African ethnicities. We and others have reported that P. vivax strains in African countries from Madagascar to Mauritania can cause clinical vivax malaria in Duffy-negative people. New insights are now needed to explain Duffy-independent P. vivax invasion of human erythrocytes. Through recent whole-genome sequencing, we obtained ≥ 70× coverage of the P. vivax genome from five field isolates, with ≥ 93% of the Sal I reference sequenced at coverage greater than 20×. Combining these with sequences from one additional Malagasy field isolate and from five monkey-adapted strains, we describe here the identification of DNA sequence rearrangements in the P. vivax genome, including the discovery of a duplication of the P. vivax Duffy binding protein (PvDBP) gene. A survey of Malagasy patients infected with P. vivax showed that the PvDBP duplication was present at numerous locations in Madagascar and was found in over 50% of the infected patients evaluated. Extended geographic surveys showed that the PvDBP duplication was frequently detected in vivax patients living in East Africa and in some residents of non-African P. vivax-endemic countries. The PvDBP duplication was also observed in travelers seeking treatment for vivax malaria upon returning home. PvDBP duplication prevalence was highest at sites in west-central Madagascar, where the highest frequencies of P. vivax-infected Duffy-negative people had been reported. The highly conserved nature of the sequence involved in the PvDBP duplication suggests that it arose in a recent evolutionary time frame. These data suggest that PvDBP, a merozoite surface protein involved in red cell adhesion, is rapidly evolving, possibly in response to constraints imposed by
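
    The two coverage figures in this abstract (overall coverage of at least 70× and at least 93% of the Sal I reference covered at more than 20×) are different summaries of the same per-position depth profile. A minimal Python sketch, assuming a list of per-base depths such as could be derived from `samtools depth -a` output; the function `coverage_summary` is hypothetical and the toy depths are not data from the study.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int = 20) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth > min_depth)."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_depth = sum(depths) / len(depths)
    # Strictly greater than, matching "coverage greater than 20x".
    above = sum(1 for d in depths if d > min_depth)
    return mean_depth, above / len(depths)


# Toy profile over five reference positions (illustrative only).
mean_depth, frac = coverage_summary([0, 10, 25, 80, 100])
print(f"mean depth {mean_depth:.1f}x, {frac:.0%} of positions above 20x")
```

    A high mean depth does not by itself guarantee breadth: in the toy profile the mean is 43× while 40% of positions fall at or below 20×, which is why both figures are worth reporting.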

  10. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, which prevents the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments show higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  11. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  12. [Progress on whole genome sequencing in woody plants].

    Science.gov (United States)

    Shi, Ji-Sen; Wang, Zhan-Jun; Chen, Jin-Hui

    2012-02-01

    In recent years, the amount of plant whole-genome sequencing data has increased rapidly, and whole-genome sequencing has also been widely performed in woody plants. However, whole-genome sequencing of woody plants faces several obstacles, including large genomes, complex genome structure, limitations in assembly, annotation, and functional analysis, and restricted research funding. Therefore, to improve the efficiency of whole-genome sequencing in woody plants, the progress and shortcomings of this field should be analyzed. We compared the three generations of sequencing technologies (Sanger sequencing, sequencing by synthesis, and single-molecule sequencing). The review focuses mainly on progress in whole-genome sequencing of four woody plants (Populus, grapevine, papaya, and apple) and analyzes the application of the sequencing results. Future directions for whole-genome sequencing research in woody plants are discussed, including material selection, construction of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results.

  13. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    Science.gov (United States)

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in regulating the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered the "gold standard" for single-base-resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
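
    Bisulfite treatment converts unmethylated cytosines to uracil (read as T after amplification), while methylated cytosines remain C, so the methylation level at a cytosine is commonly estimated as C / (C + T) among the reads covering it. In plants, levels are usually reported separately for the CG, CHG, and CHH contexts (H = A, C, or T). The minimal Python sketch below assumes forward-strand cytosines only, an uppercase reference sequence, and a hypothetical minimum coverage of 3 reads; the function names are hypothetical and this is not the protocol's own analysis pipeline.

```python
from typing import Dict, Optional, Tuple


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return "CG", "CHG" or "CHH" for the forward-strand C at index i.

    Returns None when too little downstream sequence remains to decide.
    """
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    if seq[i + 1:i + 2] == "G":
        return "CG"
    if len(seq) < i + 3:
        return None  # need two downstream bases to tell CHG from CHH
    return "CHG" if seq[i + 2] == "G" else "CHH"


def methylation_level(c_reads: int, t_reads: int, min_coverage: int = 3) -> Optional[float]:
    """Fraction of reads still reading C (methylated) at one cytosine."""
    depth = c_reads + t_reads
    if depth < min_coverage:
        return None  # too few reads for a reliable estimate
    return c_reads / depth


def context_levels(
    seq: str, calls: Dict[int, Tuple[int, int]], min_coverage: int = 3
) -> Dict[str, float]:
    """Weighted level per context: sum(C) / sum(C + T) over sufficiently covered sites."""
    totals: Dict[str, Tuple[int, int]] = {}
    for pos, (c_reads, t_reads) in calls.items():
        if c_reads + t_reads < min_coverage:
            continue
        ctx = cytosine_context(seq, pos)
        if ctx is None:
            continue
        meth, depth = totals.get(ctx, (0, 0))
        totals[ctx] = (meth + c_reads, depth + c_reads + t_reads)
    return {ctx: meth / depth for ctx, (meth, depth) in totals.items()}


seq = "ACGTCAGTCTTA"  # C at index 1 is CG, index 4 is CHG, index 8 is CHH
calls = {1: (9, 1), 4: (2, 2), 8: (0, 5)}  # position -> (C reads, T reads)
print(context_levels(seq, calls))  # -> {'CG': 0.9, 'CHG': 0.5, 'CHH': 0.0}
```

    Pooling read counts per context, rather than averaging per-site ratios, gives well-covered sites proportionally more weight; both summaries are in common use.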

  14. Accuracy and coverage assessment of Oryctolagus cuniculus (rabbit) genes encoding immunoglobulins in the whole genome sequence assembly (OryCun2.0) and localization of the IGH locus to chromosome 20.

    Science.gov (United States)

    Gertz, E Michael; Schäffer, Alejandro A; Agarwala, Richa; Bonnet-Garnier, Amélie; Rogel-Gaillard, Claire; Hayes, Hélène; Mage, Rose G

    2013-10-01

    We report on analyses of genes encoding immunoglobulin heavy and light chains in the rabbit 6.51× whole-genome assembly. This OryCun2.0 assembly confirms previous mapping of the duplicated IGK1 and IGK2 loci to chromosome 2 and of the IGL lambda light chain locus to chromosome 21. IGHV1, the most frequently rearranged and expressed IGHV gene and the one closest to the IGHD and IGHJ genes, encodes the rabbit VHa allotypes. The partially inbred Thorbecke strain rabbit used for whole-genome sequencing was homozygous at IGK but heterozygous at IGHV1: the IGHV1a1 allele lies in one of 79 IGHV-containing unplaced scaffolds, and IGHV1a2, IGHM, IGHG, and IGHE sequences lie in another. Some IGKV, IGLV, and IGHA genes are also in other unplaced scaffolds. By fluorescence in situ hybridization, we assigned the previously unmapped IGH locus to the q-telomeric region of rabbit chromosome 20. An approximately 3-Mb segment of human chromosome 14, including IGH genes, that synteny analysis predicts to map to this telomeric region could not be located on the assembled chromosome 20. Unplaced scaffold chrUn0053 contains some of the genes that comparative mapping predicts to be missing. We identified discrepancies between previous targeted studies and the OryCun2.0 assembly, as well as some new BAC clones with IGH sequences, which can guide further sequencing and improvement of the OryCun2.0 assembly. Complete knowledge of the gene sequences encoding the variable regions of rabbit heavy, kappa, and lambda chains will lead to a better understanding of how and why rabbits produce antibodies of high specificity and affinity through gene conversion and somatic hypermutation.

  15. Computational operon prediction in whole-genomes and metagenomes.

    Science.gov (United States)

    Zaidi, Syed Shujaat Ali; Zhang, Xuegong

    2017-07-01

    Microbial diversity in unique environmental settings enables rapid responses driven by altered gene regulation and by the formation of gene clusters called operons. Operons increase bacterial adaptability, which in turn increases survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes and discusses their strengths and limitations. Most whole-genome operon prediction methods struggle to generalize to unrelated genomes. Whether universal whole-genome operon prediction methods are applicable to metagenomic data is an interesting but less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. The predictions of most high-accuracy operon prediction methods have been compiled into databases. Despite their high predictive performance, the data in many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis, and Mycobacterium tuberculosis. Operon predictions for most less-characterized microbes cannot be verified owing to the absence of experimentally validated operons. Generating validated information for other microbes would also allow the reliability of operon databases for less-annotated microbes to be tested. Advances in sequencing technologies (such as long sequencing reads and larger contigs) and the development of better analysis methods will help researchers overcome technological hurdles, further improve operon predictions, and make better use of operonic information. © The Author 2016. Published by Oxford University Press. All rights reserved.
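
    Intergenic distance between adjacent same-strand genes is one of the simplest features used in operon prediction. As an illustration only, and not one of the methods evaluated in the review, the Python sketch below groups consecutive genes on the same strand into a candidate transcription unit whenever the gap between them is at most a hypothetical threshold of 50 bp. `Gene` and `predict_operons` are hypothetical names, and single-gene units are returned as well.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes separated by <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[str]] = []
    current: List[Gene] = []
    for gene in ordered:
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # negative when genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)
                continue
            operons.append([g.name for g in current])  # close the previous unit
        current = [gene]
    if current:
        operons.append([g.name for g in current])
    return operons


genes = [
    Gene("a", 1, 900, "+"),
    Gene("b", 921, 1500, "+"),   # 20 bp after a: same unit
    Gene("c", 1700, 2000, "+"),  # 199 bp gap: new unit
    Gene("d", 2010, 2500, "-"),  # strand change: new unit
    Gene("e", 2520, 3000, "-"),  # 19 bp after d: same unit
]
print(predict_operons(genes))  # -> [['a', 'b'], ['c'], ['d', 'e']]
```

    A fixed distance cutoff is exactly the kind of feature that transfers poorly across unrelated genomes, which is consistent with the generalization problem the review describes; practical predictors combine distance with conservation, functional, and expression evidence.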

  16. Extracting phylogenetic signal and accounting for bias in whole-genome data sets supports the Ctenophora as sister to remaining Metazoa.

    Science.gov (United States)

    Borowiec, Marek L; Lee, Ernest K; Chiu, Joanna C; Plachetzki, David C

    2015-11-23

    Understanding the phylogenetic relationships among major lineages of multicellular animals (the Metazoa) is a prerequisite for studying the evolution of complex traits such as nervous systems, muscle tissue, or sensory organs. Transcriptome-based phylogenies have dramatically improved our understanding of metazoan relationships in recent years, although several important questions remain. The branching order near the base of the tree, in particular the placement of the poriferan (sponges, phylum Porifera) and ctenophore (comb jellies, phylum Ctenophora) lineages, is one outstanding issue. Recent analyses have suggested that the comb jellies are sister to all remaining metazoan phyla including sponges. This finding is surprising because it suggests that neurons and other complex traits, present in ctenophores and eumetazoans but absent in sponges or placozoans, either evolved twice in Metazoa or were independently, secondarily lost in the lineages leading to sponges and placozoans. To address the question of basal metazoan relationships, we assembled a novel dataset comprising 1080 orthologous loci derived from 36 publicly available genomes representing major lineages of animals. From this large dataset we procured an optimized set of partitions with high phylogenetic signal for resolving metazoan relationships. This optimized data set is amenable to the most appropriate and computationally intensive analyses using site-heterogeneous models of sequence evolution. We also employed several strategies to examine the potential for long-branch attraction to bias our inferences. Our analyses strongly support the Ctenophora as the sister lineage to other Metazoa. We find no support for the traditional view uniting the ctenophores and Cnidaria. Our findings are supported by Bayesian comparisons of topological hypotheses and we find no evidence that they are biased by long-branch attraction. Our study further clarifies relationships among early-branching metazoan lineages

  17. Whole genome analysis of epidemiologically closely related Staphylococcus aureus isolates.

    Directory of Open Access Journals (Sweden)

    Maarten Schijffelen

    The change of bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three of the sets each came from a single patient. The isolates of each pair (S1800/S1805, S2396/S2395 and S2398/S2397; an isolate from colonization and an isolate from infection, respectively) were obtained less than 30 days apart, and the isolate from infection caused skin infections. The isolates were then compared for differences in gene content and SNPs. In addition, a pair of isolates from a colonized pig and a farmer on the same farm at the same time (S0462 and S0460) was analyzed. The isolate pair S1800/S1805 differed in a prophage, but prophages are easily lost or acquired. However, S1805 contained an integrative conjugative element not present in S1800. In addition, 92 SNPs were present in a variety of genes, and the isolates S1800 and S1805 were not considered a pair. Between S2395/S2396 two SNPs were present: one in an intergenic region and one synonymous mutation in a putative membrane protein. Between S2397/S2398 only one synonymous mutation, in a putative lipoprotein, was found. The two farm isolates were very similar and showed 12 SNPs in genes belonging to a number of different functional categories. However, we cannot pinpoint any gene that explains the change from carrier status to infection. The data indicate that differences exist between the isolate from infection and the colonizing isolate for S2395/S2396 and S2397/S2398, as well as between isolates from different hosts, but S1800/S1805 are not clonal.

  18. Whole-Genome Microarray and Gene Deletion Studies Reveal Regulation of the Polyhydroxyalkanoate Production Cycle by the Stringent Response in Ralstonia eutropha H16

    Energy Technology Data Exchange (ETDEWEB)

    Brigham, CJ; Speth, DR; Rha, C; Sinskey, AJ

    2012-10-22

    Poly(3-hydroxybutyrate) (PHB) production and mobilization in Ralstonia eutropha are well studied, but in only a few instances has PHB production been explored in relation to other cellular processes. We examined the global gene expression of wild-type R. eutropha throughout the PHB cycle: growth on fructose, PHB production using fructose following ammonium depletion, and PHB utilization in the absence of exogenous carbon after ammonium was resupplied. Our results confirm or lend support to previously reported results regarding the expression of PHB-related genes and enzymes. Additionally, genes for many different cellular processes, such as DNA replication, cell division, and translation, are selectively repressed during PHB production. In contrast, the expression levels of genes under the control of the alternative sigma factor sigma(54) increase sharply during PHB production and are repressed again during PHB utilization. Global gene regulation during PHB production is strongly reminiscent of the gene expression pattern observed during the stringent response in other species. Furthermore, a ppGpp synthase deletion mutant did not show an accumulation of PHB, and the chemical induction of the stringent response with DL-norvaline caused an increased accumulation of PHB in the presence of ammonium. These results indicate that the stringent response is required for PHB accumulation in R. eutropha, helping to elucidate a thus-far-unknown physiological basis for this process.

  19. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    Science.gov (United States)

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing them and, moreover, identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling of a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  20. Identification of antibiotic resistance genes in the multidrug-resistant Acinetobacter baumannii strain, MDR-SHH02, using whole-genome sequencing.

    Science.gov (United States)

    Wang, Hualiang; Wang, Jinghua; Yu, Peijuan; Ge, Ping; Jiang, Yanqun; Xu, Rong; Chen, Rong; Liu, Xuejie

    2017-02-01

    This study aimed to investigate antibiotic resistance genes in the multidrug-resistant (MDR) Acinetobacter baumannii (A. baumannii) strain MDR-SHH02 using whole-genome sequencing (WGS). The resistance of MDR-SHH02, isolated from a patient with breast cancer, to 19 antibiotics was determined using the Kirby-Bauer method. WGS of MDR-SHH02 was then performed. Following quality control and transcriptome assembly, functional annotation of genes was conducted, and a phylogenetic tree of MDR-SHH02, along with 5 other A. baumannii strains and 2 other Acinetobacter species, was constructed using PHYLIP 3.695 and FigTree v1.4.2. Furthermore, pathogenicity islands (PAIs) were predicted using the pathogenicity island database. Potential antibiotic resistance genes in MDR-SHH02 were predicted based on the information in the Antibiotic Resistance Genes Database (ARDB). MDR-SHH02 was found to be resistant to all of the tested antibiotics. The total draft genome length of MDR-SHH02 was 4,003,808 bp. In total, 74.25% of coding sequences were annotated to 21 Clusters of Orthologous Groups (COG) functional categories, such as 'transcription' and 'amino acid transport and metabolism'. Furthermore, 45 PAIs were homologous to the sequence MDRSHH02000806. Additionally, a total of 12 gene sequences in MDR-SHH02 were highly similar to the sequences of antibiotic resistance genes in ARDB, including genes encoding aminoglycoside-modifying enzymes [e.g., aac(3)-Ia, ant(2'')-Ia, aph33ib and aph(3')-Ia], β-lactamase genes (bl2b_tem and bl2b_tem1), sulfonamide-resistant dihydropteroate synthase genes (sul1 and sul2), catb3 and tetb. These results suggest that numerous genes mediate resistance to various antibiotics in MDR-SHH02, and provide clinical guidance for the personalized therapy of A. baumannii-infected patients.

  1. Alignathon: a competitive assessment of whole-genome alignment methods

    OpenAIRE

    Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S.; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J.; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander

    2014-01-01

    Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assess...

  2. Whole genome sequence study of cannabis dependence in two independent cohorts.

    Science.gov (United States)

    Gizer, Ian R; Bizon, Chris; Gilder, David A; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2017-01-23

    Recent advances in genome-wide sequencing techniques and analytical methods allow more comprehensive examinations of the genome than microarray-based genome-wide association studies (GWAS). The present report provides the first application of whole genome sequencing (WGS) to identify low-frequency variants involved in cannabis dependence across two independent cohorts. The present study used low-coverage whole genome sequence data to conduct set-based association and enrichment analyses of low-frequency variation in protein-coding regions as well as regulatory regions in relation to cannabis dependence. Two cohorts were studied: a population-based Native American tribal community consisting of 697 participants nested within large multi-generational pedigrees, and a family-based sample of 1832 predominantly European-ancestry participants largely nested within nuclear families. Participants in both samples were assessed for Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) lifetime cannabis dependence, with 168 and 241 participants receiving a positive diagnosis in each sample, respectively. Sequence kernel association tests identified one protein-coding region, C1orf110, and one regulatory region in the MEF2B gene that achieved significance in a meta-analysis of both samples. A regulatory region within the PCCB gene, a gene previously associated with schizophrenia, exhibited a suggestive association. Finally, regions within or near genes with multiple splice variants or involved in cell adhesion or potassium channel activity were significantly enriched among those associated with cannabis dependence. This initial study demonstrates the potential utility of low-pass whole genome sequencing for identifying genetic variants involved in the etiology of cannabis use disorders. © 2017 Society for the Study of Addiction.

  3. Candidate genes and genetic architecture of symbiotic and agronomic traits revealed by whole-genome, sequence-based association genetics in Medicago truncatula.

    Directory of Open Access Journals (Sweden)

    John Stanton-Geddes

    Genome-wide association studies (GWAS) have revolutionized the search for the genetic basis of complex traits. To date, GWAS have generally relied on relatively sparse sampling of nucleotide diversity, which is likely to bias results by preferentially sampling high-frequency SNPs not in complete linkage disequilibrium (LD) with causative SNPs. To avoid these limitations, we conducted GWAS with >6 million SNPs identified by sequencing the genomes of 226 accessions of the model legume Medicago truncatula. We used these data to identify candidate genes and the genetic architecture underlying phenotypic variation in plant height, trichome density, flowering time, and nodulation. The characteristics of candidate SNPs differed among traits: candidates for flowering time and trichome density lay in distinct clusters of high LD, and the minor allele frequencies (MAF) of candidates underlying variation in flowering time and height were significantly greater than the MAF of candidates underlying variation in other traits. Candidate SNPs tagged several characterized genes, including the nodulation-related genes SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3 and MtnodGRP3A and the flowering time gene MtFD, as well as uncharacterized genes that become candidates for further molecular characterization. By comparing sequence-based candidates to candidates identified by in silico 250K SNP arrays, we provide an empirical example of how reliance on even high-density reduced-representation genomic markers can bias GWAS results. Depending on the trait, only 30-70% of the top 20 in silico array candidates were within 1 kb of sequence-based candidates. Moreover, the sequence-based candidates tagged by array candidates were heavily biased towards common variants; these comparisons underscore the need for caution when interpreting results from GWAS conducted with sparsely covered genomes.
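    The 1 kb overlap comparison reported above can be made concrete with a small sketch of how the fraction of array candidates lying near any sequence-based candidate might be computed. This is not the authors' code: the function name `fraction_near`, the (chromosome, position) representation and the inclusive 1,000 bp window are assumptions for illustration only.

```python
import bisect

def fraction_near(array_candidates, sequence_candidates, window=1000):
    """Fraction of array candidates within `window` bp of any sequence-based candidate.

    Both inputs are lists of (chromosome, position) tuples (a hypothetical format).
    The window is treated as inclusive: a distance of exactly `window` bp counts.
    """
    # Group sequence-based candidate positions by chromosome and sort them
    by_chrom = {}
    for chrom, pos in sequence_candidates:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = 0
    for chrom, pos in array_candidates:
        positions = by_chrom.get(chrom, [])
        # First sequence-based candidate at or after pos - window
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            hits += 1
    return hits / len(array_candidates)
```

    Sorting once per chromosome and using binary search keeps each lookup logarithmic, which matters little for 20 candidates but scales to genome-wide candidate lists.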

  4. Whole Genome Pathway Analysis Identifies an Association of Cadmium Response Gene Loss with Copy Number Variation in Mutant p53 Bearing Uterine Endometrial Carcinomas.

    Directory of Open Access Journals (Sweden)

    Joe Ryan Delaney

    Massive chromosomal aberrations are a signature of advanced cancer, although the factors promoting the pervasive incidence of these copy number alterations (CNAs) are poorly understood. Gatekeeper mutations, such as those in p53, contribute to aneuploidy, yet p53 mutant tumors do not always display CNAs. Uterine Corpus Endometrial Carcinoma (UCEC) offers a unique system to begin to evaluate why some cancers acquire high CNAs while others evolve another route to oncogenesis, since about half of p53 mutant UCEC tumors have a relatively flat CNA landscape and half have 20-90% of their genome altered in copy number. We extracted copy number information from 68 p53-mutant UCEC genomes using the GISTIC2 algorithm. GO term pathway analysis, via GOrilla, was used to identify suppressed pathways. Genes within these pathways were mapped for focal or wide distribution. Deletion hotspots were evaluated for temporal incidence. Multiple pathways contributed to the development of pervasive CNAs, including developmental, metabolic, immunological, cell adhesion and cadmium response pathways. Surprisingly, cadmium response pathway genes are predicted to be the earliest loss events within these tumors, in particular the metallothionein genes involved in heavy metal sequestration. Loss of cadmium response genes was associated with copy number changes and poorer prognosis, contrasting with 'copy number flat' tumors, which instead exhibited substantive mutation. Metallothioneins are lost early in the development of high-CNA endometrial cancer, providing a potential mechanism and biological rationale for the increased incidence of endometrial cancer with cadmium exposure. Developmental and metabolic pathways are altered later in tumor progression.

  5. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku

    Science.gov (United States)

    Baym, Michael; Shaket, Lev; Anzai, Isao A.; Adesina, Oluwakemi; Barstow, Buz

    2016-01-01

    Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction. PMID:27830751
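    The idea behind combinatorial pooling can be illustrated with a deliberately simplified two-axis version: mutants sit in a plate, every row and every column is pooled and sequenced, and a disrupted gene detected in exactly one row pool and one column pool can be assigned to a single well. This is a hypothetical sketch, not Knockout Sudoku itself; the abstract describes a probabilistic algorithm that uses internal self-consistency of the sequencing data, and the function names and data layout here are assumptions.

```python
from collections import defaultdict

def pool_plate(plate):
    """Simulate which row and column pools would reveal each gene's insertion.

    plate: dict mapping a (row, col) well to the gene disrupted in that well's mutant.
    Returns two dicts mapping gene -> set of row (or column) pool indices.
    """
    row_hits, col_hits = defaultdict(set), defaultdict(set)
    for (row, col), gene in plate.items():
        row_hits[gene].add(row)
        col_hits[gene].add(col)
    return row_hits, col_hits

def decode_pools(row_hits, col_hits):
    """Assign a gene to a well only when it appears in exactly one row and one column pool.

    Genes seen in several pools (e.g. the same gene disrupted in two wells) are
    ambiguous in this simplified scheme and are mapped to None.
    """
    locations = {}
    for gene in set(row_hits) | set(col_hits):
        rows = row_hits.get(gene, set())
        cols = col_hits.get(gene, set())
        if len(rows) == 1 and len(cols) == 1:
            locations[gene] = (next(iter(rows)), next(iter(cols)))
        else:
            locations[gene] = None
    return locations

if __name__ == "__main__":
    plate = {(0, 0): "geneA", (1, 2): "geneB", (2, 1): "geneC"}
    print(decode_pools(*pool_plate(plate)))
```

    The ambiguous case is exactly why a highly oversampled collection needs more pooling axes and a statistical solver: with only rows and columns, two wells carrying the same disrupted gene cannot be told apart from the pooled reads.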

  6. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average ... Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK; High Performance Computing Center, Hanoi University of Science and Technology, ...

  7. Interpreting Whole-Genome Marker Data

    Science.gov (United States)

    Weir, Bruce S.

    2013-01-01

    The challenges of whole-genome data, when genotypes are available from hundreds of thousands of genetic markers, are explored for four topics in statistical genetics: Hardy-Weinberg testing, estimating linkage disequilibrium from unphased genotypic data, association mapping and characterizing population structure. PMID:24273615
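    Hardy-Weinberg testing, the first topic listed, is commonly introduced through the textbook chi-square goodness-of-fit test for a biallelic marker. The sketch below shows that standard test; it is not necessarily the procedure the article recommends, and the function name is hypothetical. For small samples or rare alleles an exact test is generally preferred over this chi-square approximation.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic marker.

    Takes observed genotype counts and returns (statistic, p_value).
    Assumes both alleles are observed (otherwise an expected count is zero).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

    One degree of freedom remains because three genotype classes are fitted with one estimated allele frequency, plus the constraint that the counts sum to n.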

  8. Whole genome transcriptomics reveals global effects including up-regulation of Francisella pathogenicity island gene expression during active stringent response in the highly virulent Francisella tularensis subsp. tularensis SCHU S4.

    Science.gov (United States)

    Murch, Amber L; Skipp, Paul J; Roach, Peter L; Oyston, Petra C F

    2017-11-01

    Under conditions of nutrient limitation, bacteria undergo a series of global gene expression changes to survive amino acid and fatty acid starvation. This rapid reallocation of cellular resources, known as the stringent response, is brought about by gene expression changes coordinated by the signalling nucleotides guanosine tetraphosphate and pentaphosphate, collectively termed (p)ppGpp. The stringent response has been implicated in bacterial virulence, with elevated (p)ppGpp levels being associated with increased virulence gene expression. This has been observed in the highly pathogenic Francisella tularensis subsp. tularensis SCHU S4, the causative agent of tularaemia. Here, we aimed to artificially induce the stringent response by culturing F. tularensis in the presence of the amino acid analogue L-serine hydroxamate. Serine hydroxamate competitively inhibits aminoacylation of tRNA(Ser), causing an accumulation of uncharged tRNA. The uncharged tRNA enters the A site on the translating bacterial ribosome and causes ribosome stalling, in turn stimulating the production of (p)ppGpp and activation of the stringent response. Using the essential virulence gene iglC, which is encoded on the Francisella pathogenicity island (FPI), as a marker of active stringent response, we optimized the culture conditions required for the investigation of virulence gene expression under conditions of nutrient limitation. We subsequently used whole genome RNA-seq to show how F. tularensis alters gene expression on a global scale during active stringent response. Key findings included up-regulation of genes involved in virulence, stress responses and metabolism, and down-regulation of genes involved in metabolite transport and cell division. F. tularensis is a highly virulent intracellular pathogen capable of causing debilitating or fatal disease at extremely low infectious doses. However, virulence mechanisms are still poorly understood. The stringent response is widely

  9. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data

    OpenAIRE

    Nariai, Naoki; Kojima, Kaname; Saito, Sakae; Mimori, Takahiro; Sato, Yukuto; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yasuda, Jun; Nagasaki, Masao

    2015-01-01

    Background Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. Results We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimiz...

  10. Pathway Processor: A Tool for Integrating Whole-Genome Expression Results into Metabolic Networks

    Science.gov (United States)

    Grosu, Paul; Townsend, Jeffrey P.; Hartl, Daniel L.; Cavalieri, Duccio

    2002-01-01

    We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using Fisher's exact test, the method scores biochemical pathways according to the probability that as many or more genes in a pathway would be significantly altered in a given experiment by chance alone. This method has been validated on diauxic shift experiments and reproduces well-known effects of carbon source on yeast metabolism. The analysis is implemented with Pathway Analyzer, one of the tools of Pathway Processor, a new statistical package for the analysis of whole-genome expression data. Results from multiple experiments can be compared, reducing the analysis from the full set of individual genes to a limited number of pathways of interest. The pathways are visualized with OpenDX, an open-source visualization software package, and the relationship between genes in the pathways can be examined in detail using Expression Mapper, the second program of the package. This program features a graphical output displaying differences in expression on metabolic charts of the biochemical pathways to which the open reading frames are assigned. [Supplementary materials are available at http://www.cgr.harvard.edu/cavalieri/pp.html and http://www.genome.org.] PMID:12097350
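    The pathway score described above, the probability that as many or more genes in a pathway would be altered by chance, is the upper tail of a hypergeometric distribution, which is what a one-sided Fisher exact test computes. The sketch below assumes a simple gene universe (all measured genes) and a fixed set of "significantly altered" genes; how Pathway Processor defines either is not stated in the abstract, and the function name is hypothetical.

```python
from math import comb

def pathway_p_value(n_genes, n_altered, pathway_size, altered_in_pathway):
    """One-sided Fisher exact (hypergeometric upper-tail) probability.

    n_genes: genes measured in the experiment (the "universe").
    n_altered: genes called significantly altered in the experiment.
    pathway_size: measured genes assigned to the pathway.
    altered_in_pathway: altered genes that fall in the pathway.
    Returns P(X >= altered_in_pathway) for pathway_size genes drawn at random.
    """
    total_draws = comb(n_genes, pathway_size)
    max_k = min(pathway_size, n_altered)
    # Sum the probabilities of every outcome at least as extreme as observed
    tail = sum(
        comb(n_altered, k) * comb(n_genes - n_altered, pathway_size - k)
        for k in range(altered_in_pathway, max_k + 1)
    )
    return tail / total_draws

if __name__ == "__main__":
    # Hypothetical experiment: 6000 genes, 300 altered, 8 of a 20-gene pathway altered
    print(pathway_p_value(6000, 300, 20, 8))
```

    Exact integer arithmetic via `math.comb` (Python 3.8+) avoids overflow in the binomial coefficients; only the final ratio is converted to a float. When many pathways are scored, the resulting p-values would normally also need a multiple-testing correction.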

  11. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation

    DEFF Research Database (Denmark)

    Michaelson, Jacob J.; Shi, Yujian; Gujral, Madhusudan

    2012-01-01

    De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our ...

  12. Development and validation of allele-specific SNP/indel markers for eight yield-enhancing genes using whole-genome sequencing strategy to increase yield potential of rice, Oryza sativa L.

    Science.gov (United States)

    Kim, Sung-Ryul; Ramos, Joie; Ashikari, Motoyuki; Virk, Parminder S; Torres, Edgar A; Nissila, Eero; Hechanova, Sherry Lou; Mauleon, Ramil; Jena, Kshirod K

    2016-12-01

    Rice is one of the major staple foods in the world, especially in the developing countries of Asia. Its consumption as a dietary source is also increasing in Africa. To meet the demand for rice to feed the increasing human population, increasing rice yield is essential. Improving the genetic yield potential of rice is one ideal solution. It is imperative to introduce the identified yield-enhancing gene(s) into modern rice cultivars for the rapid improvement of yield potential through marker-assisted breeding. We report the development of PCR-gel-based markers for eight yield-related functional genes (Gn1a, OsSPL14, SCM2, Ghd7, DEP1, SPIKE, GS5, and TGW6) to introduce yield-positive alleles from the donor lines. Six rice cultivars (three donor and three recipient lines) were sequenced by next-generation whole-genome sequencing to detect DNA polymorphisms between the genotypes. Additionally, PCR products containing functional nucleotide polymorphisms (FNPs) or putative FNPs for yield-related genes were sequenced. DNA polymorphisms discriminating yield-positive alleles from non-target alleles for each gene were selected through sequence analysis, and the allele-specific PCR-gel-based markers were developed. The markers were validated with our intermediate breeding lines produced from crosses between the donors and 12 elite indica rice cultivars as recipients. Automated capillary electrophoresis was tested, and fluorescence-labeled SNP genotyping markers (Fluidigm SNP genotyping platform) for the Gn1a, OsSPL14, Ghd7, GS5, and GS3 genes were developed for high-throughput genotyping. The SNP/indel markers linked to yield-related genes functioned properly in our marker-assisted breeding program with identified high-yield-potential lines. These markers can be utilized in local favorite rice cultivars for yield enhancement. The marker designing strategy using both next generation sequencing and Sanger sequencing methods can be used for suitable marker

  13. Whole-Genome Thermodynamic Analysis Reduces siRNA Off-Target Effects

    Science.gov (United States)

    Chen, Xi; Liu, Peng; Chou, Hui-Hsien

    2013-01-01

    Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes, IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, so their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to an siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs, and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if an siRNA candidate has no Picky-predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design. PMID:23484018

  14. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data

    Directory of Open Access Journals (Sweden)

    Tintle Nathan L

    2012-08-01

    Background Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. Results We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Conclusions Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  15. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected ... this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set ...

  16. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g., ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less of the space to flexibly represent phased or unphased genotypes with multiple alleles, and was over 1000 times faster at calculating genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
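    The abstract does not describe KGGSeq's actual bit-block layout. The sketch below only illustrates the general idea of bit-encoded genotypes: unphased biallelic dosages (0, 1, 2) for many samples are packed into two bit planes, and the correlation between two variants is computed from population counts instead of per-sample loops. It assumes no missing genotypes and no multi-allelic sites; all names are hypothetical.

```python
import math


def encode(dosages: list[int]) -> tuple[int, int]:
    """Pack alt-allele dosages into two bit planes (hypothetical layout).

    Bit i of `lo` is set when sample i has dosage 1, bit i of `hi` when it
    has dosage 2, so dosage_i = lo_bit_i + 2 * hi_bit_i.
    """
    lo = hi = 0
    for i, d in enumerate(dosages):
        if d == 1:
            lo |= 1 << i
        elif d == 2:
            hi |= 1 << i
        elif d != 0:
            raise ValueError(f"unsupported dosage {d!r}")
    return lo, hi


def popcount(x: int) -> int:
    return bin(x).count("1")


def genotype_correlation(x: tuple[int, int], y: tuple[int, int], n: int) -> float:
    """Pearson correlation of two dosage vectors over n samples, via popcounts."""
    (lx, hx), (ly, hy) = x, y
    sx = popcount(lx) + 2 * popcount(hx)
    sy = popcount(ly) + 2 * popcount(hy)
    sxx = popcount(lx) + 4 * popcount(hx)   # lo and hi bits never overlap
    syy = popcount(ly) + 4 * popcount(hy)
    sxy = (popcount(lx & ly) + 2 * popcount(lx & hy)
           + 2 * popcount(hx & ly) + 4 * popcount(hx & hy))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x == 0 or var_y == 0:
        return float("nan")                 # monomorphic variant: undefined
    return cov / math.sqrt(var_x * var_y)
```

    Python integers act here as arbitrary-width bitsets; a production format would more likely pack samples into fixed-width machine words and add planes for missing data and extra alleles.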

  17. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Abstract Background Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
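    The steps described above (peptide frequency vectors, SVD, summed per-species vectors, angles as distances) can be sketched with NumPy. Assumptions: the peptide length k defaults to 2 to keep the example small (the abstract does not state the length used), proteins are represented by their right singular vectors scaled by the singular values, and the projection-value filter is not implemented. Function names are hypothetical.

```python
import functools
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


@functools.lru_cache(maxsize=None)
def kmer_index(k: int) -> dict[str, int]:
    """Map every possible k-peptide to a row of the frequency matrix."""
    return {"".join(p): i for i, p in enumerate(itertools.product(AMINO_ACIDS, repeat=k))}


def peptide_vector(seq: str, k: int) -> np.ndarray:
    """Count overlapping k-peptides in one protein sequence."""
    index = kmer_index(k)
    v = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing non-standard residues
            v[j] += 1
    return v


def species_angles(proteomes: dict[str, list[str]], dims: int, k: int = 2):
    """Angles (radians) between summed reduced-space protein vectors per species."""
    names = list(proteomes)
    columns, owner = [], []
    for s, name in enumerate(names):
        for seq in proteomes[name]:
            columns.append(peptide_vector(seq, k))
            owner.append(s)
    A = np.column_stack(columns)                # k-peptides x proteins
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    dims = min(dims, len(S))
    proteins = (S[:dims, None] * Vt[:dims]).T   # each protein in `dims` dimensions
    owner = np.array(owner)
    species = np.array([proteins[owner == s].sum(axis=0) for s in range(len(names))])
    unit = species / np.linalg.norm(species, axis=1, keepdims=True)
    return names, np.arccos(np.clip(unit @ unit.T, -1.0, 1.0))
```

    The resulting angle matrix can be passed to any distance-based tree builder; lowering `dims` corresponds to the reduced-resolution analyses mentioned in the abstract.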

  18. Whole genome sequencing of clinical isolates of Giardia lamblia.

    Science.gov (United States)

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance. Copyright © 2014 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  19. Whole Genome Epidemiological Typing of Salmonella

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas

    Salmonella is one of the most common foodborne pathogens worldwide. In the US alone, salmonellosis was estimated to cause 1.4 million cases, resulting in 17,000 hospitalizations and almost 600 deaths each year. In particular, Salmonella enterica is a common cause of small and large foodborne outbreaks...... used for typing is crucial for successful discrimination. The core genes, or the genes that are conserved in all members of a genus or species, are potentially good candidates for investigating genomic variation in phylogeny and epidemiology. A total of 2,882 core genes have been observed among 73....../absence of all genes across genomes, is similar to the consensus tree but with higher branching confidence values. The core genes can be divided into two categories: a few highly variable genes and a larger set of conserved core genes with low variance. These core genes are useful for investigating molecular...
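    The core-gene definition above (genes present in all members) and the gene presence/absence matrix can be sketched with plain Python sets. This assumes genes have already been clustered into homologous families that share identifiers across genomes; the genome and gene labels below are hypothetical.

```python
def core_genome(genomes: dict[str, set[str]]) -> tuple[set[str], list[str], dict[str, list[int]]]:
    """Return (core genes, pan-genome gene order, presence/absence rows).

    genomes maps a genome name to the set of gene-family IDs it carries.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)          # present in every genome
    pan = sorted(set.union(*gene_sets))          # present in at least one genome
    presence = {name: [1 if g in genes else 0 for g in pan]
                for name, genes in genomes.items()}
    return core, pan, presence
```

    Rows of the presence/absence matrix can then be compared with any binary distance to build a gene-content tree, while sequence variation within the core set supports core-genome phylogenies.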

  20. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp and 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a
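    The third step (choosing the most likely allele pair per locus) can be illustrated with a generic diploid likelihood: each read is assumed to come from either allele with probability 1/2, and the unordered pair, homozygous pairs included, with the highest total log-likelihood wins. This is a common textbook model assumed for illustration, not necessarily HLA*PRG's exact framework; the per-read likelihoods, the probability floor, and the allele labels are hypothetical.

```python
import itertools
import math


def best_allele_pair(read_lik: list[dict[str, float]], alleles: list[str],
                     floor: float = 1e-12) -> tuple[tuple[str, str], float]:
    """Return the unordered allele pair with the highest log-likelihood.

    read_lik[r][a] is P(read r | allele a); missing entries count as 0.
    `floor` (an assumption) avoids log(0) for reads that fit neither allele.
    """
    best, best_ll = None, -math.inf
    for a1, a2 in itertools.combinations_with_replacement(alleles, 2):
        ll = 0.0
        for lik in read_lik:
            p = 0.5 * lik.get(a1, 0.0) + 0.5 * lik.get(a2, 0.0)
            ll += math.log(max(p, floor))
        if ll > best_ll:
            best, best_ll = (a1, a2), ll
    return best, best_ll
```

    Exhaustive enumeration is quadratic in the number of candidate alleles per locus, which is tolerable at G group resolution but is one reason real implementations prune candidates first.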

  1. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study.

    Science.gov (United States)

    Harris, Simon R; Cartwright, Edward J P; Török, M Estée; Holden, Matthew T G; Brown, Nicholas M; Ogilvy-Stuart, Amanda L; Ellington, Matthew J; Quail, Michael A; Bentley, Stephen D; Parkhill, Julian; Peacock, Sharon J

    2013-02-01

    The emergence of meticillin-resistant Staphylococcus aureus (MRSA) that can persist in the community and replace existing hospital-adapted lineages of MRSA means that it is necessary to understand transmission dynamics in terms of hospitals and the community as one entity. We assessed the use of whole-genome sequencing to enhance detection of MRSA transmission between these settings. We studied a putative MRSA outbreak on a special care baby unit (SCBU) at a National Health Service Foundation Trust in Cambridge, UK. We used whole-genome sequencing to validate and expand findings from an infection-control team who assessed the outbreak through conventional analysis of epidemiological data and antibiogram profiles. We sequenced isolates from all colonised patients in the SCBU, and sequenced MRSA isolates from patients in the hospital or community with the same antibiotic susceptibility profile as the outbreak strain. The hospital infection-control team identified 12 infants colonised with MRSA in a 6 month period in 2011, who were suspected of being linked, but a persistent outbreak could not be confirmed with conventional methods. With whole-genome sequencing, we identified 26 related cases of MRSA carriage, and showed transmission occurred within the SCBU, between mothers on a postnatal ward, and in the community. The outbreak MRSA type was a new sequence type (ST) 2371, which is closely related to ST22, but contains genes encoding Panton-Valentine leucocidin. Whole-genome sequencing data were used to propose and confirm that MRSA carriage by a staff member had allowed the outbreak to persist during periods without known infection on the SCBU and after a deep clean. Whole-genome sequencing holds great promise for rapid, accurate, and comprehensive identification of bacterial transmission pathways in hospital and community settings, with concomitant reductions in infections, morbidity, and costs. UK Clinical Research Collaboration Translational Infection Research

  2. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm to consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitating outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS will likely consolidate most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one efficient workflow.

  3. Benchmark Dataset for Whole Genome Sequence Compression.

    Science.gov (United States)

    C L, Biji; S Nair, Achuthsankar

    2017-01-01

    Research in DNA data compression lacks a standard dataset for testing DNA-specific compression tools. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset built using a multistage sampling procedure. Taking the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of using three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only when they are compared on the scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.

  4. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  5. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  6. Whole Genome Sequencing and Newborn Screening.

    Science.gov (United States)

    Botkin, Jeffrey R; Rothwell, Erin

    2016-03-01

    Clinical applications of next generation sequencing are growing at a tremendous pace. Currently the largest application of genetic testing in medicine occurs with newborn screening through state-mandated public health programs, and there are suggestions that sequencing could become a standard component of newborn care within the next decade. As such, newborn screening may appear to be a logical starting point to explore whole genome and whole exome sequencing on a population level. Yet a mandatory public health screening program raises a number of ethical, social and legal issues that create challenges for the use of sequencing technologies in this context. Additionally, our understanding of genomic data, and our strategies for managing it, remain limited at this time, which supports our conclusion that genome sequencing is not justified within population-based public health programs for newborn screening.

  7. Principles of Whole-Genome Amplification.

    Science.gov (United States)

    Czyz, Zbigniew Tadeusz; Kirsch, Stefan; Polzer, Bernhard

    2015-01-01

    Modern molecular biology relies on large amounts of high-quality genomic DNA. However, in a number of clinical or biological applications this requirement cannot be met, as starting material is either limited (e.g., preimplantation genetic diagnosis (PGD) or analysis of minimal residual cancer) or of insufficient quality (e.g., formalin-fixed paraffin-embedded tissue samples or forensics). As a consequence, in order to obtain sufficient amounts of material to analyze these demanding samples by state-of-the-art modern molecular assays, genomic DNA has to be amplified. This chapter summarizes available technologies for whole-genome amplification (WGA), bridging the last 25 years from the first developments to currently applied methods. We will especially elaborate on research application, as well as inherent advantages and limitations of various WGA technologies.

  8. Prokaryotic Phylogenies Inferred from Whole-Genome Sequence and Annotation Data

    Directory of Open Access Journals (Sweden)

    Wei Du

    2013-01-01

    Phylogenetic trees are used to represent the evolutionary relationships among various groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple kinds of genomic information is proposed. The method, called CGCPhy, is based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes likely to be involved in horizontal gene transfer (HGT) events are eliminated; such genes are taken to be those that are highly conserved across different species and those located on fragments with an abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved accuracies on different species sets that agree with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy with a low standard deviation across datasets, suggesting potential for application in phylogenetic analysis.
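    The final step above uses the standard neighbor-joining algorithm. Below is a compact, self-contained sketch that returns an unrooted tree in Newick format. It does not reproduce CGCPhy's cluster-based distance; the input matrix in the usage example is a hypothetical additive five-taxon matrix, and ties in the Q criterion are broken by taking the first pair found.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it as Newick."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new parent node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"]
    # Three nodes left: join them at one central node.
    dab, dac, dbc = d[0][1], d[0][2], d[1][2]
    la = 0.5 * (dab + dac - dbc)
    lb = 0.5 * (dab + dbc - dac)
    lc = 0.5 * (dac + dbc - dab)
    return f"({nodes[0]}:{la:.4g},{nodes[1]}:{lb:.4g},{nodes[2]}:{lc:.4g});"


matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

    On additive matrices such as this one, neighbor joining recovers the generating tree exactly; on real genome-pair distances it can produce small negative branch lengths, which implementations often clamp to zero.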

  9. Evaluation of whole genome sequencing for Mycobacterial species identification and drug susceptibility testing in a clinical setting: a large-scale prospective assessment of performance against line-probe assays and phenotyping.

    Science.gov (United States)

    Quan, T Phuong; Bawa, Zharain; Foster, Dona; Walker, Tim; Del Ojo Elias, Carlos; Rathod, Priti; Iqbal, Zamin; Bradley, Phelim; Mowbray, Janet; Walker, A Sarah; Crook, Derrick W; Wyllie, David H; Peto, Timothy E A; Smith, E Grace

    2017-11-22

    Use of whole genome sequencing (WGS) for routine Mycobacterial species identification and drug susceptibility testing (DST) is becoming a reality. We compared performance of WGS and standard laboratory workflows prospectively, by parallel processing at a major Mycobacterial Reference Service over one year, for species identification, first-line Mycobacterium tuberculosis (TB) resistance prediction, and turnaround time. Of 2039 isolates with line-probe results for species identification, 74 (3.6%) failed sequencing or WGS species identification. Excluding these, clinically important species were identified in 1902 isolates, of which 1825 (96.0%) were identified by WGS as the same species. 2157 line-probe test results assaying resistance to the first-line drugs isoniazid and rifampicin were available from 728 TB complex isolates. Excluding 216 (10.0%) cases where there was insufficient sequencing data for WGS to make a prediction, overall concordance was 99.3% (95% CI 98.9-99.6), (sensitivity 97.6% (91.7-99.7), specificity 99.5% (99.0-99.7)). 2982 phenotypic DST results were available from 777 TB complex isolates. Of these, 356 (11.9%) had no WGS comparator due to insufficient sequencing data, and in 154 (5.2%) cases the WGS prediction was indeterminate due to discovery of novel, previously uncharacterized mutations. Excluding these, overall concordance was 99.2% (98.7-99.5), (sensitivity 94.2% (88.4-97.6), specificity 99.4% (99.0-99.7)). Median processing time for the routine laboratory versus WGS was similar overall, at 20 days (IQR 15,31) and 21 days (15, 29) respectively (p=0.41). In conclusion, WGS predicts species and drug susceptibility with great accuracy but work is needed to increase the proportion of predictions made. Copyright © 2017 Quan et al.
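    Concordance, sensitivity and specificity in the abstract are standard 2×2 comparisons against a reference method. The sketch below assumes that "positive" means resistant by the reference test (the abstract does not spell this out) and omits confidence intervals because the interval method is not stated. The helper `percent` reproduces several of the reported proportions from the counts given, for example 74/2039 → 3.6%, 1825/1902 → 96.0% and 356/2982 → 11.9%. The counts in the usage example are hypothetical.

```python
def diagnostic_summary(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Agreement of a test (e.g. WGS) with a reference (e.g. phenotypic DST).

    Assumption: positive = resistant according to the reference method.
    """
    total = tp + fp + tn + fn
    return {
        "concordance": (tp + tn) / total,   # all agreeing calls
        "sensitivity": tp / (tp + fn),      # reference-resistant called resistant
        "specificity": tn / (tn + fp),      # reference-susceptible called susceptible
    }


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


summary = diagnostic_summary(tp=9, fp=2, tn=88, fn=1)  # hypothetical counts
```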

  10. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia.

    Directory of Open Access Journals (Sweden)

    Ulziijargal Gurjav

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings.
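    Comparing SNP profiles within a cluster usually starts from pairwise SNP distances between aligned isolate sequences. The sketch below counts differing positions, skipping gaps and ambiguous bases, and flags pairs at or below a user-chosen threshold. The threshold is deliberately a required parameter: the abstract does not state the cut-off used, and the isolate names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP counts between aligned sequences.

    Positions where either isolate has a gap ('-') or ambiguous base ('N')
    are skipped.
    """
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    dist = {}
    for a, b in combinations(sorted(alignment), 2):
        dist[(a, b)] = sum(
            1 for x, y in zip(alignment[a], alignment[b])
            if x != y and x not in "-N" and y not in "-N"
        )
    return dist


def linked_pairs(dist: dict[tuple[str, str], int], threshold: int) -> list[tuple[str, str]]:
    """Isolate pairs close enough to be compatible with recent transmission."""
    return sorted(pair for pair, d in dist.items() if d <= threshold)


aln = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TCGAACNT"}
pairs = snp_distances(aln)
```

    Isolates that share a MIRU-24 pattern but are separated by many SNPs would fall outside any reasonable threshold, which is the kind of result the abstract reports for most clustered strains.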

  11. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  12. Whole genome sequencing in clinical and public health microbiology

    Science.gov (United States)

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Summary Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  13. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  14. Whole genome sequencing for lung cancer.

    Science.gov (United States)

    Daniels, Marissa; Goh, Felicia; Wright, Casey M; Sriram, Krishna B; Relan, Vandana; Clarke, Belinda E; Duhig, Edwina E; Bowman, Rayleen V; Yang, Ian A; Fong, Kwun M

    2012-04-01

    Lung cancer is a leading cause of cancer-related morbidity and mortality globally, and carries a dismal prognosis. Improved understanding of the biology of cancer is required to improve patient outcomes. Next-generation sequencing (NGS) is a powerful tool for whole genome characterisation, enabling comprehensive examination of somatic mutations that drive oncogenesis. Most NGS methods are based on polymerase chain reaction (PCR) amplification of platform-specific DNA fragment libraries, which are then sequenced. These techniques are well suited to high-throughput sequencing and are able to detect the full spectrum of genomic changes present in cancer. However, they require considerable investments in time, laboratory infrastructure, computational analysis and bioinformatic support. Next-generation sequencing has been applied to studies of the whole genome, exome, transcriptome and epigenome, and is changing the paradigm of lung cancer research and patient care. The results of this new technology will transform current knowledge of oncogenic pathways and provide molecular targets for use in the diagnosis and treatment of cancer. Somatic mutations in lung cancer have already been identified by NGS, and large-scale genomic studies are underway. Personalised treatment strategies will improve care for those likely to benefit from available therapies, while sparing others the expense and morbidity of futile intervention. Organisational, computational and bioinformatic challenges of NGS are driving technological advances as well as raising ethical issues relating to informed consent and data release. Differentiation between driver and passenger mutations requires careful interpretation of sequencing data. Challenges in the interpretation of results arise from the types of specimens used for DNA extraction, sample processing techniques and tumour content. Tumour heterogeneity can reduce power to detect mutations implicated in oncogenesis. Next-generation sequencing will

  15. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, almost entirely separate from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  16. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons.

    Science.gov (United States)

    Dong, Xianjun; Navratilova, Pavla; Fredman, David; Drivenes, Øyvind; Becker, Thomas S; Lenhard, Boris

    2010-03-01

    Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity.

  17. Landscape of somatic mutations in 560 breast cancer whole-genome sequences

    NARCIS (Netherlands)

    S. Nik-Zainal (Serena); H. Davies (Helen); J. Staaf (Johan); M. Ramakrishna (Manasa); D. Glodzik (Dominik); X. Zou (Xueqing); I. Martincorena (Inigo); L.B. Alexandrov (Ludmil); S. Martin (Sandra); D.C. Wedge (David); P. van Loo (Peter); Y.S. Ju (Young Seok); M. Smid (Marcel); A.B. Brinkman (Arie B.); S. Morganella (Sandro); Aure, M.R. (Miriam R.); Lingjærde, O.C. (Ole Christian); A. Langerød (Anita); Ringnér, M. (Markus); Ahn, S.-M. (Sung-Min); S. Boyault (Sandrine); Brock, J.E. (Jane E.); A. Broeks (Annegien); A. Butler (Adam); C. Desmedt (Christine); L.Y. Dirix (Luc); S. Dronov (Serge); A. Fatima (Aquila); J.A. Foekens (John); M. Gerstung (Moritz); J. Hooijer; Jang, S.J. (Se Jin); Jones, D.R. (David R.); H.-Y. Kim (Hyung-Yong); King, T.A. (Tari A.); Krishnamurthy, S. (Savitri); Lee, H.J. (Hee Jin); Lee, J.-Y. (Jeong-Yeon); Y. Li (Yilong); S. McLaren (Stuart); D. Menzies; Mustonen, V. (Ville); S. O'Meara (Sarah); I. Pauporté (Iris); X. Pivot (Xavier); C.A. Purdie (Colin A.); J.W. Raine (John); Ramakrishnan, K. (Kamna); F.G. Rodriguez-Gonzalez (F. German); Romieu, G. (Gilles); A.M. Sieuwerts (Anieta); Simpson, P.T. (Peter T.); Shepherd, R. (Rebecca); L.A. Stebbings (Lucy); Stefansson, O.A. (Olafur A.); J. Teague (Jon); Tommasi, S. (Stefania); I. Treilleux (Isabelle); G. van den Eynden; P.B. Vermeulen; A. Vincent-Salomon (Anne); L.R. Yates (Lucy); C. Caldas (Carlos); L.J. van 't Veer (Laura); A. Tutt (Andrew); S. Knappskog (Stian); Tan, B.K.T. (Benita Kiat Tee); J. Jonkers (Jos); Å. Borg (Åke); Ueno, N.T. (Naoto T.); C. Sotiriou (Christos); Viari, A. (Alain); P.A. Futreal (Andrew); P.J. Campbell (Peter); P.N. Span (Paul); S.J. van Laere (Steven); S. Lakhani (Sunil); J. Eyfjord; A.M. Thompson (Alastair M.); E. Birney (Ewan); H. Stunnenberg (Henk); M.J. Vijver (Marc ); J.W.M. Martens (John); A.-L. Borresen-Dale (Anne-Lise); A.L. Richardson (Andrea); G. Kong (Gu); G. Thomas (Gilles); M.R. Stratton (Michael)

    2016-01-01

    We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding...

  18. Determination of Elizabethkingia Diversity by MALDI-TOF Mass Spectrometry and Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Eriksen, Helle Brander; Gumpert, Heidi; Faurholt, Cecilie Haase

    2017-01-01

    In a hospital-acquired infection with multidrug-resistant Elizabethkingia, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and 16S rRNA gene analysis identified the pathogen as Elizabethkingia miricola. Whole-genome sequencing, genus-level core genome analysis...

  19. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and...

  20. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  1. Whole genome sequence analysis of Mycobacterium suricattae.

    Science.gov (United States)

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Small Sample Whole-Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field, owing to limited sample quantity and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a minimum sample size and are carried out in large-volume reactions; when insufficient sample is present, whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step in WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first increases the amplified product from small, nanoscale, purified samples through the use of carrier DNA, while the second is a single-step method for cleaning and amplifying samples in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample needed for future forensic DNA-based assays.

  3. Evidence and evolutionary analysis of ancient whole-genome duplication in barley predating the divergence from rice

    OpenAIRE

    Grosse Ivo; Waugh Robbie; Graner Andreas; Thiel Thomas; Close Timothy J; Stein Nils

    2009-01-01

    Background: Well-preserved genomic colinearity among agronomically important grass species such as rice, maize, Sorghum, wheat and barley provides access to whole-genome structure information even in species lacking a reference genome sequence. We investigated footprints of whole-genome duplication (WGD) in barley that shaped the cereal ancestor genome by analyzing shared synteny with rice using a ~2000 gene-based barley genetic map and the rice genome reference sequence. Results: Base...

  4. Whole-genome duplication and the functional diversification of teleost fish hemoglobins.

    Science.gov (United States)

    Opazo, Juan C; Butts, G Tyler; Nery, Mariana F; Storz, Jay F; Hoffmann, Federico G

    2013-01-01

    Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood-gas transport traits, which reflects the myriad solutions that have evolved to maintain tissue O(2) delivery in the face of changing metabolic demands and environmental O(2) availability during different ontogenetic stages. During the course of development, regulatory changes in blood-O(2) transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular O(2)-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and in free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results were then integrated with available experimental data on functional properties and developmental patterns of stage-specific gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order Lepisosteiformes) and teleosts. The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and...

  5. Whole Genome Sequencing of the Braconid Parasitoid Wasp Fopius arisanus, an Important Biocontrol Agent of Pest Tephritid Fruit Flies

    Directory of Open Access Journals (Sweden)

    Scott M. Geib

    2017-08-01

    The braconid wasp Fopius arisanus (Sonan) is an important biological control agent of tropical and subtropical pest fruit flies, including two important global pests, the Mediterranean fruit fly (Ceratitis capitata) and the oriental fruit fly (Bactrocera dorsalis). The goal of this study was to develop foundational genomic resources for this species, providing tools to explore the multitrophic interactions between host and parasitoid in this important research system. Here, we present a whole genome assembly of F. arisanus, derived from a pool of haploid offspring from a single unmated female. The genome is ∼154 Mb in size, with an N50 contig size of 51,867 bp and an N50 scaffold size of 0.98 Mb. Using existing RNA-Seq data for this species, as well as publicly available peptide sequences from related Hymenoptera, we generated a high-quality gene annotation set that includes 10,991 protein-coding genes. Prior to this assembly submission, no RefSeq proteins were available for this species. Parasitic wasps play important roles in diverse ecosystems as well as in the biological control of agricultural pests. This whole genome assembly and its annotation represent the first genome-scale assembly for this species or any closely related Opiine, and are publicly available in the National Center for Biotechnology Information Genome and RefSeq databases, providing a much-needed genomic resource for this hymenopteran group.
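
    The contig and scaffold N50 values quoted above summarize how contiguous an assembly is. A minimal Python sketch of the usual definition (the largest length L such that sequences of length L or longer hold at least half of all assembled bases) follows; the function name `n50` and the toy lengths are illustrative assumptions, not the authors' assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp).

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, not the F. arisanus assembly).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

    Sorting in descending order and stopping at the first length whose cumulative sum reaches half of the total is the whole algorithm; the same function applies to scaffolds by passing scaffold lengths instead of contig lengths.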

  6. Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network.

    Science.gov (United States)

    Iossifov, Ivan; Zheng, Tian; Baron, Miron; Gilliam, T Conrad; Rzhetsky, Andrey

    2008-07-01

    Common hereditary neurodevelopmental disorders such as autism, bipolar disorder, and schizophrenia are most likely both genetically multifactorial and heterogeneous. Because of these characteristics, traditional methods for genetic analysis fail when applied to such diseases. To address this problem, we propose a novel probabilistic framework that combines the standard genetic linkage formalism with whole-genome molecular-interaction data to predict pathways or networks of interacting genes that contribute to common heritable disorders. We apply the model to three large genotype-phenotype data sets, identify a small number of significant candidate genes for autism (24), bipolar disorder (21), and schizophrenia (25), and predict a number of gene targets likely to be shared among the disorders.

  7. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates, the serotype derived from the genome was concordant with phenotypic serology. Discrepancies between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or to indels in O-antigen synthesis/modification genes that rendered them dysfunctional. SNP typing identified 36 clusters of two or more isolates. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real time, thereby informing surveillance strategies and providing opportunities for timely public health interventions.

  8. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  9. Whole genome sequence of a Turkish individual.

    Directory of Open Access Journals (Sweden)

    Haluk Dogan

    Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, the sequencing and analysis of a Turkish human genome. We sequenced the genome to 35× coverage using paired-end sequencing; over 95% of the reads mapped to the reference genome, covering more than 99% of its bases. The assembly of unmapped reads yielded 11,654 contigs, 2,168 of which showed no homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls, with 29,184 SNPs in coding regions, of which 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygosity/heterozygosity (1,415,123∶2,122,671, or 1∶1.5) and transition/transversion (2,383,204∶1,154,590, or 2.06∶1) ratios were within expected limits. Of the identified SNPs, 480,396 were potentially novel, with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from -52 bp to 34 bp in length and causing about 180 codon insertions/deletions and 246 frameshifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variation in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale.
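
    The zygosity and transition/transversion ratios quoted above can be checked directly from the reported counts, and both partitions add back up to the 3,537,794 SNP calls. A short Python check, using only the numbers given in the abstract (the constant and function names are illustrative):

```python
# SNP counts reported for the Turkish genome in the abstract above.
HOMOZYGOUS = 1_415_123
HETEROZYGOUS = 2_122_671
TRANSITIONS = 2_383_204
TRANSVERSIONS = 1_154_590
TOTAL_SNP_CALLS = 3_537_794


def snp_ratio_summary():
    """Return (heterozygous per homozygous SNP, transitions per transversion)."""
    # Each partition of the call set should sum back to the total.
    if HOMOZYGOUS + HETEROZYGOUS != TOTAL_SNP_CALLS:
        raise ValueError("zygosity counts do not sum to the total")
    if TRANSITIONS + TRANSVERSIONS != TOTAL_SNP_CALLS:
        raise ValueError("Ts/Tv counts do not sum to the total")
    return HETEROZYGOUS / HOMOZYGOUS, TRANSITIONS / TRANSVERSIONS


if __name__ == "__main__":
    het_per_hom, ts_tv = snp_ratio_summary()
    print(f"hom:het = 1:{het_per_hom:.2f}")  # 1:1.50
    print(f"Ts/Tv   = {ts_tv:.2f}:1")         # 2.06:1
```

    A genome-wide Ts/Tv near 2 is commonly used as a sanity check on human SNP call sets, which is the sense in which the abstract calls these ratios "within expected limits".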

  10. Whole genome sequencing as the ultimate tool to diagnose tuberculosis.

    Science.gov (United States)

    van Soolingen, Dick; Jajou, Rana; Mulder, Arnout; de Neeling, Han

    2016-12-01

    In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. Resistance is predicted by detecting mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. Strain-characteristic DNA fingerprints for investigating the epidemiology of TB are produced by 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques used in the diagnosis of TB could eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently introducing WGS; however, international standardization is lacking. The European Centre for Disease Prevention and Control in Stockholm, Sweden, organizes a yearly round of quality control on VNTR typing, and in 2015 included WGS for the first time. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. This presentation covers many aspects that influence the quality and interpretation of WGS results. The turnaround time, analysis, and utility of WGS will be discussed. Moreover, experience with the use of WGS in the molecular epidemiology of TB in The Netherlands is detailed. It can be concluded that many difficulties remain to be overcome. At present, bacteria still have to be cultured to obtain sufficient quality and quantity of DNA for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has therefore become more reliable.

  11. Toxicological effects of benzo[a]pyrene on DNA methylation of whole genome in ICR mice.

    Science.gov (United States)

    Zhao, L; Zhang, S; An, X; Tan, W; Pang, D; Ouyang, H

    2015-10-30

    It is well known that alterations in DNA methylation, an important regulator of gene transcription, can lead to cancer. Therefore, a change in the level of whole-genome DNA methylation has been considered a biomarker of carcinogenesis. Previous experimental results in genetic toxicology have shown that benzo[a]pyrene can cause DNA mutation and fragmentation. However, few studies have examined alterations in genomic DNA methylation resulting directly from exposure to benzo[a]pyrene. In this paper, possible mechanisms of alterations in whole genomic DNA methylation by benzo[a]pyrene were investigated in ICR mice after benzo[a]pyrene exposure. The blood, liver, pancreas, skin, lung and bladder of ICR mice were removed and examined after a fixed time interval (6 hours) of benzo[a]pyrene exposure, and whole genomic DNA methylation levels were determined by high performance liquid chromatography (HPLC). The results exhibited tissue specificity: the level of whole genomic DNA methylation decreased significantly in blood and liver, but not in pancreas, lung, skin or bladder. This study investigated the direct relationship between aberrant DNA methylation levels and benzo[a]pyrene exposure, which might help clarify the toxicological mechanism of benzo[a]pyrene from an epigenetic perspective.

  12. Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Jong-Sung Lim

    2012-03-01

    Recently, DNA sequence variation and gene expression profiling technologies have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of next-generation DNA sequencers (NGS), such as the Roche/454 and Illumina/Solexa systems, along with bioinformatic technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate-pair sequencing data are important for constructing de novo assemblies of novel genomes. For effective de novo assembly, DNA sequence information from multiple NGS platforms is needed, with genome coverage depths of at least 2× using Roche/454 and 30× using Illumina/Solexa. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, reducing the cost of DNA sequencing. Whole-genome expression profile data are useful for genome systems biology, allowing quantification of expressed RNAs from a whole-genome transcriptome, depending on the tissue samples. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome sequences using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30× coverage of a transcriptome with Illumina/Solexa short reads is sufficient for expression quantification against the reference expressed sequence tag sequence.
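
    The depth targets above are mean coverages. Under the idealized Lander-Waterman model, which assumes reads land uniformly at random, N reads of length L over a genome of size G give a mean depth C = N·L/G, and the expected fraction of bases with no reads is about e^(-C). The sketch below applies these textbook formulas; the function names and the 100 Mb genome and 400 bp / 100 bp read lengths in the usage note are hypothetical, and real data deviate from the model because of repeats and GC bias.

```python
import math


def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Mean coverage C = N * L / G under the Lander-Waterman model."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(genome_size_bp, read_length_bp, target_depth):
    """Smallest read count whose mean depth N * L / G reaches target_depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


def fraction_uncovered(depth):
    """Expected fraction of bases with zero reads (Poisson model): exp(-C)."""
    return math.exp(-depth)
```

    For a hypothetical 100 Mb genome, `reads_needed(100_000_000, 400, 2)` gives 500,000 reads of 400 bp for 2×, and `reads_needed(100_000_000, 100, 30)` gives 30,000,000 reads of 100 bp for 30×. At 2× the model leaves roughly 13.5% of bases unsampled (`fraction_uncovered(2)`), which is one reason low-depth long reads are combined with high-depth short reads rather than used alone.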

  13. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians.

    Directory of Open Access Journals (Sweden)

    Hui Shen

    Whole genome sequencing studies are essential for a comprehensive understanding of the vast pattern of human genomic variation. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, of which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in the general population. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.

  14. Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses

    Directory of Open Access Journals (Sweden)

    Sadoff Jerald C

    2008-05-01

    Background: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million people annually, resulting in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of whom develop disease during their lifetime. Currently approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficacy in protecting adults against pulmonary TB (0%–80%) and are directed essentially against early-phase infection. Methods: A genome-scale dataset was constructed by analyzing published data on (1) global gene expression studies under conditions that simulate intra-macrophage stress, dormancy, persistence and/or reactivation; and (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritizing the antigens were developed and applied. Results: Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed the selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with a reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in the Lipid and Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; and ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. Applying a ranking scheme based on qualitative and quantitative scores resulted in a list of 45 best-scoring antigens, of which 74% belong to the dormancy...

  15. Whole-Genome Characterization and Strain Comparison of VT2f-Producing Escherichia coli Causing Hemolytic Uremic Syndrome

    Science.gov (United States)

    Michelacci, Valeria; Bondì, Roslen; Gigliucci, Federica; Franz, Eelco; Badouei, Mahdi Askari; Schlager, Sabine; Minelli, Fabio; Tozzoli, Rosangela; Caprioli, Alfredo; Morabito, Stefano

    2016-01-01

    Verotoxigenic Escherichia coli infections in humans cause disease ranging from uncomplicated intestinal illnesses to bloody diarrhea and systemic sequelae, such as hemolytic uremic syndrome (HUS). Previous research indicated that pigeons may be a reservoir for a population of verotoxigenic E. coli producing the VT2f variant. We used whole-genome sequencing to characterize a set of VT2f-producing E. coli strains from human patients with diarrhea or HUS and from healthy pigeons. We describe a phage conveying the vtx2f genes and provide evidence that the strains causing milder diarrheal disease may be transmitted to humans from pigeons. The strains causing HUS could derive from VT2f phage acquisition by E. coli strains with a virulence gene repertoire resembling that of typical HUS-associated verotoxigenic E. coli. PMID:27584691

  16. Targeted analysis of whole genome sequence data to diagnose genetic cardiomyopathy.

    Science.gov (United States)

    Golbus, Jessica R; Puckelwartz, Megan J; Dellefave-Castillo, Lisa; Fahrenbach, John P; Nelakuditi, Viswateja; Pesce, Lorenzo L; Pytel, Peter; McNally, Elizabeth M

    2014-12-01

    Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of >50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift toward comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1 to 14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and segregation analysis, where available. Three of 3 previously identified primary mutations were detected by this analysis. In 6 subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and had additional pathological correlation to provide evidence for causality. For 2 subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. These pilot data demonstrate that ≈30 to 40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes. © 2014 American Heart Association, Inc.

  17. Gene set analysis for GWAS

    DEFF Research Database (Denmark)

    Debrabant, Birgit; Soerensen, Mette

    2014-01-01

    We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review the corresponding null and alternative hypotheses. In particular, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic...... parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example....
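
    One widely used weighted KS statistic in gene set analysis is the running-sum score popularized by GSEA, in which each gene in the set contributes a step proportional to |statistic|^p. It is shown here as an illustration of "enhancing the impact of highly significant genes"; it is not necessarily the exact modification the authors study. The toy genes and scores are made up.

```python
def weighted_ks_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov running-sum score for one gene set.

    ranked_genes: genes ordered from most to least significant.
    scores: gene-level statistics, in the same order as ranked_genes.
    p: weight exponent; p=0 gives the classical, unweighted KS running sum,
       and larger p gives highly significant genes more influence.
    Returns the signed maximum deviation of the running sum from zero.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_total = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all gene-set scores are zero; weights are undefined")
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_total  # step up, weighted by significance
        else:
            running -= 1.0 / n_miss             # step down by a constant amount
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["a", "b", "c", "d"]
stats = [4.0, 3.0, 2.0, 1.0]
print(weighted_ks_score(ranked, stats, {"a", "d"}, p=0))  # 0.5
print(weighted_ks_score(ranked, stats, {"a", "d"}, p=1))  # 0.8
```

    With p=1 the top-ranked gene "a" dominates the score, which is the kind of weighting effect the abstract examines. In practice, the null distribution of such a score is obtained by permutation, and which permutation is appropriate depends on the null hypothesis being tested.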

  18. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identifying selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, signatures of artificial and natural selection have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes ranged from 10 to 100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor, and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating genetic improvement of tilapia.
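
    One common way to locate putative selective sweep regions, offered here as an illustration and not necessarily the statistic used in this study, is pooled heterozygosity, Hp, computed in sliding windows and then Z-transformed. Windows with unusually low Z(Hp) are sweep candidates. The window size, step, and toy allele counts below are assumptions.

```python
from statistics import mean, pstdev


def pooled_heterozygosity(major_counts, minor_counts):
    """Hp = 2 * sum(n_MAJ) * sum(n_MIN) / (sum(n_MAJ) + sum(n_MIN))**2 over SNPs in a window."""
    n_maj, n_min = sum(major_counts), sum(minor_counts)
    total = n_maj + n_min
    return 2 * n_maj * n_min / total ** 2 if total else float("nan")


def window_hp(positions, major_counts, minor_counts, window, step):
    """Return (window_start, Hp) for each sliding window that contains at least one SNP."""
    results = []
    start, last = 0, max(positions)
    while start <= last:
        idx = [i for i, pos in enumerate(positions) if start <= pos < start + window]
        if idx:
            hp = pooled_heterozygosity([major_counts[i] for i in idx],
                                       [minor_counts[i] for i in idx])
            results.append((start, hp))
        start += step
    return results


def z_transform(values):
    """Standardize Hp values; strongly negative Z(Hp) marks candidate sweeps."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


# Toy allele counts pooled over one line: SNPs at 150-250 kb are nearly fixed.
positions = [10_000, 60_000, 150_000, 200_000, 250_000, 320_000]
major = [12, 11, 20, 19, 20, 10]
minor = [8, 9, 0, 1, 0, 10]
windows = window_hp(positions, major, minor, window=100_000, step=100_000)
print(windows)
print(z_transform([hp for _, hp in windows]))
```

    Real scans use overlapping windows over millions of SNPs and set a Z(Hp) cutoff empirically; the non-overlapping windows here just keep the arithmetic easy to follow.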

  19. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    The Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species of economic importance and distinctive appearance. Owing to its great economic value in China, the fish was introduced from Lake Taihu into Lake Dianchi and several other lakes half a century ago. Like the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as a transparent body and nearly scaleless skin. Here, we sequenced the whole genome of this surface-dwelling fish and generated a draft genome assembly, with the aim of exploring the molecular mechanisms behind its traits of biological interest. A total of 252.1 Gb of raw reads were sequenced. A novel draft genome assembly was then generated, with a scaffold N50 of 1.163 Mb. Genome completeness was estimated at 98.39% using CEGMA. Finally, we annotated 19 884 protein-coding genes and found that repeat sequences account for 24.43% of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in the Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms underlying the cavefish-like characters.
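
    The scaffold N50 quoted above is the scaffold length at which scaffolds of that size or longer cover at least half of the assembly. A minimal sketch of the calculation with toy lengths:

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids rounding issues
            return length


print(n50([2, 3, 4, 5, 6]))  # total 20; 6 + 5 = 11 >= 10, so N50 = 5
```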

  20. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Background Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for detecting DNA copy-number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy-number variants of unclear clinical significance, we designed a systematic method for interpreting copy-number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of the cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.
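
    Why mosaicism near 30% is a practical detection limit becomes clearer from the expected log2 ratio of a mosaic copy-number change. This is a minimal sketch assuming a diploid reference, an ideal linear array response, and no noise; real arrays often compress ratios, so observed shifts tend to be smaller. Under these assumptions, a single-copy loss in 30% of cells shifts the expected log2 ratio only to about −0.23, and a single-copy gain only to about +0.20.

```python
import math


def expected_log2_ratio(abnormal_copies, mosaic_fraction, normal_copies=2):
    """Expected log2(test/reference) for a region present at `abnormal_copies`
    in a fraction of cells, assuming an ideal linear array response."""
    if not 0.0 <= mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be between 0 and 1")
    mean_copies = (1 - mosaic_fraction) * normal_copies + mosaic_fraction * abnormal_copies
    if mean_copies == 0:
        return float("-inf")  # complete (homozygous) loss in every cell
    return math.log2(mean_copies / normal_copies)


for fraction in (1.0, 0.5, 0.3):
    loss = expected_log2_ratio(1, fraction)  # single-copy deletion
    gain = expected_log2_ratio(3, fraction)  # single-copy duplication
    print(f"{fraction:.0%} mosaic: loss {loss:+.3f}, gain {gain:+.3f}")
```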

  1. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Supplementary table 1. The genes with enriched GO terms and related non-synonymous KHV SNPs. Columns: Gene Name; OMIM Disease; Coordinates; Substitution; dbSNP ID; 1000 GP. First entry: ZNF124; NA; 1,247319954,1,C/T; G262S.

  2. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree, as well as of smaller clusters of families, identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large, and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.
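
    Resolving phase in parent-child trios rests on Mendelian transmission: at a site where the child is heterozygous, a parent who is homozygous reveals which allele came from whom. The sketch below shows only that core logic under simplifying assumptions (biallelic sites, no genotyping errors, no read-backed or statistical phasing); it is not the study's phasing method.

```python
def phase_trio_site(child, father, mother):
    """Assign the child's alleles at one site to its parents.

    Genotypes are 2-tuples of alleles, e.g. ("A", "G").
    Returns (paternal, maternal), or None when the site is ambiguous
    (for example, child and both parents heterozygous) or Mendelian-inconsistent.
    """
    a, b = child
    options = [(a, b)] if a == b else [(a, b), (b, a)]
    consistent = [(pat, mat) for pat, mat in options if pat in father and mat in mother]
    return consistent[0] if len(consistent) == 1 else None


def phase_trio(child_gts, father_gts, mother_gts):
    """Phase a run of sites into paternal and maternal haplotypes,
    leaving None where the trio alone cannot resolve the phase."""
    paternal, maternal = [], []
    for c, f, m in zip(child_gts, father_gts, mother_gts):
        phased = phase_trio_site(c, f, m)
        paternal.append(phased[0] if phased else None)
        maternal.append(phased[1] if phased else None)
    return paternal, maternal


pat, mat = phase_trio(
    [("A", "G"), ("C", "T")],  # child
    [("G", "G"), ("C", "T")],  # father
    [("A", "A"), ("C", "T")],  # mother
)
print(pat, mat)
```

    Sites left as None are exactly those where all three individuals are heterozygous; these are the gaps that population-based or read-backed phasing would have to fill.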

  3. Whole-Genome Sequencing of Native Sheep Provides Insights into Rapid Adaptations to Extreme Environments

    Science.gov (United States)

    Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun

    2016-01-01

    Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8–9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (1500 m) versus low-altitude region (600 mm), and arid zone (400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change. PMID:27401233

  4. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell whole genome amplification (WGA) at the JGI continues to evolve. To increase both the quality and the number of single-cell genomes produced, we explore all aspects of the process, from cell sorting to sequencing. For example, we now use specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods, including treatment with Proteinase K, lysozyme, and detergents, to complement standard alkaline lysis and allow more efficient disruption of a wider range of cells. Incomplete lysis is a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  5. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is the systematic organization of sequence-related attributes gathered by a variety of algorithms, together with primary experimental data and information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), and the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere, in the 2003 and 2004 NAR database issues, respectively. All the databases described, and detailed descriptions of our projects, can be accessed through the MIPS web server (http://mips.gsf.de).

  6. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Next-generation sequencing technologies such as 454/Roche pyrosequencing and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared with the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges, such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast genome (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, whose members are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  7. Whole-Genome SNP Association in the Horse: Identification of a Deletion in Myosin Va Responsible for Lavender Foal Syndrome

    Science.gov (United States)

    Brooks, Samantha A.; Gabreski, Nicole; Miller, Donald; Brisbin, Abra; Brown, Helen E.; Streeter, Cassandra; Mezey, Jason; Cook, Deborah; Antczak, Douglas F.

    2010-01-01

    Lavender Foal Syndrome (LFS) is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A) and myosin Va (MYO5A). Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP) chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR–based Restriction Fragment Length Polymorphism (PCR–RFLP) assay was designed and used to investigate the frequency of the mutant gene. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected in high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals. PMID:20419149
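
    How a single-base deletion "changes the reading frame and introduces a premature stop codon" can be shown with a toy translation. The sequence below is hypothetical and is not the MYO5A exon 30 sequence; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def delete_base(seq, index):
    """Return seq with the single base at `index` removed."""
    return seq[:index] + seq[index + 1:]


wild_type = "ATGCTGACAGGCTAA"       # hypothetical toy open reading frame
mutant = delete_base(wild_type, 3)  # remove the C of the second codon
print(translate(wild_type))  # MLTG*
print(translate(mutant))     # M*
```

    In the toy case the shifted frame reads TGA immediately. In real genes the stop usually comes a little later, but an out-of-frame reading tends to meet one of the three stop codons (TAA, TAG, TGA) within a short distance, which is why such deletions typically truncate the protein.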

  8. Whole-genome SNP association in the horse: identification of a deletion in myosin Va responsible for Lavender Foal Syndrome.

    Directory of Open Access Journals (Sweden)

    Samantha A Brooks

    2010-04-01

    Lavender Foal Syndrome (LFS) is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A) and myosin Va (MYO5A). Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP) chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR-based Restriction Fragment Length Polymorphism (PCR-RFLP) assay was designed and used to investigate the frequency of the mutant gene. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected in high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals.

  9. Whole Genome Epidemiological Typing of Escherichia coli

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer

    Escherichia coli (E. coli) is of huge importance in global health, both as a commensal organism living within its host and as a pathogen causing millions of infections each year. Infections occur both sporadically and as outbreaks, sometimes with up to thousands of infected people. To limit the number...... of infections it is important to monitor pathogenic E. coli in order to detect outbreaks as quickly as possible and find the source of the outbreak. The effectiveness of monitoring and tracking of pathogens depends heavily on the typing methods that are employed. Classical typing methods employed for E. coli......D thesis attempts to take the first steps toward such a method. In Kaas I, all 186 publicly available sequenced E. coli genomes are analyzed. 1,702 core genes were found in all genomes, and 3,051 genes were found in 95% of the genomes. The pan genome was found to consist of 16,373 genes. The overall phylogeny...
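
    The core, 95%, and pan-genome counts above come from a gene presence/absence table across genomes. A minimal sketch of that tally follows. It assumes gene families have already been clustered across genomes (the hard part, not shown), and the toy genomes and gene names are illustrative only.

```python
from collections import Counter


def pan_genome_summary(presence, soft_core_percent=95):
    """Classify gene families from a genome -> set-of-gene-families mapping.

    core: present in every genome
    soft_core: present in at least soft_core_percent % of genomes
    pan: present in at least one genome
    """
    n_genomes = len(presence)
    counts = Counter(gene for genes in presence.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_genomes}
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    soft_core = {g for g, c in counts.items() if 100 * c >= soft_core_percent * n_genomes}
    return core, soft_core, set(counts)


# Toy presence/absence data for three genomes.
genomes = {
    "genome1": {"dnaA", "gyrB", "stx2"},
    "genome2": {"dnaA", "gyrB"},
    "genome3": {"dnaA", "eae"},
}
core, soft_core, pan = pan_genome_summary(genomes)
print(len(core), len(soft_core), len(pan))
```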

  10. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Bellod Cisneros, Jose Luis

    2016-01-01

    web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes...... and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services...... and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https...

  11. A Whole Genome Association Study on Meat Palatability in Hanwoo

    Directory of Open Access Journals (Sweden)

    K.-E. Hyeong

    2014-09-01

    A whole genome association (WGA) study was carried out to find quantitative trait loci (QTL) for sensory evaluation traits in Hanwoo. Carcass samples of 250 Hanwoo steers were collected from the National Agricultural Cooperative Livestock Research Institute, Ansung, Gyeonggi province, Korea, between 2011 and 2012 and genotyped with the Affymetrix Bovine Axiom Array 640K single nucleotide polymorphism (SNP) chip. After quality control, 322,160 of the SNPs on the chip were retained. After adjusting for the effects of age, slaughter year-season, and polygenic effects using a genomic relationship matrix, the corrected phenotypes for the sensory evaluation measurements were regressed on each SNP using a simple additive linear regression model. A total of 1,631 SNPs were detected for color, aroma, tenderness, juiciness, and palatability at the 0.1% comparison-wise level. Among the significant SNPs, the best set of 52 SNP markers was chosen using a forward regression procedure at the 0.05 level, comprising sets of 8, 14, 11, 10, and 9 SNPs for the respective sensory evaluation traits. These sets of significant SNPs explained 18% to 31% of the phenotypic variance. Three SNPs were pleiotropic: AX-26703353 and AX-26742891, located at 101 and 110 Mb of BTA6, respectively, influenced tenderness, juiciness, and palatability, while AX-18624743 at 3 Mb of BTA10 affected tenderness and palatability. Our results suggest that some QTL for sensory measures are segregating in a Hanwoo steer population. Additional WGA studies on fatty acid and nutritional components as well as the sensory panels are in progress to characterize the genetic architecture of meat quality and palatability in Hanwoo.
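
    The per-SNP scan described above regresses corrected phenotypes on additive genotype codes one SNP at a time. The sketch below assumes the adjustment for age, slaughter year-season, and polygenic effects has already been done. It shows only the single-SNP regression on toy data. The returned t statistic would then be compared with a t distribution on n − 2 degrees of freedom (for example via `scipy.stats.t.sf`) to apply a threshold such as the 0.1% comparison-wise level.

```python
import math


def single_snp_regression(genotypes, phenotypes):
    """Regress corrected phenotypes on additive genotype codes (0, 1, 2 copies
    of one allele). Returns (slope, standard error, t statistic with n - 2 df)."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching genotype/phenotype lists with n >= 3")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx                    # additive allele-substitution effect
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, se, t


genotypes = [0, 1, 2, 0, 1, 2]
corrected = [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]  # toy corrected phenotypes
print(single_snp_regression(genotypes, corrected))
```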

  12. Whole genome sequencing analysis of lung adenocarcinoma in Xuanwei, China.

    Science.gov (United States)

    Wang, Xiao; Li, Jing; Duan, Yong; Wu, Huifei; Xu, Qiuyue; Zhang, Yanliang

    2017-03-01

    The lung cancer mortality rate in Xuanwei city is among the highest in China, and adenocarcinoma is the major histological type. Lung cancer has been associated with exposure to indoor smoky coal emissions that contain high levels of polycyclic aromatic hydrocarbons; however, the pathogenesis of lung cancer has not yet been fully elucidated. We performed whole genome sequencing of lung adenocarcinoma and corresponding non-tumor tissue to explore the genomic features of Xuanwei lung cancer. We used the Molecule Annotation System to determine and plot alterations in genes and signaling pathways. A total of 3 428 060 and 3 416 989 single nucleotide variants were detected in the tumor and normal genomes, respectively. After comparison of these two genomes, 977 high-confidence somatic single nucleotide variants were identified. We observed a remarkably high proportion of C·G→A·T transversions. HECTD4, RCBTB2, KLF15, and CACNA1C may be cancer-related genes. Nine copy number gains were found on chromosome 5 and one on chromosome 7. Novel junctions were detected from clusters of discordant read pairs, and 1955 structural variants were discovered. Among these, we found 44 novel chromosome structural variations. In addition, EGFR and CACNA1C in the mitogen-activated protein kinase signaling pathway were mutated or amplified in lung adenocarcinoma tumor tissue. We obtained a comprehensive view of somatic alterations in Xuanwei lung adenocarcinoma. These findings provide insight into its genomic landscape and may help further our understanding of the progression and development of Xuanwei lung adenocarcinoma. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
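
    A statement such as "a remarkably high proportion of C·G→A·T transversions" comes from tallying somatic SNVs into substitution classes collapsed onto a pyrimidine reference base, so that C>A and G>T count as the same class. A minimal sketch with made-up calls follows; it does not reproduce the study's Molecule Annotation System workflow.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES = set("AG")


def spectrum_class(ref, alt):
    """Collapse a single-base substitution onto a pyrimidine (C or T) reference,
    so that G>T and C>A both count as C>A, i.e. C·G->A·T."""
    if ref in PURINES:
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def is_transversion(ref, alt):
    """A transversion swaps a purine for a pyrimidine or vice versa."""
    return (ref in PURINES) != (alt in PURINES)


somatic_snvs = [("C", "A"), ("G", "T"), ("G", "T"), ("A", "G"), ("C", "T")]  # toy calls
spectrum = Counter(spectrum_class(r, a) for r, a in somatic_snvs)
print(spectrum.most_common())
print(sum(is_transversion(r, a) for r, a in somatic_snvs), "transversions")
```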

  13. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis

    OpenAIRE

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-01-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software tool for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomic range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experime...
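
    The two quality indices mentioned above, one for taxonomic range and one for discrimination, can be illustrated with simplified definitions: the proportion of reference taxa amplified, and the proportion of amplified taxa that receive a barcode unique to them. This is a sketch of the idea under those assumed definitions, not ecoPrimers' code. ecoPrimers also searches for the primers, which this sketch treats as already applied.

```python
from collections import Counter


def barcode_quality(amplicons):
    """Simplified barcode quality indices.

    amplicons: taxon -> barcode sequence amplified in silico, or None if the
    primer pair does not amplify that taxon.
    coverage: fraction of reference taxa that are amplified.
    specificity: fraction of amplified taxa whose barcode is unique to them.
    """
    if not amplicons:
        raise ValueError("no reference taxa given")
    amplified = [seq for seq in amplicons.values() if seq is not None]
    coverage = len(amplified) / len(amplicons)
    seq_counts = Counter(amplified)
    unique = sum(1 for seq in amplified if seq_counts[seq] == 1)
    specificity = unique / len(amplified) if amplified else 0.0
    return coverage, specificity


toy = {"taxon1": "ACGTTG", "taxon2": "ACGTTG", "taxon3": "GGATCC", "taxon4": None}
print(barcode_quality(toy))  # coverage 0.75, specificity 1/3 (taxon1 and taxon2 share a barcode)
```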

  14. Evolutionary history and adaptation from high-coverage whole-genome sequences of diverse African hunter-gatherers.

    Science.gov (United States)

    Lachance, Joseph; Vernot, Benjamin; Elbers, Clara C; Ferwerda, Bart; Froment, Alain; Bodo, Jean-Marie; Lema, Godfrey; Fu, Wenqing; Nyambo, Thomas B; Rebbeck, Timothy R; Zhang, Kun; Akey, Joshua M; Tishkoff, Sarah A

    2012-08-03

    To reconstruct modern human evolutionary history and identify loci that have shaped hunter-gatherer adaptation, we sequenced the whole genomes of five individuals in each of three different hunter-gatherer populations at >60× coverage: Pygmies from Cameroon and Khoesan-speaking Hadza and Sandawe from Tanzania. We identify 13.4 million variants, substantially increasing the set of known human variation. We found evidence of archaic introgression in all three populations, and the distribution of time to most recent common ancestors from these regions is similar to that observed for introgressed regions in Europeans. Additionally, we identify numerous loci that harbor signatures of local adaptation, including genes involved in immunity, metabolism, olfactory and taste perception, reproduction, and wound healing. Within the Pygmy population, we identify multiple highly differentiated loci that play a role in growth and anterior pituitary function and are associated with height. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    Science.gov (United States)

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains. Published by Elsevier Inc.
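
    Allele-based cluster analysis of wgMLST profiles can be sketched as counting the loci at which two isolates carry different allele numbers, then grouping isolates that lie within a chosen distance. The profile format, toy loci, `max_differences` threshold, and single-linkage grouping are assumptions for illustration; the abstract does not specify the clustering method.

```python
def allele_differences(profile_a, profile_b):
    """Count loci typed in both isolates whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def single_linkage_clusters(profiles, max_differences):
    """Group isolates whose profiles differ at <= max_differences loci,
    joining groups transitively (single linkage) with a union-find."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_differences(profiles[a], profiles[b]) <= max_differences:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Toy profiles: locus -> allele number (the study typed 3979 CO92 reference ORFs).
profiles = {
    "isolate1": {"locus1": 1, "locus2": 1, "locus3": 1},
    "isolate2": {"locus1": 1, "locus2": 1, "locus3": 2},
    "isolate3": {"locus1": 2, "locus2": 3, "locus3": 4},
}
print(single_linkage_clusters(profiles, max_differences=1))
```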

  16. Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome.

    Science.gov (United States)

    Higashino, Atsunori; Sakate, Ryuichi; Kameoka, Yosuke; Takahashi, Ichiro; Hirata, Makoto; Tanuma, Reiko; Masui, Tohru; Yasutomi, Yasuhiro; Osada, Naoki

    2012-07-02

    The genetic background of the cynomolgus macaque (Macaca fascicularis) is made complex by the high genetic diversity, population structure, and gene introgression from the closely related rhesus macaque (Macaca mulatta). Herein we report the whole-genome sequence of a Malaysian cynomolgus macaque male with more than 40-fold coverage, which was determined using a resequencing method based on the Indian rhesus macaque genome. We identified approximately 9.7 million single nucleotide variants (SNVs) between the Malaysian cynomolgus and the Indian rhesus macaque genomes. Compared with humans, a smaller nonsynonymous/synonymous SNV ratio in the cynomolgus macaque suggests more effective removal of slightly deleterious mutations. Comparison of two cynomolgus (Malaysian and Vietnamese) and two rhesus (Indian and Chinese) macaque genomes, including previously published macaque genomes, suggests that Indochinese cynomolgus macaques have been more affected by gene introgression from rhesus macaques. We further identified 60 nonsynonymous SNVs that completely differentiated the cynomolgus and rhesus macaque genomes, and that could be important candidate variants for determining species-specific responses to drugs and pathogens. The demographic inference using the genome sequence data revealed that Malaysian cynomolgus macaques have experienced at least three population bottlenecks. This list of whole-genome SNVs will be useful for many future applications, such as an array-based genotyping system for macaque individuals. High-quality whole-genome sequencing of the cynomolgus macaque genome may aid studies on finding genetic differences that are responsible for phenotypic diversity in macaques and may help control genetic backgrounds among individuals.

  17. Whole-Genome Sequencing and Comparative Analysis of Mycobacterium brisbanense Reveals a Possible Soil Origin and Capability in Fertiliser Synthesis.

    Science.gov (United States)

    Wee, Wei Yee; Tan, Tze King; Jakubovics, Nicholas S; Choo, Siew Woh

    2016-01-01

    Mycobacterium brisbanense is a member of the Mycobacterium fortuitum third biovariant complex, which includes rapidly growing Mycobacterium spp. that normally inhabit soil, dust, and water and can sometimes cause respiratory tract infections in humans. We present the first whole-genome analysis of M. brisbanense UM_WWY, which was isolated from a 70-year-old Malaysian patient. Molecular phylogenetic analyses confirmed the identification of this strain as M. brisbanense and showed that it has an unusually large genome compared with related mycobacteria. The large genome size of M. brisbanense UM_WWY (~7.7 Mbp) is consistent with further findings that this strain has a highly variable genome structure containing many putative horizontally transferred genomic islands and prophages. Comparative analysis showed that M. brisbanense UM_WWY is the only Mycobacterium species that possesses a complete set of genes encoding enzymes involved in the urea cycle, suggesting that this soil bacterium is able to synthesize urea for use as a plant fertilizer. It is likely that M. brisbanense UM_WWY is adapted to live in soil as its primary habitat, since the genome contains many genes associated with nitrogen metabolism. Nevertheless, a large number of predicted virulence genes were identified in M. brisbanense UM_WWY, most of which are shared with well-studied mycobacterial pathogens such as Mycobacterium tuberculosis and Mycobacterium abscessus. These findings are consistent with the role of M. brisbanense as an opportunistic pathogen of humans. The whole-genome study of UM_WWY provides a basis for future work on M. brisbanense.

  18. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interactions among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the deep learning architectures, can model data at a high level of abstraction that captures nonlinear effects. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. DBN achieved a correlation of 0.579 on the -1 to 1 scale.

  19. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms, including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, that few novel taxa are represented by these operational taxonomic units (OTUs), and that many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  20. Identification of hallmarks of lung adenocarcinoma prognosis using whole genome sequencing

    Science.gov (United States)

    Liu, Li; Huang, Jiao; Wang, Ke; Li, Li; Li, Yangkai; Yuan, Jingsong; Wei, Sheng

    2015-01-01

    In conjunction with clinical characteristics, prognostic biomarkers are essential for choosing optimal therapies to lower the mortality of lung adenocarcinoma. Whole genome sequencing (WGS) of 7 cancerous-noncancerous tissue pairs was performed to explore the comparative copy number variations (CNVs) associated with lung adenocarcinoma. The frequencies of top-ranked CNVs were verified in an independent set of 114 patients, and the roles of target CNVs in disease prognosis were then assessed in 313 patients. The WGS yielded 2604 CNVs. After frequency validation and biological function screening of the top 10 CNVs, 9 mutant driver genes from 7 CNVs were further analyzed for an association with survival. Compared with carriers of an amplified PBXIP1 copy number, unamplified carriers had a 0.62-fold risk of death (95% CI = 0.43–0.91). Compared with an amplified TERT, those with an unamplified TERT had a 35% reduction (95% CI = 3%–56%) in risk of lung adenocarcinoma progression. Cases with both unamplified PBXIP1 and TERT had a median 34.32-month extension of overall survival and a 34.55-month delay in disease progression compared with cases in which both CNVs were amplified. This study demonstrates that CNVs of TERT and PBXIP1 have the potential to translate into the clinic and be used to improve outcomes for patients with this fatal disease. PMID:26497366
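
    Record 18 above reports genomic-prediction accuracy as the Pearson correlation between predicted and observed trait values. The following is a minimal sketch of that evaluation metric only. It does not reproduce the DBN model or the CIMMYT maize data. It assumes two equal-length Python lists, and the function name and toy values are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed phenotype values."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical toy values for illustration only; not data from the study.
predicted = [1.2, 2.1, 2.9, 4.2, 5.1]
observed = [1.0, 2.5, 2.7, 4.0, 5.3]
print(round(pearson_r(predicted, observed), 3))
```

    A value near 1 means predicted and observed values rise together, and a value near -1 means they move in opposite directions. The 0.579 reported for DBN sits on this same -1 to 1 scale.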

  1. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification.

    Science.gov (United States)

    Oyola, Samuel O; Ariani, Cristina V; Hamilton, William L; Kekre, Mihir; Amenga-Etego, Lucas N; Ghansah, Anita; Rutledge, Gavin G; Redmond, Seth; Manske, Magnus; Jyothi, Dushyanth; Jacob, Chris G; Otto, Thomas D; Rockett, Kirk; Newbold, Chris I; Berriman, Matthew; Kwiatkowski, Dominic P

    2016-12-20

    Translating genomic technologies into healthcare applications for the malaria parasite Plasmodium falciparum has been limited by the technical and logistical difficulties of obtaining high quality clinical samples from the field. Sampling by dried blood spot (DBS) finger-pricks can be performed safely and efficiently with minimal resource and storage requirements compared with venous blood (VB). Here, the use of selective whole genome amplification (sWGA) to sequence the P. falciparum genome from clinical DBS samples was evaluated, and the results were compared with current methods that use leucodepleted VB. Parasite DNA with high (>95%) human DNA contamination was selectively amplified by Phi29 polymerase using short oligonucleotides (8-12-mers) as primers. These primers were selected on the basis of their differential binding frequency in the desired (P. falciparum) and contaminating (human) genomes. Using the sWGA method, clinical samples from 156 malaria patients, including 120 paired samples for head-to-head comparison of DBS and leucodepleted VB, were sequenced. Greater than 18-fold enrichment of P. falciparum DNA was achieved from DBS extracts. The parasitaemia threshold to achieve >5× coverage for 50% of the genome was 0.03% (40 parasites per 200 white blood cells). Over 99% SNP concordance between VB and DBS samples was achieved after excluding missing calls. The sWGA methods described here provide a reliable and scalable way of generating P. falciparum genome sequence data from DBS samples. The current data indicate that it will be possible to get good quality sequence on most if not all drug resistance loci from the majority of symptomatic malaria patients. This technique overcomes a major limiting factor in P. falciparum genome sequencing from field samples, and paves the way for large-scale epidemiological applications.

  2. Effective normalization for copy number variation detection from whole genome sequencing.

    Science.gov (United States)

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools exist to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Compared with normalization based on GC content, mappability normalization yields Jaccard indices in the 0.07-0.3 range, whereas control genome normalization yields Jaccard index values around 0.4. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles and substantial agreement in variable gene and CNV region calls.

  3. Whole-genome sequence of Schistosoma haematobium.

    Science.gov (United States)

    Young, Neil D; Jex, Aaron R; Li, Bo; Liu, Shiping; Yang, Linfeng; Xiong, Zijun; Li, Yingrui; Cantacessi, Cinzia; Hall, Ross S; Xu, Xun; Chen, Fangyuan; Wu, Xuan; Zerlotini, Adhemar; Oliveira, Guilherme; Hofmann, Andreas; Zhang, Guojie; Fang, Xiaodong; Kang, Yi; Campbell, Bronwyn E; Loukas, Alex; Ranganathan, Shoba; Rollinson, David; Rinaldi, Gabriel; Brindley, Paul J; Yang, Huanming; Wang, Jun; Wang, Jian; Gasser, Robin B

    2012-01-15

    Schistosomiasis is a neglected tropical disease caused by blood flukes (genus Schistosoma; schistosomes) and affecting 200 million people worldwide. No vaccines are available, and treatment relies on one drug, praziquantel. Schistosoma haematobium has come into the spotlight as a major cause of urogenital disease, as an agent linked to bladder cancer and as a predisposing factor for HIV/AIDS. The parasite is transmitted to humans from freshwater snails. Worms dwell in blood vessels and release eggs that become embedded in the bladder wall to elicit chronic immune-mediated disease and induce squamous cell carcinoma. Here we sequenced the 385-Mb genome of S. haematobium using Illumina-based technology at 74-fold coverage and compared it to sequences from related parasites. We included genome annotation based on function, gene ontology, networking and pathway mapping. This genome now provides an unprecedented resource for many fundamental research areas and shows great promise for the design of new disease interventions.

  4. Whole genome comparative analysis of four Georgian grape cultivars.

    Science.gov (United States)

    Tabidze, V; Pipia, I; Gogniashvili, M; Kunelauri, N; Ujmajuridze, L; Pirtskhalava, M; Vishnepolsky, B; Hernandez, A G; Fields, C J; Beridze, Tengiz

    2017-12-01

    Grapevine is one of the most important fruit species in the world. Comparative genome sequencing of grape cultivars is very important for interpreting the grape genome and understanding its evolution. The genomes of four Georgian grape cultivars (Chkhaveri, Saperavi, Meskhetian green, and Rkatsiteli), belonging to different haplogroups, were resequenced. The shotgun genomic libraries of the grape cultivars were sequenced on an Illumina HiSeq. Pinot noir nuclear, mitochondrial, and chloroplast DNA were used as references. Mitochondrial DNA of Chkhaveri closely matches the reference Pinot noir mitochondrial DNA, with the exception of 16 SNPs. The number of SNPs in mitochondrial DNA from Saperavi, Meskhetian green, and Rkatsiteli was 764, 702, and 822, respectively. Nuclear DNA differs from the reference by 1,800,675 nt in Chkhaveri, 1,063,063 nt in Meskhetian green, 2,174,995 nt in Saperavi, and 5,011,513 nt in Rkatsiteli. Unlike Pinot noir mtDNA, which is closest to Chkhaveri, Pinot noir chromosomal DNA is closer to Meskhetian green than to the other cultivars. Substantial differences in the number of SNPs in mitochondrial and nuclear DNA of Chkhaveri and Pinot noir cultivars are explained by backcrossing or introgression of their wild predecessors before or during the process of domestication. Annotation of chromosomal DNA of the Georgian grape cultivars by MEGANTE, a web-based annotation system, yielded 66,745 predicted genes (Chkhaveri, 17,409; Saperavi, 17,021; Meskhetian green, 18,355; and Rkatsiteli, 13,960). Among them, 106 predicted terpene synthase (TPS) genes and 43 TPS pseudogenes were found on chromosomes 12, 18 random (18R), and 19. Four novel TPS genes not present in the reference Pinot noir DNA were detected. Two of them, germacrene A synthase (chromosome 18R) and (-)-germacrene D synthase (chromosome 19), can be identified as putatively full-length proteins. This work represents the first attempt at a comparative whole-genome analysis of different haplogroups

  5. Whole genome amplification: Use of advanced isothermal method

    African Journals Online (AJOL)

    Yomi

    2010-12-29

    Sima Moghaddaszadeh Ahrabi, Safar Farajnia, Ghodratollah Rahimi-Mianji, Soheila Montazer Saheb ... Moreover, application of high-fidelity and highly processive DNA ..... between I-PEP with MDA by using serial dilutions of.

  6. Whole genome amplification: Use of advanced isothermal method ...

    African Journals Online (AJOL)

    The laboratory method for amplifying genomic deoxyribonucleic acid (DNA) samples to generate a sufficient quantity of DNA for subsequent specific analyses is called whole genome amplification (WGA). This method is the only way to increase input material from a few cells or from samples with limited DNA content.

  7. Comparative Copy Number Variation From Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2011-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and unique insights into copy number variations at an unprecedented scale. Numerous tools and studies have already been introduced that provide confirmatory and new genomic variability data in individuals and across

  8. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  9. Analysis of phage Mu DNA transposition by whole-genome ...

    Indian Academy of Sciences (India)

    (Trilink Biotechnologies) were employed. Sample DNA (ChIP or processed Mu DNA) was amplified with Cy5-9mer primer, and reference DNA (Input or whole genome DNA) with Cy3-9mer primer. The samples were loaded on microarray slides and subjected to standard hybridization procedures (NimbleGen Arrays User's ...

  10. Optimized design and assessment of whole genome tiling arrays.

    NARCIS (Netherlands)

    Graf, S.; Nielsen, F.G.G.; Kurtz, S.; Huynen, M.A.; Birney, E.; Stunnenberg, H.G.; Flicek, P.

    2007-01-01

    MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling

  11. SNP detection for massively parallel whole-genome resequencing.

    Science.gov (United States)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong; Yang, Huanming; Wang, Jian; Kristiansen, Karsten; Wang, Jun

    2009-06-01

    Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.

  12. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs) between late stage and early stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR 1.5) were found in late stage DN patients compared with early stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  13. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.

  14. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Brandon S Sheffield

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcomes is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  15. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are both important and challenging because of the extent of microbial evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  16. Whole-Genome Analysis of Candidate genes Associated with Seed Size and Weight in Sorghum bicolor Reveals Signatures of Artificial Selection and Insights into Parallel Domestication in Cereal Crops.

    Science.gov (United States)

    Tao, Yongfu; Mace, Emma S; Tai, Shuaishuai; Cruickshank, Alan; Campbell, Bradley C; Zhao, Xianrong; Van Oosterom, Erik J; Godwin, Ian D; Botella, Jose R; Jordan, David R

    2017-01-01

    Seed size and seed weight are major quality attributes and important determinants of yield that have been strongly selected for during crop domestication. Limited information is available about the genetic control and genes associated with seed size and weight in sorghum. This study identified sorghum orthologs of genes with proven effects on seed size and weight in other plant species and searched for evidence of selection during domestication by utilizing resequencing data from a diversity panel. In total, 114 seed size candidate genes were identified in sorghum, 63 of which exhibited signals of purifying selection during domestication. A significant number of these genes also had domestication signatures in maize and rice, consistent with the parallel domestication of seed size in cereals. Seed size candidate genes that exhibited differentially high expression levels in seed were also found to be more likely to be under selection during domestication, supporting the hypothesis that modification to seed size during domestication preferentially targeted genes for intrinsic seed size rather than genes associated with physiological factors involved in carbohydrate supply and transport. Our results provide improved understanding of the complex genetic control of seed size and weight and the impact of domestication on these genes.

  17. Clinical interpretation and implications of whole-genome sequencing.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Pan, Cuiping; Goldstein, Benjamin A; Bernstein, Jonathan A; Chaib, Hassan; Merker, Jason D; Goldfeder, Rachel L; Enns, Gregory M; David, Sean P; Pakdaman, Neda; Ormond, Kelly E; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T; Butte, Atul J; Ford, James M; Boxer, Linda; Ioannidis, John P A; Yeung, Alan C; Altman, Russ B; Assimes, Themistocles L; Snyder, Michael; Ashley, Euan A; Quertermous, Thomas

    2014-03-12

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. The objectives were to examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and the resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. This was an exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings, and five physicians proposed initial clinical follow-up based on the genetic findings. Outcome measures were genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in

  18. Whole-Genome Sequences of Two Campylobacter coli Isolates from the Antimicrobial Resistance Monitoring Program in Colombia.

    Science.gov (United States)

    Bernal, Johan F; Donado-Godoy, Pilar; Valencia, María Fernanda; León, Maribel; Gómez, Yolanda; Rodríguez, Fernando; Agarwala, Richa; Landsman, David; Mariño-Ramírez, Leonardo

    2016-03-17

    Campylobacter coli, along with Campylobacter jejuni, is a major agent of gastroenteritis and acute enterocolitis in humans. We report the whole-genome sequences of two multidrug-resistant C. coli strains isolated from the Colombian poultry chain. The isolates contain a variety of antimicrobial resistance genes for aminoglycosides, lincosamides, fluoroquinolones, and tetracycline. Copyright © 2016 Bernal et al.

  19. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors.

    Science.gov (United States)

    MacLeod, Iona M; Larkin, Denis M; Lewin, Harris A; Hayes, Ben J; Goddard, Mike E

    2013-09-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best-fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.

  20. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    Science.gov (United States)

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best-fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528
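
    Records 19 and 20 above infer demography from the distribution of runs of homozygosity (ROH). The authors' method fits an analytical multiloci LD predictor and corrects for hidden sequencing errors, and the sketch below does neither. It illustrates only the simpler underlying step: scanning ordered genotype calls for homozygous runs and tabulating their lengths. It assumes genotypes are coded 0/1/2 (alternate-allele counts) and that any heterozygous call breaks a run. The function names and the `min_length` threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def runs_of_homozygosity(genotypes: Sequence[int], min_length: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end inclusive) of maximal homozygous runs.

    Genotypes are alternate-allele counts: 0 and 2 are homozygous, 1 is
    heterozygous. Runs shorter than min_length SNPs are discarded.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    runs: List[Tuple[int, int]] = []
    start = None
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError(f"unexpected genotype code {g!r} at index {i}")
        if g != 1:
            # A homozygous call opens a run or extends the current one.
            if start is None:
                start = i
        else:
            # A heterozygous call closes any open run; the run covers start..i-1.
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs


def run_length_distribution(runs: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count how many runs have each length (in SNPs), sorted by length."""
    counts: Dict[int, int] = {}
    for start, end in runs:
        length = end - start + 1
        counts[length] = counts.get(length, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical genotype calls at ordered SNP positions, for illustration only.
calls = [0, 0, 2, 1, 2, 2, 2, 2, 1, 0, 1, 0, 0, 0]
found = runs_of_homozygosity(calls, min_length=3)
print(found, run_length_distribution(found))
```

    In this sketch a single spurious heterozygous call splits one long run into two shorter ones. This shows how sequencing errors can distort the observed run-length distribution that a demographic model is fitted to.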

  1. Epigenetic regulation of subgenome dominance following whole genome triplication in Brassica rapa.

    Science.gov (United States)

    Cheng, Feng; Sun, Chao; Wu, Jian; Schnable, James; Woodhouse, Margaret R; Liang, Jianli; Cai, Chengcheng; Freeling, Michael; Wang, Xiaowu

    2016-07-01

    Subgenome dominance is an important phenomenon observed in allopolyploids after whole genome duplication, in which one subgenome retains more genes and more often contributes the more highly expressed copy of paralogous genes. To dissect the mechanism of subgenome dominance, we systematically investigated the relationships of gene expression, transposable element (TE) distribution and small RNA targeting, relating to the multicopy paralogous genes generated from whole genome triplication in Brassica rapa. The subgenome dominance was found to be regulated by a relatively stable factor established previously, then inherited by and shared among B. rapa varieties. In addition, we found a biased distribution of TEs between flanking regions of paralogous genes. Furthermore, the 24-nt small RNAs target TEs and are negatively correlated with the dominant expression of individual paralogous gene pairs. The biased distribution of TEs among subgenomes and the targeting of 24-nt small RNAs together produce the dominant expression phenomenon at a subgenome scale. Based on these findings, we propose a bucket hypothesis to illustrate subgenome dominance and hybrid vigor. Our findings and hypothesis are valuable for the evolutionary study of polyploids, and may shed light on studies of hybrid vigor, which is common to most species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  2. Whole-Genome Relationships among Francisella Bacteria of Diverse Origins Define New Species and Provide Specific Regions for Detection.

    Science.gov (United States)

    Challacombe, Jean F; Petersen, Jeannine M; Gallegos-Graves, La Verne; Hodge, David; Pillai, Segaran; Kuske, Cheryl R

    2017-02-01

    Francisella tularensis is a highly virulent zoonotic pathogen that causes tularemia and, because of weaponization efforts in past world wars, is considered a tier 1 biothreat agent. Detection and surveillance of F. tularensis may be confounded by the presence of uncharacterized, closely related organisms. Through DNA-based diagnostics and environmental surveys, novel clinical and environmental Francisella isolates have been obtained in recent years. Here we present 7 new Francisella genomes and a comparison of their characteristics to each other and to 24 publicly available genomes, as well as a comparative analysis of 16S rRNA and sdhA genes from over 90 Francisella strains. Delineation of new species in bacteria is challenging, especially when isolates having very close genomic characteristics exhibit different physiological features, for example, when some are virulent pathogens in humans and animals while others are nonpathogenic or are opportunistic pathogens. Species resolution within Francisella varies with analyses of single genes, multiple gene or protein sets, or whole-genome comparisons of nucleic acid and amino acid sequences. Analyses focusing on single genes (16S rRNA, sdhA), multiple gene sets (virulence genes, lipopolysaccharide [LPS] biosynthesis genes, pathogenicity island), and whole-genome comparisons (nucleotide and protein) gave congruent results, but with different levels of discrimination confidence. We designate four new species within the genus: Francisella opportunistica sp. nov. (MA06-7296), Francisella salina sp. nov. (TX07-7308), Francisella uliginis sp. nov. (TX07-7310), and Francisella frigiditurris sp. nov. (CA97-1460). This study provides a robust comparative framework to discern species and virulence features of newly detected Francisella bacteria. DNA-based detection and sequencing methods have identified thousands of new bacteria in the human body and the environment.
In most cases, there are no cultured isolates that correspond

  3. Assessing molecular initiating events (MIEs), key events (KEs) and modulating factors (MFs) for styrene responses in mouse lungs using whole genome gene expression profiling following 1-day and multi-week exposures.

    Science.gov (United States)

    Andersen, Melvin E; Cruzan, George; Black, Michael B; Pendse, Salil N; Dodd, Darol; Bus, James S; Sarang, Satinder S; Banton, Marcy I; Waites, Robbie; McMullen, Patrick D

    2017-11-15

    Styrene increased lung tumors in mice at chronic inhalation exposures of 20 ppm and greater. MIEs, KEs and MFs were examined using gene expression in three strains of male mice (the parental C57BL/6 strain, a CYP2F2(-/-) knockout and a CYP2F2(-/-) transgenic containing human CYP2F1, 2A13 and 2B6). Exposures were for 1 day and 1, 4 and 26 weeks. After 1-day exposures at 1, 5, 10, 20, 40 and 120 ppm, significant increases in differentially expressed genes (DEGs) occurred only in parental strain lungs, where there was already an increase in DEGs at 5 ppm and then many thousands of DEGs by 120 ppm. Enrichment for 1-day and 1-week exposures included cell cycle, mitotic M-M/G1 phases, DNA-synthesis and metabolism of lipids and lipoproteins pathways. The numbers of DEGs decreased steadily over time, with no DEGs meeting both statistical significance and fold-change criteria at 26 weeks. At 4 and 26 weeks, some key transcription factors (TFs), namely Nr1d1, Nr1d2, Dbp, Tef, Hlf, Per3, Per2 and Bhlhe40, were upregulated (|FC| > 1.5), while others, namely Npas, Arntl, Nfil3, Nr4a1, Nr4a2, and Nr4a3, were downregulated. At all times, consistent changes in gene expression only occurred in the parental strain. Our results support an MIE for styrene of direct mitogenicity from mouse-specific CYP2F2-mediated metabolites activating Nr4a signaling. Longer-term MFs include down-regulation of Nr4a genes and shifts in both circadian clock TFs and other TFs, linking circadian clock to cellular metabolism. We found no gene expression changes indicative of cytotoxicity or activation of p53-mediated DNA-damage pathways. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
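The DEG calls above combine a statistical-significance criterion with a fold-change cutoff (|FC| > 1.5 for the transcription factors). A minimal sketch of such a two-criterion filter, assuming per-gene log2 ratios and already multiple-testing-adjusted p-values are available; the 0.05 threshold, the signed fold-change convention, the gene names, and the function names are illustrative assumptions, not the study's pipeline.

```python
from typing import Dict, List, Tuple


def signed_fold_change(log2_ratio: float) -> float:
    """Convert a log2 ratio to a signed fold change.

    +2.0 means doubled, -2.0 means halved (a common convention, assumed here).
    """
    if log2_ratio >= 0:
        return 2.0 ** log2_ratio
    return -(2.0 ** -log2_ratio)


def call_degs(
    genes: Dict[str, Tuple[float, float]],
    fc_cutoff: float = 1.5,
    alpha: float = 0.05,
) -> List[str]:
    """Return (sorted) genes passing BOTH the significance and |FC| criteria.

    genes maps gene name -> (log2 ratio vs. control, adjusted p-value).
    """
    degs = []
    for name, (log2_ratio, padj) in genes.items():
        fc = signed_fold_change(log2_ratio)
        if padj < alpha and abs(fc) > fc_cutoff:
            degs.append(name)
    return sorted(degs)


if __name__ == "__main__":
    # Hypothetical genes: A and B pass; C fails on fold change; D fails on p-value.
    table = {
        "GeneA": (1.0, 0.01),
        "GeneB": (-1.0, 0.01),
        "GeneC": (0.3, 0.001),
        "GeneD": (2.0, 0.2),
    }
    print(call_degs(table))  # prints ['GeneA', 'GeneB']
```

Requiring both criteria is what allows a time point to have "no DEGs" even when some genes are nominally significant or show large but noisy changes.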

  4. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Staphylococcus aureus is an important clinical pathogen worldwide, and understanding this organism's phylogeny and, in particular, the role of recombination is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade, with clear groupings of the major clonal complexes CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages differed: while the CC8 lineage rate was similar to previous estimates, the CC5 lineage rate was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.
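The recombination analysis plots SNP density and homoplasy density along the chromosome. A minimal sketch of the windowed counting step, assuming SNP positions and a per-SNP homoplasy flag (a SNP whose pattern conflicts with the tree) have already been computed; the window size, names, and data layout are hypothetical, and this is not the authors' analysis method itself.

```python
from typing import List, Tuple


def windowed_density(
    snps: List[Tuple[int, bool]],
    genome_length: int,
    window: int,
) -> List[Tuple[int, int, int]]:
    """Count SNPs and homoplastic SNPs in non-overlapping windows.

    snps: (1-based position, is_homoplastic) pairs.
    Returns (window_start, snp_count, homoplasy_count) for every window.
    """
    n_windows = (genome_length + window - 1) // window  # ceiling division
    snp_counts = [0] * n_windows
    homo_counts = [0] * n_windows
    for pos, homoplastic in snps:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} outside genome")
        idx = (pos - 1) // window  # which window this SNP falls in
        snp_counts[idx] += 1
        if homoplastic:
            homo_counts[idx] += 1
    return [
        (i * window + 1, snp_counts[i], homo_counts[i])
        for i in range(n_windows)
    ]


if __name__ == "__main__":
    toy = [(5, False), (12, True), (15, False), (28, True)]
    for start, n_snp, n_homo in windowed_density(toy, genome_length=30, window=10):
        print(start, n_snp, n_homo)
```

Under this sketch's assumptions, windows where homoplastic SNPs make up an unusually large share of all SNPs would be the ones worth inspecting for recombination.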

  5. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications, provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization, using polystyrene particles as model particles and short carbon nanotubes (CNTs) as particles of potential interest in medical treatment. Another aim of the study was to find out whether microarray screening would identify different or additional targets of NP action compared with commonly used cell-based assays. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions identified as regulated by microarray were confirmed by cell-based assays.

  6. Recent advances in understanding the roles of whole genome duplications in evolution.

    Science.gov (United States)

    MacKintosh, Carol; Ferrier, David E K

    2017-01-01

    Ancient whole-genome duplications (WGDs), that is, paleopolyploidy events, are key to solving Darwin's 'abominable mystery' of how flowering plants evolved and radiated into a rich variety of species. The vertebrates also emerged from their invertebrate ancestors via two WGDs, and genomes of diverse gymnosperm trees, unicellular eukaryotes, invertebrates, fishes, amphibians and even a rodent carry evidence of lineage-specific WGDs. Modern polyploidy is common in eukaryotes, and it can be induced, enabling mechanisms and short-term cost-benefit assessments of polyploidy to be studied experimentally. However, the ancient WGDs can be reconstructed only by comparative genomics: these studies are difficult because the DNA duplicates have been through tens or hundreds of millions of years of gene losses, mutations, and chromosomal rearrangements that culminate in resolution of the polyploid genomes back into diploid ones (rediploidisation). Intriguing asymmetries in patterns of post-WGD gene loss and retention between duplicated sets of chromosomes have been discovered recently, and elaborations of signal transduction systems are lasting legacies from several WGDs. The data imply that simpler signalling pathways in the pre-WGD ancestors were converted via WGDs into multi-stranded parallelised networks. Genetic and biochemical studies in plants, yeasts and vertebrates suggest a paradigm in which different combinations of sister paralogues in the post-WGD regulatory networks are co-regulated under different conditions. In principle, such networks can respond to a wide array of environmental, sensory and hormonal stimuli and integrate them to generate phenotypic variety in cell types and behaviours. Patterns are also being discerned in how the post-WGD signalling networks are reconfigured in human cancers and neurological conditions. It is fascinating to unpick how ancient genomic events impact on complexity, variety and disease in modern life.

  7. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle.

    Science.gov (United States)

    Larkin, Denis M; Daetwyler, Hans D; Hernandez, Alvaro G; Wright, Chris L; Hetrick, Lorie A; Boucek, Lisa; Bachman, Sharon L; Band, Mark R; Akraiko, Tatsiana V; Cohen-Zinder, Miri; Thimmapuram, Jyothi; Macleod, Iona M; Harkins, Timothy T; McCague, Jennifer E; Goddard, Michael E; Hayes, Ben J; Lewin, Harris A

    2012-05-15

    Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.
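Detecting selection here reduces to asking whether a Chief allele is more frequent among descendants than expected without selection. A minimal sketch using a normal-approximation z-score for one proportion, assuming an externally supplied expected frequency `p0` (the study's own expectation is not reproduced here) and treating descendant haplotypes as independent draws, which is a simplifying assumption because related descendants are not independent. This illustrative test is not the one reported in the paper, and the function name is hypothetical.

```python
import math


def allele_frequency_z(carriers: int, total: int, p0: float) -> float:
    """z-score comparing an observed allele frequency with an expected p0.

    carriers: descendant haplotypes carrying the ancestral (Chief) allele.
    total:    descendant haplotypes examined.
    Uses the normal approximation to the binomial distribution.
    """
    if total <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("need total > 0 and 0 < p0 < 1")
    p_hat = carriers / total
    se = math.sqrt(p0 * (1.0 - p0) / total)  # standard error under H0
    return (p_hat - p0) / se


if __name__ == "__main__":
    # Hypothetical counts: 700 of 1,000 haplotypes carry the allele, p0 = 0.5.
    print(round(allele_frequency_z(700, 1000, 0.5), 2))
```

A large positive z flags a segment where the ancestral allele has risen beyond its expected frequency; scanning segments this way also raises a multiple-testing issue that a real analysis would have to handle.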

  8. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of whole genome amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, "immortalization" of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or non-random distribution of the target. Several papers published in the last year and currently in preparation may lead to the conclusion that whole genome amplification may indeed be possible and could therefore open up a new avenue in molecular biology.

  9. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  10. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary, and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From 13 to 137 nonsense mutations are present in each strain, and indel sizes are shown to be highly skewed in gene coding sequences. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions, and they invite new interpretations of molecular and genetic interactions in classical mutant strains.
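The search strategy above combines two filters: restrict candidates to the interval implicated by genetic markers, and discard variants that other strains in the comparison set also carry. A minimal sketch of that intersection logic, assuming each strain's variants are stored as (chromosome, position, alternate allele) tuples and the mapped interval is already known; the strain labels, data layout, and function name are hypothetical.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str]  # (chromosome, position, alternate allele)


def candidate_mutations(
    strain: str,
    variants: Dict[str, Set[Variant]],
    chrom: str,
    start: int,
    end: int,
) -> Set[Variant]:
    """Variants unique to `strain` that fall within chrom:start-end (inclusive)."""
    # Pool every variant seen in any OTHER strain of the comparison set.
    others: Set[Variant] = set()
    for name, vs in variants.items():
        if name != strain:
            others |= vs
    # Keep only strain-specific variants inside the marker-defined interval.
    return {
        v for v in variants[strain]
        if v not in others and v[0] == chrom and start <= v[1] <= end
    }


if __name__ == "__main__":
    calls = {
        "mutA": {("I", 100, "T"), ("I", 250, "G"), ("II", 50, "A")},
        "mutB": {("I", 250, "G")},
    }
    print(candidate_mutations("mutA", calls, "I", 1, 1000))
```

Each added genome shrinks the candidate set further, which is why comparing many strains at once strengthens the mutation-to-phenotype association.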

  11. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  12. A comprehensive whole-genome integrated cytogenetic map for the alpaca (Lama pacos).

    Science.gov (United States)

    Avila, Felipe; Baily, Malorie P; Perelman, Polina; Das, Pranab J; Pontius, Joan; Chowdhary, Renuka; Owens, Elaine; Johnson, Warren E; Merriwether, David A; Raudsepp, Terje

    2014-01-01

    Genome analysis of the alpaca (Lama pacos, LPA) has progressed slowly compared to other domestic species. Here, we report the development of the first comprehensive whole-genome integrated cytogenetic map for the alpaca using fluorescence in situ hybridization (FISH) and CHORI-246 BAC library clones. The map comprises 230 linearly ordered markers distributed among all 36 alpaca autosomes and the sex chromosomes. For the first time, markers were assigned to LPA14, 21, 22, 28, and 36. Additionally, 86 genes from 15 alpaca chromosomes were mapped in the dromedary camel (Camelus dromedarius, CDR), demonstrating exceptional synteny and linkage conservation between the 2 camelid genomes. Cytogenetic mapping of 191 protein-coding genes improved and refined the known Zoo-FISH homologies between camelids and humans: we discovered new homologous synteny blocks (HSBs) corresponding to HSA1-LPA/CDR11, HSA4-LPA/CDR31 and HSA7-LPA/CDR36, and revised the location of breakpoints for others. Overall, gene mapping was in good agreement with the Zoo-FISH and revealed remarkable evolutionary conservation of gene order within many human-camelid HSBs. Most importantly, 91 FISH-mapped markers effectively integrated the alpaca whole-genome sequence and the radiation hybrid maps with physical chromosomes, thus facilitating the improvement of the sequence assembly and the discovery of genes of biological importance. © 2015 S. Karger AG, Basel.

  13. Cgaln: fast and space-efficient whole-genome alignment

    Directory of Open Access Journals (Sweden)

    Nakato Ryuichiro

    2010-04-01

    Abstract Background Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle only sequences a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. Results We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. Conclusions Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and

  14. Cgaln: fast and space-efficient whole-genome alignment.

    Science.gov (United States)

    Nakato, Ryuichiro; Gotoh, Osamu

    2010-04-30

    Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle only sequences a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.
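The two-step idea (coarse block-level matching to narrow the search, then detailed alignment only inside the matched regions) can be illustrated with a toy version of the coarse stage. A minimal sketch, assuming fixed-size blocks scored by the number of shared k-mers; the block size, k, threshold, and function names are illustrative choices, and this is not the CGAT scoring scheme or the Cgaln implementation. The all-against-all block loop is quadratic in the number of blocks, which a real tool would avoid with indexing.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coarse_block_matches(
    a: str, b: str, block: int = 20, k: int = 5, min_shared: int = 3
) -> List[Tuple[int, int, int]]:
    """Coarse stage: compare fixed-size blocks of a and b by shared k-mers.

    Returns (block_index_a, block_index_b, shared_kmers) for block pairs
    sharing at least min_shared k-mers. A nucleotide-level aligner would
    then be run only on these block pairs (not implemented here).
    """
    blocks_a = [kmers(a[i:i + block], k) for i in range(0, len(a), block)]
    blocks_b = [kmers(b[j:j + block], k) for j in range(0, len(b), block)]
    hits = []
    for i, ka in enumerate(blocks_a):
        for j, kb in enumerate(blocks_b):
            shared = len(ka & kb)
            if shared >= min_shared:
                hits.append((i, j, shared))
    return hits


if __name__ == "__main__":
    x = "ACGTACGGTCAGTTGCAACG"  # block 0 of genome a
    y = "TTGACCATGGCATCGATCCA"  # block 1 of genome a
    # Genome b carries the same two blocks in swapped order (a rearrangement).
    print(coarse_block_matches(x + y, y + x))
```

Because the swapped blocks match off the diagonal, the coarse stage detects the rearrangement before any base-level alignment is attempted.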

  15. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors, but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at the whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap-frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using the Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using the Complete Genomics Inc. platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation, t(11;X), was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinct from those of other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  16. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    Science.gov (United States)

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  17. Priors in whole-genome regression: the bayesian alphabet returns.

    Science.gov (United States)

    Gianola, Daniel

    2013-07-01

    Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term "Bayesian alphabet" refers to the growing number of letters of the alphabet used to name various Bayesian linear regressions that differ in the priors adopted while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data, where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various ways, but it is shown here that the prior is always influential unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters ("tuning knobs") are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
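    A minimal numerical sketch of the point about priors, assuming the simplest Gaussian-prior member of the family (equivalent to ridge regression, with a known residual variance). The function `posterior_mean` and the simulated data are illustrative only, not the author's code. With p > n the cross-product matrix X'X is rank-deficient, so the data alone cannot pin down the marker effects, and the posterior mean moves when only the prior variance is changed:

```python
import numpy as np


def posterior_mean(X, y, prior_var, noise_var=1.0):
    """Posterior mean of beta for y = X beta + e, with e ~ N(0, noise_var I)
    and a Gaussian prior beta ~ N(0, prior_var I); this equals ridge regression
    with penalty lambda = noise_var / prior_var."""
    p = X.shape[1]
    lam = noise_var / prior_var
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n, p = 20, 200                       # more markers than individuals (p > n)
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                  # five simulated causal markers
y = X @ beta_true + rng.standard_normal(n)

# X'X has rank at most n < p: the likelihood alone cannot identify beta.
rank = np.linalg.matrix_rank(X.T @ X)
print("rank of X'X:", rank, "parameters:", p)

# Same data, two different priors -> two different answers.
b_tight = posterior_mean(X, y, prior_var=0.01)
b_loose = posterior_mean(X, y, prior_var=1.0)
print("largest change in an estimated effect:", np.abs(b_tight - b_loose).max())
```

    Changing only `prior_var` shifts the estimated effects, which illustrates why inferences about genetic architecture in the n ≪ p setting carry the imprint of the prior.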

  18. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.

  19. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Science.gov (United States)

    Koslicki, David; Foucart, Simon; Rosen, Gail

    2014-01-01

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
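    WGSQuikr's exact formulation is not given here. As a rough illustration of the general idea behind k-mer based reconstruction, the toy sketch below represents each reference genome and the sample as k-mer frequency vectors and solves a non-negative least-squares problem for the mixing proportions. The choice of k = 3, SciPy's `nnls` solver, and the random "genomes" are assumptions of this example, not details of the published method:

```python
from itertools import product

import numpy as np
from scipy.optimize import nnls

K = 3  # toy k-mer length chosen for this example
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def kmer_profile(seq):
    """Normalized counts of all 4**K DNA k-mers in seq (non-ACGT k-mers skipped)."""
    counts = np.zeros(len(KMER_INDEX))
    for i in range(len(seq) - K + 1):
        j = KMER_INDEX.get(seq[i:i + K])
        if j is not None:
            counts[j] += 1
    return counts / counts.sum()


def estimate_proportions(sample, references):
    """Non-negative x minimizing ||A x - sample||, where the columns of A are
    reference k-mer profiles; rescaled so the proportions sum to 1."""
    A = np.column_stack(references)
    x, _residual = nnls(A, sample)
    return x / x.sum()


rng = np.random.default_rng(1)


def random_genome(gc, length=5000):
    """Random sequence with the given GC content (stand-in for a reference genome)."""
    at = (1 - gc) / 2
    return "".join(rng.choice(list("ACGT"), size=length, p=[at, gc / 2, gc / 2, at]))


refs = [kmer_profile(random_genome(gc)) for gc in (0.35, 0.50, 0.65)]
sample = 0.7 * refs[0] + 0.3 * refs[1]     # simulated 70/30 mixture of two references
props = estimate_proportions(sample, refs)
print(np.round(props, 3))
```

    Because the whole sample is summarized as one k-mer vector, the cost of the solve does not grow with the number of reads, which is the kind of saving over read-by-read classification that the abstract describes.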

  20. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison]; Zhou, S. [Univ. Wisc.-Madison]; Place, M. [Univ. Wisc.-Madison]; Zhang, Y. [Univ. Wisc.-Madison]; Briska, A. [Univ. Wisc.-Madison]; Goldstein, S. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Forrest, D. [Univ. Wisc.-Madison]; Lim, A. [Univ. Wisc.-Madison]; Lapidus, A. [Univ. Wisc.-Madison]; Han, C. S. [Univ. Wisc.-Madison]; Roberts, G. P. [Univ. Wisc.-Madison]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  1. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  2. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    Science.gov (United States)

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-04

    Comprehensive identification of somatic structural variations (SVs) and understanding of their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and their underlying mutational mechanisms in esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) through different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes, which occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Whole genome sequence of Pseudomonas aeruginosa F9676, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Shi, Zhenyuan; Ren, Deyong; Hu, Shikai; Hu, Xingming; Wu, Liwen; Lin, Haiyan; Hu, Jiang; Zhang, Guangheng; Guo, Longbiao

    2015-10-10

    Pseudomonas aeruginosa is a bacterium that can be isolated from diverse ecological niches. P. aeruginosa strain F9676 was first isolated from a rice seed sample in 2003. It showed strong antagonism against several plant pathogens. In this study, whole-genome sequencing was carried out. The total genome size of F9676 is 6,368,008 bp, with 5586 coding genes (CDS), 67 tRNAs, and 3 rRNAs. The genome sequence of F9676 may shed light on antagonism in P. aeruginosa. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Whole genome analysis provides evidence for porcine-to-simian interspecies transmission of rotavirus-A.

    Science.gov (United States)

    Navarro, Ryan; Aung, Meiji Soe; Cruz, Katalina; Ketzis, Jennifer; Gallagher, Christa Ann; Beierschmitt, Amy; Malik, Yashpal Singh; Kobayashi, Nobumichi; Ghosh, Souvik

    2017-04-01

    We report here whole genome analysis of a porcine rotavirus-A (RVA) strain RVA/Pig-wt/KNA/ET8B/2015/G5P[13] detected in a diarrheic piglet, and nearly whole genome (except for VP4 gene) analysis of a simian RVA strain RVA/Simian-wt/KNA/08979/2015/G5P[X] detected in a non-diarrheic African green monkey (AGM) on the island of St. Kitts, Caribbean region. Strain ET8B exhibited a G5-P[13]-I5-R1-C1-M1-A8-N1-T7-E1-H1 genotype constellation that was identical to those of Brazilian porcine RVA G5P[13] strains RVA/Pig-wt/BRA/ROTA01/2013/G5P[13] and RVA/Pig-wt/BRA/ROTA07/2013/G5P[13], the only porcine G5P[13] RVAs that have been analyzed for the whole genome so far. Phylogenetically, all the 11 gene segments of ET8B were closely related to those of porcine and porcine-like human RVAs within the respective genotypes. Although the porcine G5P[13] RVAs exhibited identical genotype constellations, ET8B did not appear to share common evolutionary pathways with the Brazilian porcine G5P[13] RVAs. Interestingly, the VP2, VP3, VP6, VP7, and NSP1-NSP5 genes of simian RVA strain 08979 were closely related to those of porcine and porcine-like human RVA strains, exhibiting 99%-100% nucleotide sequence identities to cognate genes of co-circulating porcine RVA strain ET8B. On the other hand, the VP1 of 08979 appeared to be genetically divergent from porcine and human RVAs within the R1 genotype, and its exact origin could not be ascertained. Taken together, these observations suggested that simian strain 08979 might have been derived from interspecies transmission events involving transmission of ET8B-like RVAs from pigs to AGMs. In St. Kitts, AGMs often stray from the wild into livestock farms. Therefore, it may be possible that the AGM acquired the infection from a pig farm on the island. To our knowledge, this is the first report on detection of porcine-like RVAs in monkeys. Also, the present study is the first to report whole genomic analysis of a porcine RVA strain from the Caribbean.

  5. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients...... with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  6. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol entophytic bacterium.

    Science.gov (United States)

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome size of B. amyloliquefaciens TF28 is 3,987,635 bp, comprising 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNAs, 2 microsatellite DNAs, 63 tRNAs, 7 rRNAs, 6 sRNAs, 3 prophages, and CRISPR domains.

  7. Identification of emergent blaCMY-2-carrying Proteus mirabilis lineages by whole-genome sequencing

    Directory of Open Access Journals (Sweden)

    M. Mac Aogáin

    2016-01-01

    Whole-genome sequencing of 24 Proteus mirabilis isolates revealed the clonal expansion of two cefoxitin-resistant strains among patients with community-onset infection. These strains harboured blaCMY-2 within a chromosomally located integrative and conjugative element and exhibited multidrug resistance phenotypes. A predominant strain, identified in 18 patients, also harboured the PGI-1 genomic island and associated resistance genes, accounting for its broader antibiotic resistance profile. The identification of these novel multidrug-resistant strains among community-onset infections suggests that they are endemic to this region and represent emergent P. mirabilis lineages of clinical significance.

  8. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    Directory of Open Access Journals (Sweden)

    Huggins John

    2010-05-01

    We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxypropyl)cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest that the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase, and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of the substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies.

  9. Whole genome transcript profiling from fingerstick blood samples: a comparison and feasibility study

    Directory of Open Access Journals (Sweden)

    Williams Adam R

    2009-12-01

    Background Whole genome gene expression profiling has revolutionized research in the past decade, especially with the advent of microarrays. Recently, there have been significant improvements in whole blood RNA isolation techniques which, through stabilization of RNA at the time of sample collection, avoid bias and artifacts introduced during sample handling. Despite these improvements, current human whole blood RNA stabilization/isolation kits are limited by the requirement of a venous blood sample of at least 2.5 mL. While fingerstick blood collection has been used for many different assays, no kit has yet been developed to isolate high-quality RNA for gene expression studies from such small human samples. The clinical and field-testing advantages of obtaining reliable and reproducible gene expression data from a fingerstick are many: it is less invasive, saves time, is more mobile, and eliminates the need for a trained phlebotomist. Furthermore, this method could also be employed in small animal studies, e.g., mice, where larger sample collections often require sacrificing the animal. In this study, we offer a rapid and simple method to extract sufficient amounts of high-quality total RNA from approximately 70 μL of whole blood collected via a fingerstick, using a modified protocol of the commercially available Qiagen PAXgene RNA Blood Kit. Results From two sets of fingerstick collections (about 70 μL of whole blood collected via finger lancet and capillary tube), we recovered an average of 252.6 ng total RNA with an average RIN of 9.3. The post-amplification yields for 50 ng of total RNA averaged 7.0 μg cDNA. The cDNA hybridized to Affymetrix HG-U133 Plus 2.0 GeneChips had an average % Present call of 52.5%. Both fingerstick collections were highly correlated, with r² values ranging from 0.94 to 0.97. Similarly, both fingerstick collections were highly correlated to the venous collection, with r² values ranging from 0.88 to 0

  10. Investigations on Genetic Architecture of Hairy Loci in Dairy Cattle by Using Single and Whole Genome Regression Approaches

    Directory of Open Access Journals (Sweden)

    B. Karacaören

    2016-07-01

    Development of body hair is an important physiological and cellular process that leads to better adaptation to tropical environments in dairy cattle. Various studies have suggested a major gene and, more recently, associated genes for the hairy locus in dairy cattle. The main aims of this study were to (i) employ a variant of the discordant sib pair model, in which half sibs from the same sires are randomly sampled using their affection status, (ii) use various single-marker regression approaches, and (iii) use whole-genome regression approaches to dissect the genetic architecture of the hairy gene in cattle. Both whole-genome and single-marker regression approaches detected strong genomic signals from chromosome 23. Although there is a major gene effect on the hairy phenotype originating from chromosome 23, the whole-genome regression approach also suggested a polygenic component associated with other parts of the genome. Such a result could not be obtained by any of the single-marker approaches.

  11. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography, their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average of 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...

  12. Whole-genome sequencing suggests mechanisms for 22q11.2 deletion-associated Parkinson's disease.

    Directory of Open Access Journals (Sweden)

    Nancy J Butcher

    The aim was to investigate disease risk mechanisms of early-onset Parkinson's disease (PD) associated with the recurrent 22q11.2 deletion, a genetic risk factor for early-onset PD. In a proof-of-principle study, we used whole-genome sequencing (WGS) to investigate sequence variants in nine adults with 22q11.2 deletion syndrome (22q11.2DS), three with neuropathologically confirmed early-onset PD and six without PD. Adopting an approach used recently to study schizophrenia in 22q11.2DS, here we tested candidate gene-sets relevant to PD. No mutations common to the cases with PD were found in the intact 22q11.2 region. While all were negative for rare mutations in a gene-set comprising PD disease-causing and risk genes, another candidate gene-set of 1000 genes functionally relevant to PD presented a nominally significant (P = 0.03) enrichment of rare putatively damaging missense variants in the PD cases. Polygenic scores, based on common variants associated with PD risk, were non-significantly greater in those with PD. The results of this first-ever pilot study of WGS in PD suggest that the cumulative burden of genome-wide sequence variants may contribute to expression of early-onset PD in the presence of the threshold-lowering dosage effects of a 22q11.2 deletion. We found no evidence that expression of PD in 22q11.2DS is mediated by a recessive locus on the intact 22q11.2 chromosome or by mutations in known PD genes. These findings offer initial evidence of the potential effects of multiple within-individual rare variants on the expression of PD and of the utility of next-generation sequencing for studying the etiology of PD.
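    For readers unfamiliar with the polygenic score mentioned above, a minimal sketch of the common additive definition: each variant's risk-allele dosage (0, 1, or 2) is multiplied by a per-allele weight and the products are summed. The weights and genotypes below are hypothetical and are not those used in the study:

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum of risk-allele dosage (0, 1 or 2)
    times the per-allele weight of each variant."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele weights (e.g. log odds ratios) for three variants.
weights = [0.20, -0.10, 0.05]
person_a = [2, 1, 0]  # risk-allele counts for one individual
person_b = [0, 0, 1]  # risk-allele counts for another individual

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(score_a, score_b)
```

    Comparing such scores between cases and non-cases is the kind of comparison summarized as "non-significantly greater in those with PD."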

  13. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Science.gov (United States)

    Guo, Yi-Cheng; Zhang, Lin; Dai, Shao-Xing; Li, Wen-Xing; Zheng, Jun-Juan; Li, Gong-Hua; Huang, Jing-Fei

    2016-01-01

    Dekkera yeasts have often been considered alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  14. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Directory of Open Access Journals (Sweden)

    Yi-Cheng Guo

    Dekkera yeasts have often been considered alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  15. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for constructing genome trees from whole genome sequences have been proposed. However, a reasonable method for constructing phylogenetic trees from complete genome sequences has not yet been established. We have developed a method for constructing phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, and Firmicutes, together with nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  16. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad eBaig

    2015-11-01

    Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from the divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on the reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN, and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of the 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that the divergent isolates were more closely related to S. suis. Six of the nine divergent isolates possessed an S. suis-like capsule region with variation in capsular gene sequences, but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of the S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates are warranted to understand the diversity of the species.
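    The shrinking core genome reported above amounts to a set intersection once predicted proteins have been grouped into ortholog clusters. The sketch assumes that clustering step has already been done; the cluster IDs below are hypothetical and only show how adding one divergent genome can remove clusters from the core:

```python
from functools import reduce


def core_genome(genomes):
    """Ortholog clusters shared by every genome in the collection (set intersection)."""
    return reduce(set.intersection, (set(g) for g in genomes))


# Hypothetical ortholog-cluster IDs per genome.
normal = [{"c1", "c2", "c3", "c4"}, {"c1", "c2", "c3", "c5"}]
divergent = [{"c1", "c3", "c6"}]

core_normal = core_genome(normal)
core_all = core_genome(normal + divergent)
print(sorted(core_normal), sorted(core_all))
```

    A single genome lacking cluster `c2` is enough to drop it from the core, which is why a handful of divergent isolates can roughly halve the core gene count.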

  17. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Background Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used primarily for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphisms in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA obtainable from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  18. A binary search approach to whole-genome data analysis

    Science.gov (United States)

    Brodsky, Leonid; Kogan, Simon; BenJacob, Eshel; Nevo, Eviatar

    2010-01-01

    A sequence analysis-oriented, binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect, with equal accuracy, the margins of both short and long genome fragments enriched by up-regulated signals. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The divide-and-conquer algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results demonstrate the power of the algorithm in identifying, at their precise genome margins, both short up-regulated fragments (such as exons and transcription factor binding sites) and long, even moderately up-regulated, zones. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies. PMID:20833816
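    The abstract does not spell out the procedure, but the idea of scoring a fragment by its excess signal over the chromosome baseline and then recursively extracting nonintersecting, locally optimal fragments can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: it assumes a per-probe score of `signal - baseline`, finds the best fragment in a range with Kadane's maximum-subarray scan, and recurses on the ranges to either side. The function names and the `min_score` cut-off are invented for this example, and the nested re-scoring step described in the abstract is not modelled.

```python
from typing import List, Tuple

# Hypothetical sketch: a recursive maximum-scoring-segment search, NOT the
# published algorithm. Each probe contributes (signal - baseline), so a
# fragment's score is its excess signal over the chromosome baseline.


def best_segment(scores: List[float], lo: int, hi: int) -> Tuple[float, int, int]:
    """Kadane's scan of scores[lo:hi]; returns (score, start, end_exclusive)."""
    best, best_start, best_end = 0.0, lo, lo
    running, start = 0.0, lo
    for i in range(lo, hi):
        if running <= 0:  # a non-positive prefix never helps; restart here
            running, start = 0.0, i
        running += scores[i]
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


def enriched_fragments(signal: List[float], baseline: float,
                       min_score: float = 0.0) -> List[Tuple[int, int, float]]:
    """Divide and conquer: take the best fragment in a range, then search
    the ranges to its left and right, so fragments never intersect."""
    scores = [x - baseline for x in signal]
    found: List[Tuple[int, int, float]] = []

    def search(lo: int, hi: int) -> None:
        if hi <= lo:
            return
        score, start, end = best_segment(scores, lo, hi)
        if score <= min_score:
            return  # nothing in this range beats the threshold
        found.append((start, end, score))
        search(lo, start)
        search(end, hi)

    search(0, len(scores))
    return sorted(found)


if __name__ == "__main__":
    probes = [1, 1, 5, 6, 1, 1, 1, 4, 1]  # toy tiling-array signal
    print(enriched_fragments(probes, baseline=2))  # → [(2, 4, 7.0), (7, 8, 2.0)]
```

    Because each recursive call searches only outside the fragment just found, the reported fragments cannot overlap, and a short strong peak and a long, moderately elevated zone are scored with the same excess-over-baseline measure.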

  19. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, its low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the set of populations in the 1KGP is not complete, even though the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we illustrate the higher sensitivity in detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared with the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation in Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced to achieve the accuracy currently afforded by the 1KGP. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  20. Whole-Genome Sequencing: Automated, Indexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing an indexed Illumina DNA library. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Double-stranded DNA (dsDNA) fragments when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  1. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using either...... a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model, results showed increases in accuracy of up to two percentage points for production traits in both Holstein and Jersey animals by including the extra variants in the analysis, and an extra 1.5 percentage points...

  2. Estimating telomere length from whole genome sequence data.

    Science.gov (United States)

    Ding, Zhihao; Mangino, Massimo; Aviv, Abraham; Spector, Tim; Durbin, Richard

    2014-05-01

    Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and display age-dependent attrition comparably to mTRFs. © The Author(s) 2014. Published by Oxford University Press.

  3. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi, from strain B31, has been available for more than a decade and has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  4. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variation present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, in Black Angus, and in both animals, 81%, 81%, and 75%, respectively, are novel. In-depth annotation of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten

  5. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...

  6. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Priest, James R; Waggott, Daryl; Batra, Prag; Miller, Clint L; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E; Datta, Somalee; Altman, Russ B; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A

    2015-10-01

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

  7. Whole genome amplification and its impact on CGH array profiles

    Directory of Open Access Journals (Sweden)

    Meldrum Cliff

    2008-07-01

    Background Some array comparative genomic hybridisation (array CGH) platforms require a minimum of micrograms of DNA for the generation of reliable and reproducible data. For studies where there are limited amounts of genetic material, whole genome amplification (WGA) is an attractive method for generating sufficient quantities of genomic material from minuscule amounts of starting material. A range of WGA methods are available, and the multiple displacement amplification (MDA) approach has been shown to be highly accurate, although amplification bias has been reported. In the current study, WGA was used to amplify DNA extracted from whole blood. In total, six array CGH experiments were performed to investigate whether the use of whole genome amplified DNA (wgaDNA) produces reliable and reproducible results: four experiments compared amplified DNA with unamplified DNA, and two compared unamplified DNA with unamplified DNA. Findings All the experiments involving wgaDNA resulted in a high proportion of losses and gains of genomic material. Previously, amplification bias has been overcome by using amplified DNA as both the test and the reference DNA. Our data suggest that this approach may not be effective, as the gains and losses introduced by WGA appear to be random and are not reproducible between different experiments using the same DNA. Conclusion In light of these findings, the use of both amplified test and reference DNA on CGH arrays may not provide an accurate representation of copy number variation in the DNA.

  8. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed in comparison with several other methods, including serotyping, PCR, pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major foodborne diseases of humans. Backward compatibility of WGS with former methods can be maintained by extracting the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes, capable of differentiating 12 serovars, and national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for Listeria monocytogenes surveillance. Whole genome sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and could be assigned to the IIa, IIc, IVb or IIb clusters generated by minimum spanning tree (MST) and neighbor-joining (NJ) tree analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
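    The clustering step mentioned above, grouping isolates by their core genome MLST allelic profiles with a minimum spanning tree, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the profiles are hypothetical toy data (not the 172 study isolates), the pairwise distance counts loci whose allele calls differ while skipping uncalled loci, the tree is built with Prim's algorithm, and the `threshold` used to cut the tree into clusters is a hypothetical parameter not reported in the abstract.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch of cgMLST-style clustering: each isolate is a list of
# allele numbers, one per core-genome locus (None = locus not called).
Profile = List[Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both isolates whose allele numbers differ."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm over the complete graph of pairwise allele distances."""
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        best: Optional[Tuple[str, str, int]] = None
        for u in names:  # iterate in input order so ties resolve reproducibly
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = allele_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


def clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Cut MST edges longer than `threshold`; remaining components are clusters."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, d in minimum_spanning_tree(profiles):
        if d <= threshold:
            parent[find(u)] = find(v)
    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy = {  # hypothetical 5-locus profiles
        "iso1": [1, 1, 1, 1, 1],
        "iso2": [1, 1, 1, 1, 2],
        "iso3": [2, 2, 2, 2, 1],
        "iso4": [2, 2, 2, 3, None],
    }
    print(clusters(toy, threshold=2))  # → [['iso1', 'iso2'], ['iso3', 'iso4']]
```

    Cutting the tree at a distance threshold is only one way to read clusters off an MST; the study also used neighbor-joining trees, which this sketch does not cover.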

  9. Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

    Science.gov (United States)

    Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

    2012-01-01

    Background The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis-causing strains of Treponema pallidum ssp. pallidum (TPA). Yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared with four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains, with lengths ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared with the four TPA genomes. The whole genome nucleotide divergence (dA) between the TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. Most of the divergent TPE genes belonged to the groups encoding potential virulence factors and proteins of unknown function. Conclusions/Significance Hypothetical genes with genetic differences consistently found between TPE and TPA strains are candidate virulence factors of syphilitic treponemes. Seventeen TPE genes were predicted to be under positive selection, and eleven of them coded for either predicted exported proteins or membrane proteins, suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains, and changes specific to individual strains, represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095

  10. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates.

    Science.gov (United States)

    Berthelot, Camille; Brunet, Frédéric; Chalopin, Domitille; Juanchich, Amélie; Bernard, Maria; Noël, Benjamin; Bento, Pascal; Da Silva, Corinne; Labadie, Karine; Alberti, Adriana; Aury, Jean-Marc; Louis, Alexandra; Dehais, Patrice; Bardou, Philippe; Montfort, Jérôme; Klopp, Christophe; Cabau, Cédric; Gaspin, Christine; Thorgaard, Gary H; Boussaha, Mekki; Quillet, Edwige; Guyomard, René; Galiana, Delphine; Bobe, Julien; Volff, Jean-Nicolas; Genêt, Carine; Wincker, Patrick; Jaillon, Olivier; Roest Crollius, Hugues; Guiguen, Yann

    2014-04-22

    Vertebrate evolution has been shaped by several rounds of whole-genome duplications (WGDs) that are often suggested to be associated with adaptive radiations and evolutionary innovations. Because of an additional round of WGD, the rainbow trout genome offers a unique opportunity to investigate the early evolutionary fate of a duplicated vertebrate genome. Here we show that after 100 million years of evolution the two ancestral subgenomes have remained extremely collinear, despite the loss of half of the duplicated protein-coding genes, mostly through pseudogenization. In striking contrast, almost all miRNA genes have been retained as duplicated copies. The slow and stepwise rediploidization process characterized here challenges the current hypothesis that WGD is followed by massive and rapid genomic reorganizations and gene deletions.

  11. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from strain KC05 was sequenced using both a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size, with a G + C content of 51.73%, in 27 scaffolds, and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  12. Whole genomic constellation of the first human G8 rotavirus strain detected in Japan.

    Science.gov (United States)

    Agbemabiese, Chantal Ama; Nakagomi, Toyoko; Doan, Yen Hai; Nakagomi, Osamu

    2015-10-01

    Human G8 Rotavirus A (RVA) strains are commonly detected in Africa but are rarely detected in Japan and elsewhere in the world. In this study, the whole genome sequence of the first human G8 RVA strain in Japan, designated AU109 and isolated from a child with acute gastroenteritis in 1994, was determined in order to understand how the strain was generated, including the host species origin of its genes. The genotype constellation of AU109 was G8-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2. Phylogenetic analyses of the 11 genome segments revealed that its VP7 and VP1 genes were closely related to those of a Hungarian human G8P[14] RVA strain and that these genes shared their most recent common ancestors in 1988 and 1982, respectively. AU109 possessed an NSP2 gene closely related to those of Chinese sheep and goat RVA strains. The remaining eight genome segments were closely related to Japanese human G2P[4] strains that circulated around 1985-1990. Bayesian evolutionary analyses revealed that the NSP2 gene of AU109 and those of the Chinese sheep and goat RVA strains diverged from a common ancestor around 1937. In conclusion, AU109 was generated through a genetic reassortment event in which Japanese DS-1-like G2P[4] strains circulating around 1985-1990 obtained the VP7, VP1 and NSP2 genes from unknown ruminant G8 RVA strains. These observations highlight the need for comprehensive examination of the whole genomes of RVA strains from less explored host species. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network

    OpenAIRE

    Iossifov, Ivan; Zheng, Tian; Baron, Miron; Gilliam, T. Conrad; Rzhetsky, Andrey

    2008-01-01

    Common hereditary neurodevelopmental disorders such as autism, bipolar disorder, and schizophrenia are most likely both genetically multifactorial and heterogeneous. Because of these characteristics, traditional methods for genetic analysis fail when applied to such diseases. To address the problem we propose a novel probabilistic framework that combines the standard genetic linkage formalism with whole-genome molecular-interaction data to predict pathways or networks of interacting genes that...

  14. New Sequence Types of Vibrio parahaemolyticus Isolated from a Malaysian Aquaculture Pond, as Revealed by Whole-Genome Sequencing.

    Science.gov (United States)

    Foo, Soon Man; Eng, Wilhelm Wei Han; Lee, Yin Peng; Gui, Kimberly; Gan, Han Ming

    2017-05-11

    The acquisition of Photorhabdus insect-related (Pir) toxin-like genes in Vibrio parahaemolyticus has been linked to hepatopancreatic necrosis disease in shrimp. We report the whole-genome sequences of genetically virulent and avirulent V. parahaemolyticus isolated from a Malaysian aquaculture pond and show that they represent previously unreported sequence types of V. parahaemolyticus. Copyright © 2017 Foo et al.

  15. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Science.gov (United States)

    Robins-Browne, Roy M; Holt, Kathryn E; Ingle, Danielle J; Hocking, Dianna M; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotypes and pathotypes. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  16. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Background Several mutations were present in the genomes of linezolid-resistant Streptococcus pneumoniae strains, but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with a sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentrations of linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021). However, the acquisition of these mutations conferred a fitness cost, which was further enhanced by the acquisition of a mutation in an RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches in detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  17. Large-scale whole genome sequencing identifies country-wide spread of an emerging G9P[8] rotavirus strain in Hungary, 2012.

    Science.gov (United States)

    Dóró, Renáta; Mihalov-Kovács, Eszter; Marton, Szilvia; László, Brigitta; Deák, Judit; Jakab, Ferenc; Juhász, Ágnes; Kisfali, Péter; Martella, Vito; Melegh, Béla; Molnár, Péter; Sántha, Ildikó; Schneider, Ferenc; Bányai, Krisztián

    2014-12-01

    With the availability of rotavirus vaccines, routine strain surveillance has been launched or continued in many countries worldwide. In this study, relevant information is provided from Hungary in order to extend knowledge about circulating rotavirus strains. Direct sequencing of RT-PCR products obtained with VP7- and VP4-specific primer sets was used as the routine laboratory method. In addition, we explored the advantages of random-primed RT-PCR and semiconductor sequencing of the whole genome of selected strains. During the study year, 2012, we identified an increase in the prevalence of G9P[8] strains across the country. This genotype combination predominated in seven out of nine study sites (detection rates, 45-83%). In addition to G9P[8]s, epidemiologically major strains included genotypes G1P[8] (34.2%), G2P[4] (13.5%), and G4P[8] (7.4%), whereas unusual and rare strains were G3P[8] (1%), G2P[8] (0.5%), G1P[4] (0.2%), G3P[4] (0.2%), and G3P[9] (0.2%). Whole genome analysis of 125 Hungarian human rotaviruses identified nine major genotype constellations and uncovered both intra- and intergenogroup reassortment events in circulating strains. Intergenogroup reassortment resulted in several unusual genotype constellations, including mono-reassortant G1P[8] and G9P[8] strains whose genotype 1 (Wa-like) backbone gene constellations contained DS-1-like NSP2 and VP3 genes, respectively, as well as a putative bovine-feline G3P[9] reassortant strain. The conserved genomic constellations of epidemiologically major genotypes suggested the clonal spread of the re-emerging G9P[8] genotype and several co-circulating strains (e.g., G1P[8] and G2P[4]) in many study sites during 2012. Of interest, medically important G2P[4] strains carried bovine-like VP1 and VP6 genes in their genotype constellation. No evidence of vaccine-associated selection, or of interaction between wild-type and vaccine strains, was obtained. 
In conclusion, this study reports the reemergence of G9P[8

  18. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing.

    Science.gov (United States)

    Park, Doori; Jung, Je Won; Choi, Beom-Soon; Jayakodi, Murukarthick; Lee, Jeongsoo; Lim, Jongsung; Yu, Yeisoo; Choi, Yong-Soo; Lee, Myeong-Lyeol; Park, Yoonseong; Choi, Ik-Young; Yang, Tae-Jin; Edwards, Owain R; Nah, Gyoungju; Kwon, Hyung Wook

    2015-01-02

    The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera, and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and predicted 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and help anticipate its future evolutionary trajectory.

  19. Dirofilaria immitis JYD-34 isolate: whole genome analysis

    Directory of Open Access Journals (Sweden)

    Catherine Bourguinat

    2017-11-01

    Background Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis of heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered as suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced-susceptibility isolates or to known ML-susceptible isolates. Methods In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools with the pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, and the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNPs) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate between susceptible and LOE and confirmed reduced-susceptibility isolates were also investigated in the JYD-34 isolate. 
Results The FST analysis, and the analysis of the 41 SNP that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously

  20. Dirofilaria immitis JYD-34 isolate: whole genome analysis.

    Science.gov (United States)

    Bourguinat, Catherine; Lefebvre, Francois; Sandoval, Johanna; Bondesen, Brenda; Moreno, Yovany; Prichard, Roger K

    2017-11-09

    Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis for heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms, despite apparent compliance with recommended chemoprophylaxis using approved preventives, have led to such cases being considered as suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced susceptibility isolates or to known ML-susceptible isolates. In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, and the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNP) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate between susceptible and LOE and confirmed reduced susceptibility isolates were also investigated in the JYD-34 isolate. 
The FST analysis, and the analysis of the 41 SNP that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously investigated LOE isolates, and isolates confirmed to
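    The FST comparison in this abstract is computed per SNP from allele frequencies and then summarized across the genome. As a minimal sketch, assuming two populations with known (pooled) reference-allele frequencies and using Nei's heterozygosity-based formulation F_ST = (H_T - H_S) / H_T without sample-size or pool-sequencing corrections, per-SNP and genome-wide values could be computed as below. The function names and example frequencies are hypothetical; this is not the estimator or pipeline used in the study.

```python
from typing import Sequence, Tuple

def heterozygosities(p1: float, p2: float) -> Tuple[float, float]:
    """Return (H_T, H_S) for a biallelic SNP with allele frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity of the combined populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return h_t, h_s

def snp_fst(p1: float, p2: float) -> float:
    """Per-SNP F_ST = (H_T - H_S) / H_T; monomorphic sites return 0."""
    h_t, h_s = heterozygosities(p1, p2)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def genome_fst(freqs1: Sequence[float], freqs2: Sequence[float]) -> float:
    """Genome-wide F_ST as the ratio of summed numerators to summed denominators."""
    if len(freqs1) != len(freqs2):
        raise ValueError("frequency lists must cover the same SNPs")
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        h_t, h_s = heterozygosities(p1, p2)
        num += h_t - h_s
        den += h_t
    return 0.0 if den == 0 else num / den

# Hypothetical allele frequencies at three SNPs in two isolates.
print(snp_fst(0.2, 0.8))
print(genome_fst([0.2, 0.5, 0.1], [0.8, 0.5, 0.1]))
```

    Summing numerators and denominators before dividing (a "ratio of averages") keeps low-diversity SNPs from dominating the genome-wide value; with the made-up frequencies above, the first SNP alone gives an F_ST of about 0.36.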

  1. Gene set analysis for interpreting genetic studies

    DEFF Research Database (Denmark)

    Pers, Tune H

    2016-01-01

    ...... and functional annotations and may hence point towards novel biological insights. However, despite the growing availability of GSA tools, the sizeable number of variants identified for a vast number of complex traits, and many irrefutably trait-associated gene sets, the gap between discovery and interpretation remains. More efficient interpretation requires more complete and consistent gene set representations of biological pathways, phenotypes and functional annotations. In this review, I examine different types of gene sets, discuss how inconsistencies in gene set definitions impact GSA, describe how GSA has...

  2. Whole-Genome Sequencing: Automated, Nonindexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing a nonindexed Illumina DNA library and relies on the use of a CyBi-SELMA automated pipetting machine, the Covaris E210 shearing instrument, and the epMotion 5075. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Here, double-stranded DNA is fragmented when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as a template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  3. Whole-genome sequencing to control antimicrobial resistance

    Science.gov (United States)

    Köser, Claudio U.; Ellington, Matthew J.; Peacock, Sharon J.

    2014-01-01

    Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale. PMID:25096945

  4. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by next-generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived specifically to address the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, ABySS, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user quickly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, along with conclusions about some specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly
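    Plantagora's battery of metrics included statistics of the assembly sequences. N50 is one widely used contiguity statistic of that kind; whether Plantagora computed it in exactly this way is an assumption. A minimal sketch, with made-up contig lengths:

```python
from typing import Iterable

def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L together hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly

# Hypothetical contig lengths (bp) from a simulated assembly.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

    Sorting longest-first and stopping at the halfway point is the usual definition; reference-based fidelity measures such as those Plantagora derived by alignment need the original genome and are not covered by this statistic.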

  5. Whole genome amplification and sequencing of a Daphnia resting egg.

    Science.gov (United States)

    Lack, Justin B; Weider, Lawrence J; Jeyasingh, Punidan D

    2017-09-19

    Resting egg banks are unique windows that allow us to directly observe shifts in population genetics and phenotypes over time as natural populations evolve. Though a variety of planktonic organisms also produce resting stages, the keystone freshwater consumer, Daphnia, is a well-known model for paleogenetics and resurrection ecology. Nevertheless, paleogenomic investigations are limited largely because resting eggs do not contain enough DNA for genomic sequencing. In fact, genomic studies even on extant populations include a laborious preparatory phase of batch culturing dozens of individuals to generate sufficient genomic DNA. Here, we furnish a protocol to generate whole genomes of single ephippial (resting) eggs and single daphniids. Whole genomes of single ephippial eggs and single adults were amplified using the Qiagen REPLI-g Single Cell Kit, followed by library construction with the NEBNext Ultra DNA Library Prep Kit and Illumina sequencing. We compared the quality of the single-egg and single-individual amplified genomes to the standard batch genomic DNA extraction in the absence of genome amplification. At a mean depth of 20×, coverage was essentially identical for the amplified single individual relative to the unamplified batch-extracted genome (>90% of the genome was covered and callable). Finally, while amplification resulted in a slight loss of heterozygosity for the amplified genomes, estimates were largely comparable and illustrate the utility and limitations of this approach in estimating population genetic parameters over long periods of time in natural populations of Daphnia and other small species known to produce resting stages. © 2017 John Wiley & Sons Ltd.
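    The coverage comparison in this abstract rests on two quantities computed from per-base read depth: mean depth and the fraction of the genome at or above a minimum depth. A minimal sketch, assuming per-position depths are already available (for example, exported from an alignment) and using a hypothetical callability threshold of 10 reads; the study's actual callability criteria are not given in the abstract.

```python
from typing import Sequence, Tuple

def coverage_summary(depths: Sequence[int], min_depth: int = 10) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / len(depths)
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    return mean_depth, breadth

# Hypothetical depths at five positions.
print(coverage_summary([20, 25, 0, 15, 40]))  # → (20.0, 0.8)
```

    Reporting breadth alongside mean depth matters for amplified DNA: uneven amplification can leave the mean unchanged while pushing more positions below the callable threshold.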

  6. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification.

    Science.gov (United States)

    Macqueen, Daniel J; Johnston, Ian A

    2014-03-07

    Whole genome duplication (WGD) is often considered to be mechanistically associated with species diversification. Such ideas have been anecdotally attached to a WGD at the stem of the salmonid fish family, but remain untested. Here, we characterized an extensive set of gene paralogues retained from the salmonid WGD, in species covering the major lineages (subfamilies Salmoninae, Thymallinae and Coregoninae). By combining the data in calibrated relaxed molecular clock analyses, we provide the first well-constrained and direct estimate for the timing of the salmonid WGD. Our results suggest that the event occurred no later in time than 88 Ma and that 40-50 Myr passed subsequently until the subfamilies diverged. We also recovered a Thymallinae-Coregoninae sister relationship with maximal support. Comparative phylogenetic tests demonstrated that salmonid diversification patterns are closely allied in time with the continuous climatic cooling that followed the Eocene-Oligocene transition, with the highest diversification rates coinciding with recent ice ages. Further tests revealed considerably higher speciation rates in lineages that evolved anadromy--the physiological capacity to migrate between fresh and seawater--than in sister groups that retained the ancestral state of freshwater residency. Anadromy, which probably evolved in response to climatic cooling, is an established catalyst of genetic isolation, particularly during environmental perturbations (for example, glaciation cycles). We thus conclude that climate-linked ecophysiological factors, rather than WGD, were the primary drivers of salmonid diversification.
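    The study dated the WGD with calibrated relaxed molecular clock analyses, which are far more involved than can be shown here. The underlying idea can be illustrated with a strict-clock simplification: if two paralogues have diverged by K substitutions per site and each lineage accumulates substitutions at rate r per site per million years, the time since duplication is T = K / (2r). The function below is a hypothetical sketch under that strict-clock assumption, and the example values are made up.

```python
def divergence_time_myr(k_per_site: float, rate_per_site_per_myr: float) -> float:
    """Strict-clock divergence time T = K / (2r), in millions of years.

    The factor of 2 reflects that substitutions accumulate independently
    along both descendant lineages after the split.
    """
    if k_per_site < 0 or rate_per_site_per_myr <= 0:
        raise ValueError("K must be non-negative and the rate positive")
    return k_per_site / (2 * rate_per_site_per_myr)

# Hypothetical values: K = 0.2 substitutions/site, r = 0.001 substitutions/site/Myr.
print(divergence_time_myr(0.2, 0.001))
```

    With these made-up inputs the estimate is about 100 Myr. A relaxed clock instead lets r vary across branches and uses fossil calibrations to constrain it, which is why it can yield a better-constrained date than this single-rate formula.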

  7. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

    Directory of Open Access Journals (Sweden)

    Peter G Kroth

    Full Text Available BACKGROUND: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. METHODOLOGY/PRINCIPAL FINDINGS: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration, with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a beta-1,3-glucan) outside of the plastids. We identified various beta-glucanases and large membrane-bound glucan synthases. Interestingly, most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes. 
CONCLUSIONS/SIGNIFICANCE: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum

  8. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Full Text Available Abstract Background Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small-scale (SNP) or large-scale (CGH array, FISH) methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. DNA from Halobacterium species NRC-1 and Campylobacter jejuni was amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias, and produced significantly higher yields of amplified DNA.

  9. Whole-Genome Expression Analysis of Human Mesenchymal Stromal Cells Exposed to Ultrasmooth Tantalum vs. Titanium Oxide Surfaces

    DEFF Research Database (Denmark)

    Stiehler, C.; Bunger, C.; Overall, R. W.

    2013-01-01

    to titanium (Ti) surface. The aim of this study was to extend the previous investigation of biocompatibility by monitoring temporal gene expression of MSCs on topographically comparable smooth Ta and Ti surfaces using whole-genome gene expression analysis. Total RNA samples from telomerase-immortalized human...... to Ti surface. Key genes related to osteogenesis and cell adhesion were upregulated by MSCs exposed to Ta. We further identified differentially regulated candidate transcription factors, e.g., NRF2, EGR1, IRF-1, IRF-8, NF-Y, and p53, as well as relevant signaling pathways, e.g., p53 and mTOR, indicating...

  10. Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group

    DEFF Research Database (Denmark)

    Iacono, M.; Villa, L.; Fortini, D.

    2008-01-01

    The whole-genome sequence of an epidemic, multidrug-resistant Acinetobacter baumannii strain (strain ACICU) belonging to the European clone II group and carrying the plasmid-mediated bla(OXA-58) carbapenem resistance gene was determined. The A. baumannii ACICU genome was compared with the genomes...... of A. baumannii ATCC 17978 and Acinetobacter baylyi ADP1, with the aim of identifying novel genes related to virulence and drug resistance. A. baumannii ACICU has a single chromosome of 3,904,116 bp (which is predicted to contain 3,758 genes) and two plasmids, pACICU1 and pACICU2, of 28,279 and 64...

  11. Whole Genome Expression Profiling and Signal Pathway Screening of MSCs in Ankylosing Spondylitis

    Directory of Open Access Journals (Sweden)

    Yuxi Li

    2014-01-01

    Full Text Available The pathogenesis of dysfunctional immunoregulation of mesenchymal stem cells (MSCs) in ankylosing spondylitis (AS) is thought to be a complex process that involves multiple genetic alterations. In this study, MSCs derived from both healthy donors and AS patients were cultured in normal media or media mimicking an inflammatory environment. Whole genome expression profiling analysis of 33,351 genes was performed, and differentially expressed genes related to AS were analyzed by GO term analysis and KEGG pathway analysis. Our results showed that in normal media 676 genes were differentially expressed in AS (354 upregulated and 322 downregulated), while in an inflammatory environment 1767 genes were differentially expressed in AS (1230 upregulated and 537 downregulated). GO analysis showed that these genes were mainly related to cellular processes, physiological processes, biological regulation, regulation of biological processes, and binding. In addition, KEGG pathway analysis identified 14 key genes from the MAPK signaling pathway and 8 key genes from the TLR signaling pathway as differentially regulated. The results of qRT-PCR verified the expression variation of the 9 genes mentioned above. Our study found that in an inflammatory environment ankylosing spondylitis pathogenesis may be related to activation of the MAPK and TLR signaling pathways.
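    KEGG and GO enrichment of differentially expressed genes is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the abstract does not say which test this study used, so treat the following as an assumed, minimal sketch. It computes the probability of observing at least k pathway genes among n selected genes when K of the N profiled genes belong to the pathway.

```python
from math import comb

def enrichment_p_value(k: int, n_selected: int, k_pathway: int, n_total: int) -> float:
    """One-sided hypergeometric P(X >= k), X ~ Hypergeometric(n_total, k_pathway, n_selected).

    k:          selected genes that fall in the pathway
    n_selected: number of selected (e.g. differentially expressed) genes
    k_pathway:  number of profiled genes annotated to the pathway
    n_total:    number of profiled genes (the background)
    """
    if not (0 <= k_pathway <= n_total and 0 <= n_selected <= n_total):
        raise ValueError("inconsistent counts")
    upper = min(n_selected, k_pathway)
    # Sum the exact integer counts of outcomes with >= k pathway genes, then divide once.
    hits = sum(
        comb(k_pathway, i) * comb(n_total - k_pathway, n_selected - i)
        for i in range(max(k, 0), upper + 1)
    )
    return hits / comb(n_total, n_selected)

# Hypothetical toy case: 10 background genes, 3 in the pathway, 2 selected, both in the pathway.
print(enrichment_p_value(2, 2, 3, 10))
```

    Working with exact integer combinations avoids underflow for large backgrounds such as a whole-genome array. When many pathways are tested, the resulting p-values would normally be corrected for multiple testing, which is not shown here.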

  12. Whole genome sequence of Pantoea ananatis R100, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Wu, Liwen; Liu, Ruifang; Niu, Yaofang; Lin, Haiyan; Ye, Weijun; Guo, Longbiao; Hu, Xingming

    2016-05-10

    Pantoea ananatis is a bacterial species that was first reported as a plant pathogen. Recently, several papers have also described its biocontrol ability. In 2003, P. ananatis R100, which showed strong antagonism against several plant pathogens, was isolated from rice seeds. In this study, the whole genome sequence of this strain was determined using SMRT Cell technology. The total genome size of R100 is 4,857,861 bp, with 4659 coding genes (CDS), 82 tRNAs and 22 rRNAs. The genome sequence of R100 may shed light on the antagonistic activity of P. ananatis. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  13. Whole genome association mapping by incompatibilities and local perfect phylogenies

    DEFF Research Database (Denmark)

    Mailund, Thomas; Besenbacher, Søren; Schierup, Mikkel Heide

    2006-01-01

    Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing of other methods in order...... for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusions: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours....
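    In perfect-phylogeny terms, two biallelic sites are incompatible when all four haplotype combinations (00, 01, 10, 11) occur. This is the classic four-gamete test: under the infinite-sites model, such a pair cannot be explained without recombination or recurrent mutation. Assuming that is the sense of "incompatibilities" in the title, a minimal sketch of counting incompatible site pairs in a window of 0/1-coded haplotypes (no missing data) follows. It does not reproduce Blossoc's association scoring, and the haplotypes are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

def incompatible(site_a: Sequence[int], site_b: Sequence[int]) -> bool:
    """Four-gamete test: True if all four allele combinations occur across haplotypes."""
    return len(set(zip(site_a, site_b))) == 4

def count_incompatible_pairs(haplotypes: List[List[int]]) -> int:
    """Count incompatible site pairs; rows are haplotypes, columns are 0/1-coded sites."""
    if not haplotypes:
        return 0
    n_sites = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_sites)]
    return sum(incompatible(a, b) for a, b in combinations(columns, 2))

# Hypothetical window: four haplotypes, three sites.
window = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
print(count_incompatible_pairs(window))  # → 2
```

    A window with no incompatible pairs admits a perfect phylogeny, so local trees can be built only over stretches of sites that pass this test; that is the intuition behind scanning the genome with local perfect phylogenies.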

  14. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    Science.gov (United States)

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The manila clam, Ruditapes philippinarum, is an important bivalve species in aquaculture worldwide, including in Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors, including viruses, microorganisms, parasites, and water conditions, and production has subsequently declined. In spite of its importance as a marine resource, a reference genome of R. philippinarum for comprehensive genetic studies has been largely lacking. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. A high-quality whole genome of approximately 2.56 Gb was assembled using various library construction methods. A total of 108,034 protein-coding gene models were predicted, and repetitive elements, including simple sequence repeats, and noncoding RNAs were identified to further the understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with other bivalve marine invertebrates revealed that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues to support genome annotation and identified 41,275 annotated transcripts. The R. philippinarum genome resource will markedly advance a wide range of genetic studies, serving as a reference genome for comparative analysis of bivalve species and for unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

    Science.gov (United States)

    Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

    2017-01-05

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.

  16. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    Science.gov (United States)

    2014-01-01

    Background Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relatively short reads. Results We here provide a proof of principle that whole HIV-1 genomes can be reliably reconstructed from short reads, and use this to study the selection of immune escape mutations at the level of whole genome haplotypes. Using realistically simulated HIV-1 populations, we demonstrate that reconstruction of complete genome haplotypes is feasible with high fidelity. We do not reconstruct all genetically distinct genomes, but each reconstructed haplotype represents one or more of the quasispecies in the HIV-1 population. We then reconstruct 30 whole genome haplotypes from published short sequence reads sampled longitudinally from a single HIV-1 infected patient. We confirm the reliability of the reconstruction by validating our predicted haplotype genes with single genome amplification sequences, and by comparing haplotype frequencies with observed epitope escape frequencies. Conclusions Phylogenetic analysis shows that the HIV-1 population undergoes selection driven evolution, with successive replacement of the viral population by novel dominant strains. We demonstrate that immune escape mutants evolve in a dependent manner with various mutations hitchhiking along with others. As a consequence of this clonal interference, selection coefficients have to be estimated for complete haplotypes and not for individual immune escapes. PMID:24996694

  17. Use of routinely collected amniotic fluid for whole-genome expression analysis of polygenic disorders.

    Science.gov (United States)

    Nagy, Gyula Richárd; Győrffy, Balázs; Galamb, Orsolya; Molnár, Béla; Nagy, Bálint; Papp, Zoltán

    2006-11-01

    Neural tube defects related to polygenic disorders are the second most common birth defects in the world, but no molecular biologic tests are available to analyze the genes involved in the pathomechanism of these disorders. We explored the use of routinely collected amniotic fluid to characterize the differential gene expression profiles of polygenic disorders. We used oligonucleotide microarrays to analyze amniotic fluid samples obtained from pregnant women carrying fetuses with neural tube defects diagnosed during ultrasound examination. The control samples were obtained from pregnant women who underwent routine genetic amniocentesis because of advanced maternal age (>35 years). We also investigated specific folate-related genes because maternal periconceptional folic acid supplementation has been found to have a protective effect with respect to neural tube defects. Fetal mRNA from amniocytes was successfully isolated, amplified, labeled, and hybridized to whole-genome transcript arrays. We detected differential gene expression profiles between cases and controls. Highlighted genes such as SLA, LST1, and BENE might be important in the development of neural tube defects. None of the specific folate-related genes were in the top 100 associated transcripts. This pilot study demonstrated that a routinely collected amount of amniotic fluid (as small as 6 mL) can provide sufficient RNA to successfully hybridize to expression arrays. Analysis of the differences in fetal gene expressions might help us decipher the complex genetic background of polygenic disorders.

  18. Whole genome methylation profiles as independent markers of survival in stage IIIC melanoma patients

    Directory of Open Access Journals (Sweden)

    Sigalotti Luca

    2012-09-01

    Full Text Available Abstract Background The clinical course of cutaneous melanoma (CM) can differ significantly for patients with identical stages of disease, defined clinico-pathologically, and no molecular markers differentiate patients with such diverse prognoses. This study aimed to define the prognostic value of whole genome DNA methylation profiles in stage III CM. Methods Genome-wide methylation profiles were evaluated by the Illumina Human Methylation 27 BeadChip assay in short-term neoplastic cell cultures from 45 stage IIIC CM patients. Unsupervised K-means partitioning clustering was used to sort patients into 2 groups based on their methylation profiles. Methylation patterns related to the discovered groups were determined using the nearest shrunken centroid classification algorithm. The impact of genome-wide methylation patterns on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analyses. Results Unsupervised K-means partitioning by whole genome methylation profiles identified classes with significantly different OS in stage IIIC CM patients. Patients with a “favorable” methylation profile had increased OS (P = 0.001, log-rank = 10.2) by Kaplan-Meier analysis. Median OS values of stage IIIC patients with a “favorable” vs. “unfavorable” methylation profile were 31.5 and 10.4 months, respectively. The 5-year OS for stage IIIC patients with a “favorable” methylation profile was 41.2%, compared to 0% for patients with an “unfavorable” methylation profile. Among the variables examined by multivariate Cox regression analysis, classification defined by methylation profile was the only predictor of OS (Hazard Ratio = 2.41 for “unfavorable” methylation profile; 95% Confidence Interval: 1.02-5.70; P = 0.045). A 17-gene methylation signature able to correctly assign prognosis (overall error rate = 0) in stage IIIC patients on the basis of distinct methylation-defined groups was also identified
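    The survival comparison in this abstract relies on Kaplan-Meier curves. As a minimal sketch of the product-limit estimator, assuming per-patient follow-up times and an event flag (1 = death observed, 0 = censored), the survival probability is multiplied by (1 - d_i / n_i) at each distinct death time, where d_i is the number of deaths and n_i the number still at risk just before that time. The log-rank test and Cox regression used in the study are not shown, and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct time with at least one death."""
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; all of them were at risk just before t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve

# Hypothetical follow-up in months; event 0 marks a censored patient.
print(kaplan_meier([3, 8, 8, 12, 20], [1, 0, 1, 1, 0]))
```

    Censored patients never lower the curve themselves; they only shrink the risk set for later death times, which is what distinguishes this estimator from a naive proportion of survivors.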

  19. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography, their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...... panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years....
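    To put an average 4.7-fold coverage per individual in context, the idealized Lander-Waterman model treats per-base read depth as Poisson-distributed with mean equal to the average coverage. Real data deviate from this (GC bias, repeats, mapping quality), so the sketch below, which computes the expected fraction of positions with at least k reads under that Poisson assumption, is illustrative only and is not a calculation from the study.

```python
import math

def fraction_with_depth_at_least(mean_coverage: float, k: int) -> float:
    """P(D >= k) for per-base depth D ~ Poisson(mean_coverage)."""
    if mean_coverage < 0 or k < 0:
        raise ValueError("coverage and k must be non-negative")
    # Probability mass of depths 0 .. k-1, subtracted from 1.
    below_k = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below_k

for k in (1, 5, 10):
    print(k, round(fraction_with_depth_at_least(4.7, k), 3))
```

    Under this model about 99% of positions would be covered by at least one read at 4.7-fold, but a much smaller share reaches the depth needed to call individual genotypes confidently, which is why low-coverage population samples are usually analysed jointly rather than genotype by genotype.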

  20. Revisiting the genotyping scheme for varicella-zoster viruses based on whole-genome comparisons.

    Science.gov (United States)

    Jensen, Nancy J; Rivailler, Pierre; Tseng, Hung Fu; Quinlivan, Mark L; Radford, Kay; Folster, Jennifer; Harpaz, Rafael; LaRussa, Philip; Jacobsen, Steven; Scott Schmid, D

    2017-06-01

    We report whole-genome sequences (WGSs) for four varicella-zoster virus (VZV) samples from a shingles study conducted by Kaiser Permanente of Southern California. Comparative genomics and phylogenetic analysis of all published VZV WGSs revealed that strain KY037798 is in clade IX, which shall henceforth be designated clade 9. Previously published single nucleotide polymorphisms (SNP)-based genotyping schemes fail to discriminate between clades 6 and VIII and employ positions that are not clade-specific. We provide an updated list of clade-specific positions that supersedes the list determined at the 2008 VZV nomenclature meeting. Finally, we propose a new targeted genotyping scheme that will discriminate the circulating VZV clades with at least a twofold redundancy. Genotyping strategies using a limited set of targeted SNPs will continue to provide an efficient 'first pass' method for VZV strain surveillance as vaccination programmes for varicella and zoster influence the dynamics of VZV transmission.
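    The targeted genotyping idea described above, calling a clade from a small panel of clade-specific SNPs with at least twofold redundancy, can be sketched as follows. The marker positions and alleles are invented placeholders, not the published VZV clade-specific positions, and requiring two matching markers per clade is just one assumed way to encode "twofold redundancy".

```python
# Hypothetical clade-specific markers: genome position -> diagnostic allele.
# These positions are invented for illustration; they are NOT real VZV positions.
CLADE_MARKERS = {
    "clade_A": {1001: "G", 2050: "T"},
    "clade_B": {1500: "A", 3300: "C"},
    "clade_C": {1750: "T", 4100: "G"},
}

def assign_clade(sample_alleles, markers=CLADE_MARKERS, min_hits=2):
    """Return the single clade supported by >= min_hits diagnostic alleles.

    sample_alleles -- dict of genome position -> base observed in the sample.
    Returns None when no clade, or more than one clade, reaches the threshold.
    """
    hits = {
        clade: sum(sample_alleles.get(pos) == allele for pos, allele in sites.items())
        for clade, sites in markers.items()
    }
    passing = [clade for clade, n in hits.items() if n >= min_hits]
    return passing[0] if len(passing) == 1 else None

print(assign_clade({1001: "G", 2050: "T", 1500: "G"}))  # clade_A
print(assign_clade({1001: "G"}))                        # None (only one supporting marker)
```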

  1. Application of Whole-Genome Sequencing to an Unusual Outbreak of Invasive Group A Streptococcal Disease.

    Science.gov (United States)

    Galloway-Peña, Jessica; Clement, Meredith E; Sharma Kuinkel, Batu K; Ruffin, Felicia; Flores, Anthony R; Levinson, Howard; Shelburne, Samuel A; Moore, Zack; Fowler, Vance G

    2016-01-01

    Whole-genome analysis was applied to investigate atypical point-source transmission of 2 invasive group A streptococcal (GAS) infections. Isolates were serotype M4, ST39, and genetically indistinguishable. Comparison with MGAS10750 revealed nonsynonymous polymorphisms in ropB and increased speB transcription. This study demonstrates the usefulness of whole-genome analyses for GAS outbreaks.

  2. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers

    NARCIS (Netherlands)

    Heidaritabar, M.; Calus, M.P.L.; Megens, H.J.; Vereijken, A.; Groenen, M.A.M.; Bastiaansen, J.W.M.

    2016-01-01

    There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic

  3. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands that many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  4. Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Bink, M.C.A.M.; Calus, M.P.L.; Eeuwijk, van F.A.; Hayes, B.J.; Hulsegge, B.; Veerkamp, R.F.

    2014-01-01

    Background The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina

  5. Prospects of whole-genome sequence data in animal and plant breeding

    NARCIS (Netherlands)

    Binsbergen, van Rianne

    2017-01-01

    The rapid decrease in costs of DNA sequencing implies that whole-genome sequence data will be widely available in the coming few years. Whole-genome sequence data includes all base-pairs on the genome that show variation in the sequenced population. Consequently, it is assumed that the causal

  6. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  7. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. Based on further optimization of this technology, this could be a feasible technology to use in sample preservation for potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at practically every step, particularly nucleic acid extraction. In engineering a molecular means of amplifying nucleic acids directly from single cells in their native state within the sample matrix, this innovation has circumvented entirely the need for DNA extraction regimes in the sample processing scheme.

  8. Gene set analysis for longitudinal gene expression data

    Directory of Open Access Journals (Sweden)

    Piepho Hans-Peter

    2011-07-01

    Abstract Background Gene set analysis (GSA) has become a successful tool to interpret gene expression profiles in terms of biological functions, molecular pathways, or genomic locations. GSA performs statistical tests for independent microarray samples at the level of gene sets rather than individual genes. Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time, so a GSA needs to take into account the within-gene correlations in addition to possible between-gene correlations. Results We provide a robust nonparametric approach to compare the expression of longitudinally measured sets of genes under multiple treatments or experimental conditions. The limiting distributions of our statistics are derived when the number of genes goes to infinity while the number of replications can be small. When the number of genes in a gene set is small, we recommend permutation tests based on our nonparametric test statistics to achieve reliable type I error and better power while incorporating unknown correlations between and within genes. Simulation results demonstrate that the proposed method has greater power than other methods for various data distributions and heteroscedastic correlation structures. This method was used for an IL-2 stimulation study and significantly altered gene sets were identified. Conclusions The simulation study and the real data application showed that the proposed gene set analysis provides a promising tool for longitudinal microarray analysis. R scripts for simulating longitudinal data and calculating the nonparametric statistics are posted on the North Dakota INBRE website http://ndinbre.org/programs/bioinformatics.php. Raw microarray data is available in Gene Expression Omnibus (National Center for Biotechnology Information with
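    The core idea of gene set analysis, computing a statistic for a whole gene set and calibrating it by permuting sample labels, can be shown with a deliberately simplified sketch. It assumes a single time point, two groups and a mean-difference statistic with exact enumeration of label permutations; it is not the longitudinal nonparametric statistic proposed in the paper and ignores the within-gene correlation that method is designed to handle.

```python
from itertools import combinations
from statistics import mean

def set_statistic(expr, gene_set, group1):
    """Average over genes in the set of mean(group1 samples) - mean(other samples)."""
    n = len(next(iter(expr.values())))
    rest = [i for i in range(n) if i not in group1]
    return mean(
        mean(expr[g][i] for i in group1) - mean(expr[g][i] for i in rest)
        for g in gene_set
    )

def permutation_pvalue(expr, gene_set, group1):
    """Exact one-sided p-value: share of relabellings at least as extreme as observed."""
    n = len(next(iter(expr.values())))
    observed = set_statistic(expr, gene_set, set(group1))
    splits = list(combinations(range(n), len(group1)))
    extreme = sum(
        set_statistic(expr, gene_set, set(s)) >= observed - 1e-12 for s in splits
    )
    return extreme / len(splits)

# Toy expression matrix: gene -> values for 6 samples (first 3 = treated).
expr = {
    "g1": [5, 6, 7, 1, 2, 3],
    "g2": [8, 9, 7, 2, 1, 3],
    "g3": [1, 1, 1, 1, 1, 1],
}
print(permutation_pvalue(expr, ["g1", "g2"], [0, 1, 2]))  # 0.05
```

    With 3 + 3 samples there are only 20 distinct labellings, so 0.05 is the smallest attainable p-value here; this granularity is one reason small designs call for care.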

  9. Whole genome expression profiling in chewing-tobacco-associated oral cancers: a pilot study.

    Science.gov (United States)

    Chakrabarti, Sanjukta; Multani, Shaleen; Dabholkar, Jyoti; Saranath, Dhananjaya

    2015-03-01

    The current study was undertaken to identify differential biomarkers in chewing-tobacco-associated oral cancer tissues in patients of Indian ethnicity. The gene expression profile was analyzed in oral cancer tissues as compared to clinically normal oral buccal mucosa. We examined 30 oral cancer tissues and 27 normal oral tissues, with 16 paired samples from the contralateral site of the patient and 14 unpaired samples from different oral cancer patients, for whole genome expression using the high-throughput Illumina Sentrix Human Ref-8 v2 Expression BeadChip array. The cDNA microarray analysis identified 425 differentially expressed genes with >1.5-fold expression in the oral cancer tissues as compared to normal tissues in the oral cancer patients. Overexpression of 255 genes and downregulation of 170 genes (p TNFSF13B, TMPRSS11A); signal transduction (FOLR2, MME, HTR3B); invasion and metastasis (SPP1, TNFAIP6, EPHB6); differentiation (CLEC4A, ELF5); angiogenesis (CXCL1); apoptosis (GLIPR1, WISP1, DAPL1); immune responses (CD300A, IFIT2, TREM2); and metabolism (NNMT, ALDH3A1). In addition, several of these genes have been reported as differentially expressed in human cancers, including oral cancer. Our data indicate differentially expressed genes in oral cancer tissues that may serve as prognostic and therapeutic biomarkers in oral cancers after validation in larger and more varied population samples.

  10. Whole-Genome Sequence of a blaOXA-48-Harboring Raoultella ornithinolytica Clinical Isolate from Lebanon.

    Science.gov (United States)

    Al-Bayssari, Charbel; Olaitan, Abiola Olumuyiwa; Leangapichart, Thongpan; Okdah, Liliane; Dabboussi, Fouad; Hamze, Monzer; Rolain, Jean-Marc

    2016-04-01

    We analyzed the whole-genome sequence of a blaOXA-48-harboring Raoultella ornithinolytica clinical isolate from a patient in Lebanon. The size of the Raoultella ornithinolytica CMUL058 genome was 5,622,862 bp, with a G+C content of 55.7%. We deciphered all the molecular mechanisms of antibiotic resistance, and we compared our genome to other available R. ornithinolytica genomes in GenBank. The resistome consisted of 9 antibiotic resistance genes, including a plasmidic blaOXA-48 gene whose genetic organization is also described. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
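    The G+C content quoted above is the proportion of G and C bases in the assembled sequence. A minimal sketch, assuming the sequence is available as a plain string; excluding ambiguous bases such as N from the denominator is a choice made here, and tools differ on how they treat them.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

print(round(gc_content("ATGCGCNN"), 1))  # 66.7 (the two Ns are ignored)
```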

  11. Evidence for an ancient whole genome duplication in the cycad lineage.

    Directory of Open Access Journals (Sweden)

    Danielle Roodt

    Contrary to the many whole genome duplication events recorded for angiosperms (flowering plants), whole genome duplications in gymnosperms (non-flowering seed plants) seem to be much rarer. Although ancient whole genome duplications have been reported for most gymnosperm lineages as well, some are still contested and need to be confirmed. For instance, data for ginkgo, and particularly for cycads, have remained inconclusive so far, likely due to the quality of the data available and flaws in the analysis. We extracted and sequenced RNA from both the cycad Encephalartos natalensis and Ginkgo biloba. This was followed by transcriptome assembly, after which these data were used to build paralog age distributions. Based on these distributions, we identified remnants of an ancient whole genome duplication in both cycads and ginkgo. The most parsimonious explanation would be that this whole genome duplication event was shared between both species and had occurred prior to their divergence, about 300 million years ago.

  12. Whole-Genome Sequencing and Concordance Between Antimicrobial Susceptibility Genotypes and Phenotypes of Bacterial Isolates Associated with Bovine Respiratory Disease

    Directory of Open Access Journals (Sweden)

    Joseph R. Owen

    2017-09-01

    Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease–associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none being detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates had an overall concordance rate between genotype and phenotypic resistance to the associated class of antimicrobials of 72.7% (P < 0.001), showing substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations.
This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory
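    The genotype/phenotype concordance rate discussed above is the share of comparisons in which the presence of a resistance gene agrees with the measured resistance. A minimal sketch, assuming one boolean pair per isolate and antimicrobial class (a hypothetical input format); it also tallies the two kinds of discordance separately, since a single rate hides which direction the disagreement runs.

```python
from collections import Counter

def concordance_summary(calls):
    """Summarise agreement between predicted genotype and observed phenotype.

    calls -- iterable of (has_resistance_gene, phenotypically_resistant) booleans,
             one per isolate x antimicrobial-class comparison.
    Returns (concordance_rate, Counter of the four agreement categories).
    """
    table = Counter()
    for has_gene, resistant in calls:
        if has_gene and resistant:
            table["gene+ resistant"] += 1       # concordant
        elif not has_gene and not resistant:
            table["gene- susceptible"] += 1     # concordant
        elif has_gene:
            table["gene+ susceptible"] += 1     # discordant: gene found, isolate susceptible
        else:
            table["gene- resistant"] += 1       # discordant: resistant, no gene found
    total = sum(table.values())
    if total == 0:
        raise ValueError("no calls to summarise")
    concordant = table["gene+ resistant"] + table["gene- susceptible"]
    return concordant / total, table

rate, table = concordance_summary(
    [(True, True), (True, False), (False, False), (False, False)]  # made-up calls
)
print(rate)  # 0.75
```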

  13. Rice–arsenate interactions in hydroponics: whole genome transcriptional analysis

    Science.gov (United States)

    Norton, Gareth J.; Lou-Hing, Daniel E.; Meharg, Andrew A.; Price, Adam H.

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population. PMID:18453530

  14. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    Science.gov (United States)

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population.

  15. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate- and heavy-metal-contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.
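    The "99.2 percent genome coverage" figure is a breadth-of-coverage measure: the fraction of reference positions touched by at least one read. A minimal sketch, assuming read alignments are already available as half-open, 0-based (start, end) intervals on a single reference sequence; the intervals below are made up.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of reference positions covered by at least one aligned read.

    intervals -- (start, end) pairs, 0-based and half-open, on one reference.
    Overlapping or touching intervals are merged so no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start   # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)          # extend the current merged run
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length

print(breadth_of_coverage([(0, 10), (5, 20), (30, 40)], 50))  # 0.6
```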

  16. Whole genome sequencing and methylome analysis of the wild guinea pig.

    Science.gov (United States)

    Weyrich, Alexandra; Schüllermann, Tino; Heeger, Felix; Jeschek, Marie; Mazzoni, Camila J; Chen, Wei; Schumann, Kathrin; Fickel, Joerns

    2014-11-28

    DNA methylation is a heritable mechanism that acts in response to environmental changes, lifestyle and diseases by influencing gene expression in eukaryotes. Epigenetic studies of wild organisms are essential for understanding its role in, e.g., adaptation to the great variety of ecological niches. However, strategies to address those questions on a methylome scale are largely missing. In this study we present such a strategy and describe a whole genome sequence and methylome analysis of the wild guinea pig. We generated a full wild guinea pig (Cavia aperea) genome sequence with enhanced coverage of methylated regions, benefiting from the available sequence of the domesticated relative Cavia porcellus. This new genome sequence was then used as a reference to map the sequence reads of bisulfite-treated wild guinea pig sequencing libraries and investigate DNA methylation patterns at a nucleotide-specific level, using the method described here, named 'DNA-enrichment-bisulfite-sequencing' (MEBS). The results achieved using MEBS matched those of standard methods in other mammalian model species. The technique is cost-efficient and combines methylation enrichment with nucleotide-specific resolution, even without an available whole genome sequence. Thus MEBS can easily be applied to extend methylation enrichment studies to a nucleotide-specific level. The approach is suited to study methylomes of not yet sequenced mammals at single nucleotide resolution. The strategy is transferable to other mammalian species by applying the nuclear genome sequence of a close relative. It is therefore of interest for studies on a variety of wild species addressing evolutionary, adaptational, ecological or medical questions through epigenetic mechanisms.

  17. Impacts of Whole-Genome Triplication on MIRNA Evolution in Brassica rapa.

    Science.gov (United States)

    Sun, Chao; Wu, Jian; Liang, Jianli; Schnable, James C; Yang, Wencai; Cheng, Feng; Wang, Xiaowu

    2015-11-01

    MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play essential roles in eukaryotes. Although the influence of whole-genome triplication (WGT) on protein-coding genes has been well documented in Brassica rapa, little is known about its impact on MIRNAs. In this study, we generated a comprehensive annotation of 680 MIRNAs for B. rapa and analyzed several evolutionary characteristics of these MIRNAs. First, while MIRNAs and genes show similar patterns of biased distribution among the subgenomes of B. rapa, we found that MIRNAs are much more over-retained than genes following fractionation after WGT. Second, multiple-copy MIRNAs show significantly higher sequence conservation than single-copy MIRNAs, which is the opposite of the pattern seen for genes. This indicates increased purifying selection acting on these highly retained multiple-copy MIRNAs, and their greater functional importance relative to singleton MIRNAs. Furthermore, we found extensive divergence between pairs of miRNAs and their target genes following the WGT in B. rapa. In summary, our study provides a valuable resource for exploring MIRNAs in B. rapa and highlights the impacts of WGT on MIRNA evolution. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations remain unclear. In this study, we performed integrated and comparative analyses of the whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq data revealed extensive evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking into account genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes.

  19. Parent and Public Interest in Whole Genome Sequencing

    Science.gov (United States)

    Dodson, Daniel S.; Goldenberg, Aaron J.; Davis, Matthew M.; Singer, Dianne C.; Tarini, Beth A.

    2015-01-01

    Objective To assess the baseline interest of the public in whole genome sequencing (WGS) for themselves, parents’ interest in WGS for their youngest children, and factors associated with such interest. Methods A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked their interest in WGS for themselves. Those participants who self-identified as parents were asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. Results Overall response rate was 62% (55% among parents). 58.6% of the total population (parents and non-parents) was interested in WGS for themselves. Similarly, 61.8% of parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a whole, and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. Conclusions While U.S. adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. PMID:25765282

  20. Parent and public interest in whole-genome sequencing.

    Science.gov (United States)

    Dodson, Daniel S; Goldenberg, Aaron J; Davis, Matthew M; Singer, Dianne C; Tarini, Beth A

    2015-01-01

    The aim of this study was to assess the baseline interest of the public in whole-genome sequencing (WGS) for oneself, parents' interest in WGS for their youngest children, and factors associated with such interest. A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked about their interest in WGS for themselves. Those participants who were parents were additionally asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and for their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. The overall response rate was 62% (55% among parents). 58.6% of the total population (parents and nonparents) was interested in WGS for themselves. Similarly, 61.8% of the parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of the parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a group and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. While US adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. © 2015 S. Karger AG, Basel.

  1. SyntenyTracker: a tool for defining homologous synteny blocks using radiation hybrid maps and whole-genome sequence

    Directory of Open Access Journals (Sweden)

    Lewin Harris A

    2009-07-01

    Abstract Background The recent availability of genomic sequences and BAC libraries for a large number of mammals provides an excellent opportunity for identifying comparatively-anchored markers that are useful for creating high-resolution radiation-hybrid (RH) and BAC-based comparative maps. To use these maps for multispecies genome comparison and evolutionary inference, robust bioinformatic tools are required for the identification of chromosomal regions shared between genomes and for the localization of evolutionary breakpoints that are the signatures of chromosomal rearrangements. Here we report an automated tool for the identification of homologous synteny blocks (HSBs) between genomes that tolerates errors common in RH comparative maps and can be used for automated whole-genome analysis of chromosome rearrangements that occur during evolution. Findings We developed an algorithm and software tool (SyntenyTracker) that can be used for automated definition of HSBs using pairwise RH or gene-based comparative maps as input. To verify correct implementation of the underlying algorithm, SyntenyTracker was used to identify HSBs in the cattle and human genomes. Results demonstrated 96% agreement with HSBs defined manually using the same set of rules. A comparison of SyntenyTracker with the AutoGRAPH synteny tool was performed using identical datasets containing 14,380 genes with 1:1 orthology in human and mouse. Discrepancies between the results using the two tools and advantages of SyntenyTracker are reported. Conclusion SyntenyTracker was shown to be an efficient and accurate automated tool for defining HSBs using datasets that may contain minor errors resulting from limitations in map construction methodologies. The utility of SyntenyTracker will become more important for comparative genomics as the number of mapped and sequenced genomes increases.
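    The notion of a homologous synteny block can be illustrated with a deliberately naive sketch: walk along one chromosome of genome A and start a new block whenever the orthologous marker jumps to a different genome-B chromosome or breaks local order. This is not SyntenyTracker's algorithm; in particular it has none of the error-tolerance rules that let SyntenyTracker cope with misplaced markers in RH maps. The input format and the max_gap parameter are assumptions made for this example.

```python
def synteny_blocks(markers, max_gap=1):
    """Split an ordered list of orthologous markers into runs of conserved order.

    markers -- list of (chrom_b, rank_b) for consecutive markers along one
               chromosome of genome A (already sorted by position in A).
    A block continues while the genome-B chromosome is unchanged and rank_b
    moves by 1..max_gap in a consistent direction (so inverted blocks are kept).
    Returns a list of blocks, each a list of indices into markers.
    """
    blocks = []
    current = []
    direction = 0  # +1 ascending, -1 descending, 0 not yet known
    for i, (chrom, rank) in enumerate(markers):
        if current:
            prev_chrom, prev_rank = markers[current[-1]]
            step = rank - prev_rank
            same = chrom == prev_chrom and 0 < abs(step) <= max_gap
            sign = 1 if step > 0 else -1
            if same and (direction == 0 or sign == direction):
                current.append(i)
                direction = sign
                continue
            blocks.append(current)  # breakpoint: close the current block
        current = [i]
        direction = 0
    if current:
        blocks.append(current)
    return blocks

markers = [("b1", 1), ("b1", 2), ("b1", 3), ("b2", 10), ("b2", 9), ("b2", 8), ("b1", 7)]
print(synteny_blocks(markers))  # [[0, 1, 2], [3, 4, 5], [6]]
```

    The second block is an inversion relative to genome B, and the final singleton marks the kind of isolated marker that an error-tolerant tool would need rules to either merge or flag.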

  2. Recent advances in understanding the roles of whole genome duplications in evolution [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Carol MacKintosh

    2017-08-01

    Ancient whole-genome duplications (WGDs), or paleopolyploidy events, are key to solving Darwin’s ‘abominable mystery’ of how flowering plants evolved and radiated into a rich variety of species. The vertebrates also emerged from their invertebrate ancestors via two WGDs, and genomes of diverse gymnosperm trees, unicellular eukaryotes, invertebrates, fishes, amphibians and even a rodent carry evidence of lineage-specific WGDs. Modern polyploidy is common in eukaryotes, and it can be induced, enabling mechanisms and short-term cost-benefit assessments of polyploidy to be studied experimentally. However, the ancient WGDs can be reconstructed only by comparative genomics: these studies are difficult because the DNA duplicates have been through tens or hundreds of millions of years of gene losses, mutations, and chromosomal rearrangements that culminate in resolution of the polyploid genomes back into diploid ones (rediploidisation). Intriguing asymmetries in patterns of post-WGD gene loss and retention between duplicated sets of chromosomes have been discovered recently, and elaborations of signal transduction systems are lasting legacies from several WGDs. The data imply that simpler signalling pathways in the pre-WGD ancestors were converted via WGDs into multi-stranded parallelised networks. Genetic and biochemical studies in plants, yeasts and vertebrates suggest a paradigm in which different combinations of sister paralogues in the post-WGD regulatory networks are co-regulated under different conditions. In principle, such networks can respond to a wide array of environmental, sensory and hormonal stimuli and integrate them to generate phenotypic variety in cell types and behaviours. Patterns are also being discerned in how the post-WGD signalling networks are reconfigured in human cancers and neurological conditions. It is fascinating to unpick how ancient genomic events impact on complexity, variety and disease in modern life.

  3. Comparative whole-genome analysis reveals artificial selection effects on Ustilago esculenta genome.

    Science.gov (United States)

    Ye, Zihong; Pan, Yao; Zhang, Yafen; Cui, Haifeng; Jin, Gulei; McHardy, Alice C; Fan, Longjiang; Yu, Xiaoping

    2017-07-19

    Ustilago esculenta infects Zizania latifolia and induces swelling of the host stem, producing a popular vegetable called Jiaobai in China. Long-standing artificial selection has maximized the occurrence of favourable Jiaobai, thereby maintaining the plant-fungus interaction and shaping the fungus's evolution from plant pathogen to endophyte. In this study, the whole genome of U. esculenta was sequenced, and the transcriptomes of the fungus and its host were analysed. The 20.2 Mb U. esculenta draft genome, with 6,654 predicted genes including mating, primary metabolism and secreted protein genes, shares high similarity with related smut fungi. However, U. esculenta relies on RNA silencing rather than repeat-induced point mutation (RIP) for genome defence and has more introns per gene, indicating a relatively slow evolutionary rate. The fungus also lacks some genes in amino acid biosynthesis pathways; these gaps are filled by up-regulated host genes, and the fungus has developed a distinct amino acid response mechanism to balance the infection-resistance interaction. In addition, U. esculenta has lost some surface sensors, important virulence factors and host-range-related effectors to maintain its economically valuable endophytic lifestyle. The U. esculenta genomic information and expression profiles elucidated here can contribute to more comprehensive insights not only into the molecular mechanisms underlying artificial selection but also into smut fungus-host interactions. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  4. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data.

    Science.gov (United States)

    Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2015-11-01

    Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post
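Partitioning a genome into nonoverlapping 10-kb windows and summarizing which gene tree topology each window supports can be sketched as below. Tree inference itself is outside the scope of this sketch: each window's topology is assumed to arrive as a canonical label (for example a normalized Newick string from a separate phylogenetics step), and the function names are hypothetical.

```python
from collections import Counter


def window_index(pos: int, window_size: int = 10_000) -> int:
    """0-based index of the nonoverlapping window containing 1-based position `pos`."""
    return (pos - 1) // window_size


def topology_frequencies(window_topologies):
    """Relative frequency of each topology label across windows.

    window_topologies: mapping of window index -> topology label. Labels are
    assumed to be canonical, so identical trees compare equal as strings.
    """
    counts = Counter(window_topologies.values())
    total = sum(counts.values())
    return {topo: n / total for topo, n in counts.items()}


print(window_index(10_000), window_index(10_001))  # 0 1
print(topology_frequencies({0: "((A,B),C);", 1: "((A,B),C);", 2: "((A,C),B);", 3: "((A,B),C);"}))
```

A high diversity of labels across adjacent windows is the kind of pattern the abstract describes as topologies changing "on a very small physical scale".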

  5. Whole-genome sequencing reveals the mechanisms for evolution of streptomycin resistance in Lactobacillus plantarum.

    Science.gov (United States)

    Zhang, Fuxin; Gao, Jiayuan; Wang, Bini; Huo, Dongxue; Wang, Zhaoxia; Zhang, Jiachao; Shao, Yuyu

    2018-01-31

    In this research, we investigated the evolution of streptomycin resistance in Lactobacillus plantarum ATCC14917, which was passaged in medium containing a gradually increasing concentration of streptomycin. After 25 d, the minimum inhibitory concentration (MIC) of L. plantarum ATCC14917 had reached 131,072 µg/mL, 8,192-fold higher than the MIC of the original parent isolate. The highly resistant isolate was then passaged in antibiotic-free medium to determine the stability of resistance. Its MIC decreased to 2,048 µg/mL after 35 d but remained constant thereafter, indicating that resistance was irreversible even in the absence of selection pressure. Whole-genome sequencing of parent isolates, control isolates, and isolates following passage was used to study the resistance mechanism of L. plantarum ATCC14917 to streptomycin and its adaptation in the presence and absence of selection pressure. Five mutated genes (carrying single nucleotide polymorphisms and structural variants) were verified in highly resistant L. plantarum ATCC14917 isolates; they were related to ribosomal protein S12, an LPXTG-motif cell wall anchor domain protein, an LrgA family protein, a Ser/Thr phosphatase family protein, and a hypothetical protein, and may correlate with resistance to streptomycin. After passage in streptomycin-free medium, only the mutant gene encoding ribosomal protein S12 remained; the other 4 mutant genes had reverted to the wild type found in the parent isolate. Although the MIC of L. plantarum ATCC14917 was reduced in the absence of selection pressure, it remained 128-fold higher than that of the parent isolate, indicating that ribosomal protein S12 may play an important role in streptomycin resistance. Using the mobile elements database, we demonstrated that streptomycin resistance-related genes in L. plantarum ATCC14917 were not located on mobile elements. This research offers a way of
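The fold changes reported above are internally consistent, which a short worked calculation can confirm. The parent MIC of 16 µg/mL is not stated in the abstract; it is inferred here from the two reported figures. The powers of two assume the usual twofold (doubling) dilution series used for MIC measurement.

```latex
\begin{aligned}
\mathrm{MIC}_{\text{parent}} &= \frac{131{,}072~\mu\text{g/mL}}{8{,}192} = 16~\mu\text{g/mL},
  &\quad 8{,}192 &= 2^{13}~\text{(13 doubling steps)} \\
\frac{\mathrm{MIC}_{\text{after drug-free passage}}}{\mathrm{MIC}_{\text{parent}}} &= \frac{2{,}048}{16} = 128,
  &\quad 128 &= 2^{7}~\text{(7 doubling steps)}
\end{aligned}
```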

  6. Whole-genome sequencing of Berkshire (European native pig) provides insights into its origin and domestication.

    Science.gov (United States)

    Li, Mingzhou; Tian, Shilin; Yeung, Carol K L; Meng, Xuehong; Tang, Qianzi; Niu, Lili; Wang, Xun; Jin, Long; Ma, Jideng; Long, Keren; Zhou, Chaowei; Cao, Yinchuan; Zhu, Li; Bai, Lin; Tang, Guoqing; Gu, Yiren; Jiang, An'an; Li, Xuewei; Li, Ruiqiang

    2014-04-14

    Domesticated organisms have experienced strong selective pressures directed at genes or genomic regions controlling traits of biological, agricultural or medical importance. The genomes of native and domesticated pigs provide a unique opportunity for tracing the history of domestication and identifying signatures of artificial selection. Here we used whole-genome sequencing to explore the genetic relationships between the European native Berkshire pig and breeds distributed worldwide, and to identify genomic footprints left by selection during the domestication of Berkshire. Numerous genes containing nonsynonymous SNPs fall into olfactory-related categories, which are part of a rapidly evolving superfamily in the mammalian genome. Phylogenetic analyses revealed a deep phylogenetic split between European and Asian pigs rather than between domestic and wild pigs. Admixture analysis revealed a higher proportion of Chinese genetic material in Berkshire pigs, which is consistent with the historical record of the breed's origin. Selective sweep analyses revealed strong signatures of selection affecting genomic regions that harbor genes underlying economic traits such as disease resistance, pork yield, fertility, tameness and body length. These findings confirm the origin of the Berkshire pig through genome-wide analysis and illustrate how domestication has shaped patterns of genetic variation.

  7. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the guanine+cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species suggest that this process acts as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.

  8. Homoeologous chromosomes of Xenopus laevis are highly conserved after whole-genome duplication.

    Science.gov (United States)

    Uno, Y; Nishida, C; Takagi, C; Ueno, N; Matsuda, Y

    2013-11-01

    It has been suggested that whole-genome duplication (WGD) occurred twice during vertebrate evolution, around 450 and 500 million years ago, contributing to an increase in the genomic and phenotypic complexity of vertebrates. However, little is known about the evolution of homoeologous chromosomes after WGD, because many duplicate genes have since been lost. Xenopus laevis (2n=36) and Xenopus (Silurana) tropicalis (2n=20) are good animal models for studying genomic and chromosomal reorganization after WGD, because X. laevis is an allotetraploid species that resulted from WGD after interspecific hybridization of diploid species closely related to X. tropicalis. We constructed a comparative cytogenetic map of X. laevis using 60 complementary DNA (cDNA) clones that covered the entire chromosomal regions of 10 pairs of X. tropicalis chromosomes, and thereby identified all nine homoeologous chromosome groups of X. laevis. Hybridization signals on two pairs of X. laevis homoeologous chromosomes were detected for 50 of 60 (83%) genes, and genetic linkage is highly conserved between X. tropicalis and X. laevis chromosomes, except for one fusion and one inversion, and also between X. laevis homoeologous chromosomes, except for two inversions. These results indicate that the loss of duplicated genes and inter- and/or intrachromosomal rearrangements occurred much less frequently in this lineage, suggesting that these events were not essential for diploidization of the allotetraploid genome of X. laevis after WGD.

  9. Analysis of the differences in whole-genome expression related to asthma and obesity.

    Science.gov (United States)

    Gruchała-Niedoszytko, Marta; Niedoszytko, Marek; Sanjabi, Bahram; van der Vlies, Pieter; Niedoszytko, Piotr; Jassem, Ewa; Małgorzewicz, Sylwia

    2015-01-01

    Concomitant obesity significantly impairs asthma control. Obese asthmatics show more severe symptoms and an increased use of medications. The primary aim of the study was to identify genes that are differentially expressed in the peripheral blood of asthmatic patients with obesity, asthmatic patients with normal body mass, and obese patients without asthma. Secondly, we investigated whether analysis of gene expression in peripheral blood may be helpful in the differential diagnosis of obese patients who present with symptoms similar to asthma. The study group included 15 patients with asthma (9 obese and 6 normal-weight patients), while the control group included 13 obese patients in whom asthma was excluded. Whole-genome expression analysis was performed on RNA samples isolated from peripheral blood. The comparison of gene expression profiles between asthmatic patients with obesity and those with normal body mass revealed a significant difference in 6 genes. The comparison of expression between controls and normal-weight patients with asthma showed a significant difference in 23 genes. The analysis of differentially expressed genes revealed a group of transcripts that may be related to increased body mass (PI3, LOC100008589, RPS6KA3, LOC441763, IFIT1, and LOC100133565). Based on the gene expression results, a prediction model was constructed that correctly classified 92% of obese controls and 89% of obese asthmatic patients, for an overall accuracy of 90.9%. Our results showed significant differences in gene expression between obese and normal-weight asthmatic patients, as well as between obese patients without asthma and normal-weight asthmatic patients.
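The overall accuracy follows from the two class-wise rates pooled over all classified patients. Assuming the rounded percentages correspond to 12 of the 13 obese controls and 8 of the 9 obese asthmatic patients (counts inferred here, not stated in the abstract), the figures agree:

```latex
\frac{12}{13} \approx 92.3\%, \qquad
\frac{8}{9} \approx 88.9\%, \qquad
\text{overall accuracy} = \frac{12 + 8}{13 + 9} = \frac{20}{22} \approx 90.9\%
```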

  10. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.

    Directory of Open Access Journals (Sweden)

    Jessica N Ricaldi

    The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes, and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparison of the functional content of the genomes suggests that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae consisting of a 6-gene cluster that is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are also present, further suggesting antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for

  11. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    Directory of Open Access Journals (Sweden)

    Asadollahi Mohammad A

    2010-12-01

    Abstract Background Rapid and efficient design and construction of microbial cell factories are made possible by the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, in which selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. Results In this work we show that whole-genome high-throughput sequencing and annotation can be used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced and serves as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between the two strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid changes). Amongst the metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared with CEN.PK113-7D; similarly, ergosterol content in CEN.PK113-7D was significantly higher than in S288c in both glucose- and galactose-supplemented cultivations. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for the major phenotypes observed, suggesting that

  12. CNV discovery for milk composition traits in dairy cattle using whole genome resequencing.

    Science.gov (United States)

    Gao, Yahui; Jiang, Jianping; Yang, Shaohua; Hou, Yali; Liu, George E; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

    2017-03-29

    Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected from whole-genome resequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) for milk protein percentage and fat percentage. Coverage depth per individual ranged from 8.2× to 11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs), comprising ~8.23 Mb of the cattle genome, were observed between the high and low groups. These differential CNVRs were annotated against the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within them. Gene Ontology and KEGG pathway analyses showed that these genes were significantly enriched for specific biological functions related to protein and lipid metabolism, the insulin/IGF pathway-protein kinase B signaling cascade, the prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk protein percentage (PP) and fat percentage (FP), we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes suggested as promising candidate genes for milk protein and fat traits.
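CNV calls from several animals are commonly collapsed into CNV regions (CNVRs) by merging calls whose coordinates overlap, and the per-type tallies should add up to the total (5,025 duplications + 9,796 deletions = 14,821 CNVs). The abstract does not spell out the merging rule, so this sketch assumes the common convention that any overlap on the same chromosome joins two calls. The tuple layout (not CNVnator's output format) and the names `merge_cnv_calls` and `count_by_type` are hypothetical.

```python
from collections import defaultdict


def merge_cnv_calls(calls):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    `calls` holds (chrom, start, end, cnv_type) tuples with 1-based inclusive
    coordinates (an assumed layout). Returns (chrom, start, end) regions,
    ordered by chromosome name, then start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, _cnv_type in calls:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps the open region: extend it
                cur_end = max(cur_end, end)
            else:  # gap: close the open region and start a new one
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


def count_by_type(calls):
    """Tally calls by CNV type, e.g. {'deletion': n, 'duplication': m}."""
    counts = defaultdict(int)
    for *_coords, cnv_type in calls:
        counts[cnv_type] += 1
    return dict(counts)


calls = [
    ("chr1", 100, 500, "deletion"),
    ("chr1", 450, 900, "duplication"),
    ("chr1", 2000, 2500, "deletion"),
    ("chr2", 10, 50, "deletion"),
]
print(merge_cnv_calls(calls))  # [('chr1', 100, 900), ('chr1', 2000, 2500), ('chr2', 10, 50)]
print(count_by_type(calls))    # {'deletion': 3, 'duplication': 1}
```

Merging across animals before comparing the high and low groups avoids counting the same locus once per bull. The differential-CNVR test itself is not shown.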

  13. Maps of cis-Regulatory Nodes in Megabase Long Genome Segments are an Inevitable Intermediate Step Toward Whole Genome Functional Mapping.

    Science.gov (United States)

    Nikolaev, Lev G; Akopov, Sergey B; Chernov, Igor P; Sverdlov, Eugene D

    2007-04-01

    The availability of complete human and other metazoan genome sequences has greatly facilitated the positioning and analysis of various genomic functional elements, with initial emphasis on coding sequences. However, complete functional maps of sequenced eukaryotic genomes should also include the positions of all non-coding regulatory elements. Unfortunately, experimental data on the genomic positions of a multitude of regulatory sequences, such as enhancers, silencers, insulators, transcription terminators, and replication origins, are very limited, especially at the whole-genome level. Since most genomic regulatory elements (e.g. enhancers) are generally gene-, tissue-, or cell-specific, predicting these elements computationally is difficult and often ambiguous. Therefore, the development of high-throughput experimental approaches for identifying and mapping genomic functional elements is highly desirable. At the same time, creating a whole-genome map of hundreds of thousands of regulatory elements in several hundred tissue/cell types is presently far beyond our capabilities. A possible alternative to the whole-genome approach is to concentrate efforts on individual genomic segments and then integrate the data obtained into a whole-genome functional map. Moreover, maps of polygenic fragments with functional cis-regulatory elements would provide valuable data on complex regulatory systems, including their variability and evolution. Here, we review experimental approaches to realizing these ideas, including our own techniques for selecting cis-acting, functionally active DNA fragments from large (megabase-sized) segments of mammalian genomes.

  14. Whole genome re-sequencing identifies a mutation in an ABC transporter (mdr2) in a Plasmodium chabaudi clone with altered susceptibility to antifolate drugs

    OpenAIRE

    Martinelli, Axel; Henriques, Gisela; Cravo, Pedro; Hunt, Paul

    2011-01-01

    In malaria parasites, mutations in two genes of folate biosynthesis encoding dihydrofolate reductase (dhfr) and dihydropteroate synthase (dhps) modify responses to antifolate therapies which target these enzymes. However, the involvement of other genes which modify the availability of exogenous folate, for example, has been proposed. Here, we used short-read whole-genome re-sequencing to determine the mutations in a clone of the rodent malaria parasite, Plasmodium chabaudi, which has altered ...

  15. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes

    OpenAIRE

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-01-01

    Whole-genome sequencing (WGS) has emerged as the ultimate typing tool for characterizing Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determi...
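In gene-by-gene typing, each isolate is reduced to a profile of allele numbers at the core-genome loci, and two isolates are compared by counting the loci at which their alleles differ. The sketch below is a generic illustration, not the scheme's published software; the locus names are hypothetical, and skipping loci that are missing in either isolate is an assumed convention.

```python
from typing import Dict, Optional

# Assumed representation: locus name -> allele number, None if the locus was not called.
AlleleProfile = Dict[str, Optional[int]]


def allelic_distance(a: AlleleProfile, b: AlleleProfile) -> int:
    """Number of core-genome loci at which two isolates carry different alleles.

    Loci absent from either profile, or not called (None) in either, are skipped
    rather than counted as differences.
    """
    distance = 0
    for locus in a.keys() & b.keys():
        allele_a, allele_b = a[locus], b[locus]
        if allele_a is None or allele_b is None:
            continue  # missing data: not counted as a difference
        if allele_a != allele_b:
            distance += 1
    return distance


isolate_1 = {"locus_1": 1, "locus_2": 4, "locus_3": 7, "locus_4": None}
isolate_2 = {"locus_1": 1, "locus_2": 5, "locus_3": 9, "locus_4": 2}
print(allelic_distance(isolate_1, isolate_2))  # 2: locus_2 and locus_3 differ; locus_4 is skipped
```

Because allele numbers are shared through a common nomenclature, two laboratories can compute the same distance without exchanging raw reads. Any cluster threshold would have to come from the full paper.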

  16. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Science.gov (United States)

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PCAWG) project.

  17. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria

    NARCIS (Netherlands)

    Ellington, M.J.; Ekelund, O.; Aarestrup, F.M.; Canton, R.; Doumith, M.; Giske, C.; Grundman, H.; Hasman, H.; Holden, M.T.G.; Hopkins, K.L.; Iredell, J.; Kahlmeter, G.; Köser, C.U.; MacGowan, A.; Mevius, D.; Mulvey, M.; Naas, T.; Peto, T.; Rolain, J.M.; Samuelsen,; Woodford, N.

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility testing

  18. Towards a whole-genome sequence for rye (Secale cereale L.)

    National Research Council Canada - National Science Library

    Bauer, Eva; Schmutzer, Thomas; Barilar, Ivan; Mascher, Martin; Gundlach, Heidrun; Martis, Mihaela-Maria; Twardziok, Sven O; Hackauf, Bernd; Gordillo, Andres; Wilde, Peer; Schmidt, Malthe; Korzun, Viktor; Mayer, Klaus F. X; Schmid, Karl; Schoen, Chris-Carolin; Scholz, Uwe

    2017-01-01

    We report on a whole-genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe...

  19. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    National Research Council Canada - National Science Library

    Helman, Elena; Lawrence, Michael S; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-01-01

    .... Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part...

  20. Core Gene Set As the Basis of Multilocus Sequence Analysis of the Subclass Actinobacteridae

    Science.gov (United States)

    Adékambi, Toïdi; Butler, Ray W.; Hanrahan, Finnian; Delcher, Arthur L.; Drancourt, Michel; Shinnick, Thomas M.

    2011-01-01

    Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of the core gene set necessary for cellular functioning was made to determine a consensus set of genes useful for the identification, taxonomy and phylogeny of species belonging to the subclass Actinobacteridae, which contains two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprises about 85% of the actinobacterial families. The following recommended criteria were used to establish a comprehensive gene set: the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be single-copy, (iv) have at least two regions conserved enough to allow the design of amplification and sequencing primers, and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained using the Sequence VARiability Analysis Program (SVARAP), which allows primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments, ychF, rpoB and secY, were selected with R² > 0.85. Selected sets of broad-range primers from the 3 gene fragments were tested and shown to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. Intraspecies similarities among 73 strains belonging to 15 species of the subclass Actinobacteridae were 96.3–100% for ychF, 97.8–100% for rpoB and 96.9–100% for secY, compared with 99.4–100% for 16S rRNA. The phylogenetic topology obtained from the combined ychF+rpoB+secY dataset was globally similar to that inferred from 16S rRNA but with higher confidence. It was concluded that multi-locus sequence analysis using a core gene set might represent the first consensus and valid approach for
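The last criterion, that a gene fragment should predict whole-genome relationships, was judged by a coefficient of determination (R² > 0.85). The abstract does not give the exact regression model, so this sketch assumes an ordinary least-squares fit of whole-genome pairwise distances on gene-fragment pairwise distances. The function names and example distances are hypothetical.

```python
from statistics import fmean


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x, computed as the squared Pearson r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: treat the fit as uninformative
    return sxy * sxy / (sxx * syy)


def passes_screen(gene_distances, genome_distances, threshold=0.85):
    """Keep a candidate fragment if its distances explain more than `threshold` of the variance."""
    return r_squared(gene_distances, genome_distances) > threshold


# Each list holds one distance per genome pair, in the same pair order.
print(passes_screen([1, 2, 3, 4], [2, 4, 6, 8]))  # True (R^2 = 1.0)
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))      # 0.64
```

With 1,224 pairwise comparisons per fragment, each candidate would contribute one distance list of that length to such a screen.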

  1. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis.

    Science.gov (United States)

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-11-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, software for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomic range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experimental constraints such as marker length or specifically targeted taxa. The key step of the algorithm is the identification of conserved regions among reference sequences for anchoring primers. We propose an efficient algorithm based on data mining that allows the analysis of huge sets of sequences. We evaluate the efficiency of ecoPrimers by running it on three different sequence sets: mitochondrial, chloroplast and bacterial genomes. Identified barcode markers correspond either to barcode regions already in use for plants or animals, or to new potential barcodes. Results from empirical experiments carried out on a promising new barcode for analyzing vertebrate diversity fully agree with expectations based on bioinformatics analysis. These tests demonstrate the efficiency of ecoPrimers for inferring new barcodes suited to diverse experimental contexts. ecoPrimers is available as an open source project at: http://www.grenoble.prabi.fr/trac/ecoPrimers.
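The key step named above, finding conserved regions among reference sequences to anchor primers, can be illustrated with a naive exact-match version: collect every word of primer length and keep those present in at least a given fraction (quorum) of the reference sequences. This is a simplified stand-in, not ecoPrimers' data-mining algorithm, and it ignores the mismatches that real primer design tolerates. The function name and parameters are hypothetical.

```python
def conserved_words(sequences, k, quorum=1.0):
    """Return k-mers present in at least a `quorum` fraction of `sequences`.

    Exact matches only; each sequence counts at most once per word.
    """
    if not sequences:
        return set()
    counts = {}
    for seq in sequences:
        seq = seq.upper()
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}  # distinct words in this sequence
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    needed = quorum * len(sequences)
    return {word for word, c in counts.items() if c >= needed}


refs = ["ACGTACGGA", "TTACGTACG", "ACGTTTTTT"]
print(sorted(conserved_words(refs, k=4)))              # ['ACGT']
print(sorted(conserved_words(refs, k=4, quorum=0.5)))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

Lowering the quorum trades strict conservation for more candidate anchors. A primer pair would then be sought among conserved words flanking a variable stretch, which this sketch does not attempt.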

  2. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate...

  3. Evaluation of artificial selection in Standard Poodles using whole-genome sequencing.

    Science.gov (United States)

    Friedenberg, Steven G; Meurs, Kathryn M; Mackay, Trudy F C

    2016-12-01

    Identifying regions of artificial selection within dog breeds may provide insights into genetic variation that underlies breed-specific traits or diseases, particularly if these traits or disease predispositions are fixed within a breed. In this study, we searched for runs of homozygosity (ROH) and calculated the d_i statistic (which is based on F_ST) to identify regions of artificial selection in Standard Poodles, using high-coverage whole-genome sequencing data of 15 Standard Poodles and 49 dogs across seven other breeds. We identified consensus ROH regions ≥1 Mb in length and common to at least ten Standard Poodles, covering 0.6% of the genome, and d_i regions that most distinguish Standard Poodles from other breeds, covering 3.7% of the genome. Within these regions, we identified enriched gene pathways related to olfaction, digestion, and taste, as well as pathways related to adrenal hormone biosynthesis, T cell function, and protein ubiquitination that could contribute to the pathogenesis of some Poodle-prevalent autoimmune diseases. We also validated variants related to hair coat and skull morphology that have previously been identified as being under selective pressure in Poodles, and flagged additional polymorphisms in genes such as ITGA2B, CBX4, and TNXB that may represent strong candidates for other common Poodle disorders.
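Consensus ROH regions of the kind described above (runs of homozygosity at least 1 Mb long and shared by at least ten dogs) can be derived with a sweep over run boundaries: count how many dogs are inside a run at each position, keep stretches where that count reaches the threshold, and discard stretches shorter than the minimum length. The paper's exact procedure is not given in the abstract. This sketch assumes half-open coordinates on a single chromosome and non-overlapping runs within each dog, and uses hypothetical names.

```python
from itertools import groupby


def consensus_roh(runs_per_dog, min_dogs=10, min_length=1_000_000):
    """Find segments covered by ROH in at least `min_dogs` individuals.

    runs_per_dog: one list per dog of (start, end) half-open intervals on a
    single chromosome; runs within one dog must not overlap.
    Returns (start, end) segments of length >= min_length.
    """
    events = []
    for runs in runs_per_dog:
        for start, end in runs:
            events.append((start, 1))   # a dog enters a run
            events.append((end, -1))    # a dog leaves a run
    events.sort()

    segments = []
    depth = 0          # number of dogs currently inside a run
    seg_start = None   # start of the open consensus segment, if any
    # Apply all events at one position together, so touching runs do not split a segment.
    for pos, group in groupby(events, key=lambda e: e[0]):
        depth += sum(delta for _, delta in group)
        if depth >= min_dogs and seg_start is None:
            seg_start = pos
        elif depth < min_dogs and seg_start is not None:
            if pos - seg_start >= min_length:
                segments.append((seg_start, pos))
            seg_start = None
    return segments


# Toy example with small numbers; real use would keep min_dogs=10, min_length=1_000_000.
dogs = [[(0, 500)], [(100, 300), (300, 600)], [(400, 450)]]
print(consensus_roh(dogs, min_dogs=2, min_length=100))  # [(100, 500)]
```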

  4. Unique features of a Japanese 'Candidatus Liberibacter asiaticus' strain revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Hiroshi Katoh

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'Ca. L. asiaticus' Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from 'Ca. L. asiaticus'-infected psyllids and leaf midribs. The 1.19-Mb genome has an average GC content of 36.32%. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy strains. In contrast to other 'Ca. L. asiaticus' strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region.

  5. Whole genome duplications and a 'function' for junk DNA? Facts and hypotheses.

    Directory of Open Access Journals (Sweden)

    Reiner A Veitia

    BACKGROUND: The lack of correlation between genome size and organismal complexity is understood in terms of the massive presence of repetitive and non-coding DNA. This non-coding subgenome has long been called "junk" DNA. However, it might have important functions. Generation of junk DNA depends on proliferation of selfish DNA elements and on local or global DNA duplication followed by genic non-functionalization. METHODOLOGY/PRINCIPAL FINDINGS: Evidence from genomic analyses and experimental data indicates that Whole Genome Duplications (WGD) are often followed by a return to the diploid state, through DNA deletions and intra/interchromosomal rearrangements. We use simple theoretical models and simulations to explore how a WGD accompanied by sequence deletions might affect the dosage balance often required among several gene products involved in regulatory processes. We find that genomic deletions leading to changes in nuclear and cell volume could perturb gene dosage balance. CONCLUSIONS/SIGNIFICANCE: The potentially negative impact of DNA deletions can be buffered if deleted genic DNA is, at least temporarily, replaced by repetitive DNA so that the nuclear/cell volume remains compatible with normal living. Thus, we speculate that retention of non-functionalized non-coding DNA, and replacement of deleted DNA through proliferation of selfish elements, might help avoid dosage imbalances in cycles of polyploidization and diploidization, which are particularly frequent in plants.

  6. Intragenic DOK7 deletion detected by whole-genome sequencing in congenital myasthenic syndromes.

    Science.gov (United States)

    Azuma, Yoshiteru; Töpf, Ana; Evangelista, Teresinha; Lorenzoni, Paulo José; Roos, Andreas; Viana, Pedro; Inagaki, Hidehito; Kurahashi, Hiroki; Lochmüller, Hanns

    2017-06-01

    To identify the genetic cause in a patient affected by ptosis and exercise-induced muscle weakness and diagnosed with congenital myasthenic syndromes (CMS) using whole-genome sequencing (WGS). Candidate gene screening and WGS analysis were performed in the case. Allele-specific PCR was subsequently performed to confirm the copy number variation (CNV) that was suspected from the WGS results. In addition to the previously reported frameshift mutation c.1124_1127dup, an intragenic 6,261 bp deletion spanning from the 5' untranslated region to intron 2 of the DOK7 gene was identified by WGS in the patient with CMS. The heterozygous deletion was suspected based on reduced coverage on WGS and confirmed by allele-specific PCR. The breakpoints had microhomology and an inverted repeat, which may have led to the development of the deletion during DNA replication. We report a CMS case with identification of the breakpoints of the intragenic DOK7 deletion using WGS analysis. This case illustrates that CNVs undetected by Sanger sequencing may be identified by WGS and highlights their relevance in the molecular diagnosis of a treatable neurologic condition such as CMS.

  7. Whole Genome Sequence Analysis of Pig Respiratory Bacterial Pathogens with Elevated Minimum Inhibitory Concentrations for Macrolides.

    Science.gov (United States)

    Dayao, Denise Ann Estarez; Seddon, Jennifer M; Gibson, Justine S; Blackall, Patrick J; Turni, Conny

    2016-10-01

    Macrolides are often used to treat and control bacterial pathogens causing respiratory disease in pigs. This study analyzed the whole genome sequences of one clinical isolate each of Actinobacillus pleuropneumoniae, Haemophilus parasuis, Pasteurella multocida, and Bordetella bronchiseptica, all isolated from Australian pigs, to identify the mechanism underlying the elevated minimum inhibitory concentrations (MICs) for erythromycin, tilmicosin, or tulathromycin. The H. parasuis assembled genome had a nucleotide transition at position 2059 (A to G) in the six copies of the 23S rRNA gene. This mutation has previously been associated with macrolide resistance, but this is the first reported mechanism associated with elevated macrolide MICs in H. parasuis. There was no known macrolide resistance mechanism identified in the other three bacterial genomes. However, strA and sul2, aminoglycoside and sulfonamide resistance genes, respectively, were detected in one contiguous sequence (contig 1) of the A. pleuropneumoniae assembled genome. This contig was identical to plasmids previously identified in Pasteurellaceae. This study has provided one possible explanation of elevated MICs to macrolides in H. parasuis. Further studies are necessary to clarify the mechanism causing the unexplained macrolide resistance in other Australian pig respiratory pathogens, including the role of efflux systems, which were detected in all analyzed genomes.

  8. Whole-genome fingerprint of the DNA methylome during human B-cell differentiation

    Science.gov (United States)

    Kulis, Marta; Merkel, Angelika; Heath, Simon; Queirós, Ana C.; Schuyler, Ronald P.; Castellano, Giancarlo; Beekman, Renée; Raineri, Emanuele; Esteve, Anna; Clot, Guillem; Verdaguer-Dot, Nuria; Duran-Ferrer, Martí; Russiñol, Nuria; Vilarrasa-Blasi, Roser; Ecker, Simone; Pancaldi, Vera; Rico, Daniel; Agueda, Lidia; Blanc, Julie; Richardson, David; Clarke, Laura; Datta, Avik; Pascual, Marien; Agirre, Xabier; Prosper, Felipe; Alignani, Diego; Paiva, Bruno; Caron, Gersende; Fest, Thierry; Muench, Marcus O.; Fomin, Marina E.; Lee, Seung-Tae; Wiemels, Joseph L.; Valencia, Alfonso; Gut, Marta; Flicek, Paul; Stunnenberg, Hendrik G.; Siebert, Reiner; Küppers, Ralf; Gut, Ivo G.; Campo, Elías; Martín-Subero, José I.

    2017-01-01

    We analyzed the DNA methylome of ten subpopulations spanning the entire B-cell differentiation program by whole-genome bisulfite sequencing and high-density microarrays. We observed that non-CpG methylation disappeared upon B-cell commitment whereas CpG methylation changed extensively during B-cell maturation, showing an accumulative pattern and affecting around 30% of all measured CpGs. Early differentiation stages mainly displayed enhancer demethylation, which was associated with upregulation of key B-cell transcription factors and affected multiple genes involved in B-cell biology. Late differentiation stages, in contrast, showed extensive demethylation of heterochromatin and methylation gain of polycomb-repressed areas, and did not affect genes with apparent functional impact in B cells. This signature, which has been previously linked to aging and cancer, was particularly widespread in mature cells with extended life span. Comparing B-cell neoplasms with their normal counterparts, we identified that they frequently acquire methylation changes in regions undergoing dynamic methylation already during normal B-cell differentiation. PMID:26053498

  9. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs), defined as nonsynonymous variants with allele frequency < 1% in ESP6500, in 163 clinically relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.

  10. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing.

    Science.gov (United States)

    Helman, Elena; Lawrence, Michael S; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-07-01

    Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squamous, head and neck, colorectal, and endometrial carcinomas. Many somatic retrotransposon insertions occur in known cancer genes. We find that high somatic retrotransposition rates in tumors are associated with high rates of genomic rearrangement and somatic mutation. Finally, we developed TranspoSeq-Exome to interrogate an additional 767 tumor samples with hybrid-capture exome data and discovered 35 novel somatic retrotransposon insertions into exonic regions, including an insertion into an exon of the PTEN tumor suppressor gene. The results of this large-scale, comprehensive analysis of retrotransposon movement across tumor types suggest that somatic retrotransposon insertions may represent an important class of structural variation in cancer. © 2014 Helman et al.; Published by Cold Spring Harbor Laboratory Press.

  11. Whole-Genome Sequencing Uncovers the Genetic Basis of Chronic Mountain Sickness in Andean Highlanders

    Science.gov (United States)

    Zhou, Dan; Udpa, Nitin; Ronen, Roy; Stobdan, Tsering; Liang, Junbin; Appenzeller, Otto; Zhao, Huiwen W.; Yin, Yi; Du, Yuanping; Guo, Lixia; Cao, Rui; Wang, Yu; Jin, Xin; Huang, Chen; Jia, Wenlong; Cao, Dandan; Guo, Guangwu; Gamboa, Jorge L.; Villafuerte, Francisco; Callacondo, David; Xue, Jin; Liu, Siqi; Frazer, Kelly A.; Li, Yingrui; Bafna, Vineet; Haddad, Gabriel G.

    2013-01-01

    The hypoxic conditions at high altitudes present a challenge for survival, causing pressure for adaptation. Interestingly, many high-altitude denizens (particularly in the Andes) are maladapted, with a condition known as chronic mountain sickness (CMS) or Monge disease. To decode the genetic basis of this disease, we sequenced and compared the whole genomes of 20 Andean subjects (10 with CMS and 10 without). We discovered 11 regions genome-wide with significant differences in haplotype frequencies consistent with selective sweeps. In these regions, two genes (an erythropoiesis regulator, SENP1, and an oncogene, ANP32D) had a higher transcriptional response to hypoxia in individuals with CMS relative to those without. We further found that downregulating the orthologs of these genes in flies dramatically enhanced survival rates under hypoxia, demonstrating that suppression of SENP1 and ANP32D plays an essential role in hypoxia tolerance. Our study provides an unbiased framework to identify and validate the genetic basis of adaptation to high altitudes and identifies potentially targetable mechanisms for CMS treatment. PMID:23954164

  12. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD in whom we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands, used a state-of-the-art comprehensive bioinformatic analysis pipeline, and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy, with a negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We report an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.

  13. The American cranberry: first insights into the whole genome of a species adapted to bog habitat.

    Science.gov (United States)

    Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi

    2014-06-13

    The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely-cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364, representing 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance.
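    The N50 quoted above (4,237 bp across 229,745 scaffolds) is a standard summary of assembly contiguity: the length L such that scaffolds of length L or longer together cover at least half of the assembled sequence. A minimal Python sketch of that usual definition, assuming only a list of scaffold lengths as input; the function name n50 is illustrative and not taken from the study:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

    Sorting once and accumulating from the longest sequence keeps the calculation linear after the sort, which matters when an assembly has hundreds of thousands of scaffolds.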

  14. Whole genome wide expression profiles on germination of Verticillium dahliae microsclerotia.

    Directory of Open Access Journals (Sweden)

    Dongfang Hu

    Verticillium dahliae is a fungal pathogen causing Verticillium wilt on a range of economically important crops. Microsclerotia are its main survival and dormancy structures and serve as the primary inoculum on many hosts. Studies were conducted to determine the effect of temperature (5 to 50°C), pH (2 to 12) and nutrient regimes on microsclerotia germination. The optimal condition for microsclerotium germination was 20°C with pH 8.0, whereas nutrient regimes had no significant effect on its germination. Genome-wide expression profiles during microsclerotium germination were characterized using Illumina sequencing technology. Approximately 7.4 million 21-nt cDNA tags were sequenced in the cDNA libraries derived from germinated and non-germinated microsclerotia. About 3.9% and 2.3% of the unique tags were up-regulated and down-regulated at least five-fold, respectively, in the germinated microsclerotia compared with the non-germinated microsclerotia. A total of 1654 genes showing differential expression were identified. Genes that are likely to have played important roles in microsclerotium germination include those encoding G-protein coupled receptor, lipase/esterase, cyclopentanone 1,2-monooxygenase, H(+)/hexose cotransporter 1, fungal Zn(2)-Cys(6) binuclear cluster domain, thymus-specific serine protease, glucan 1,3-beta-glucosidase, and alcohol dehydrogenase. These genes were mainly up-regulated or down-regulated only in germinated microsclerotia, compared with non-germinated microsclerotia. The differential expression of genes was confirmed by qRT-PCR analysis of 20 randomly selected genes from the 40 most differentially expressed genes.

  15. Whole genome association mapping by incompatibilities and local perfect phylogenies

    Directory of Open Access Journals (Sweden)

    Besenbacher Søren

    2006-10-01

    despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the known susceptibility variant in an empirical data set (the ΔF508 mutation for cystic fibrosis), where the susceptibility variant is already known, and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusion Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.

  16. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Science.gov (United States)

    Wilson, Mark R; Brown, Eric; Keys, Chris; Strain, Errol; Luo, Yan; Muruvanda, Tim; Grim, Christopher; Jean-Gilles Beaubrun, Junia; Jarvis, Karen; Ewing, Laura; Gopinath, Gopal; Hanes, Darcy; Allard, Marc W; Musser, Steven

    2016-01-01

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future
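    The outbreak analysis above rests on how many SNPs separate isolates. The authors inferred phylogeny by maximum likelihood; the sketch below is a simpler, swapped-in illustration that only counts pairwise SNP differences between sequences that are assumed to be already aligned, skipping gaps ('-') and ambiguous bases ('N'). The function names and the isolate labels in any usage are hypothetical:

```python
from itertools import combinations

SKIP = {"-", "N"}  # gap and ambiguous-base symbols ignored in comparisons


def snp_distance(a, b):
    """Number of aligned positions at which two sequences carry different bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in SKIP and y not in SKIP
    )


def distance_matrix(aligned):
    """Pairwise SNP distances for a dict mapping isolate name -> aligned sequence."""
    dist = {}
    for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
        d = snp_distance(seq1, seq2)
        # Store both orderings so lookups do not depend on argument order.
        dist[(name1, name2)] = dist[(name2, name1)] = d
    return dist
```

    Isolates separated by only a few SNPs, as reported for the food, plant, and clinical isolates in the outbreak cluster, would show small values in such a matrix; a likelihood-based tree additionally models substitution patterns, which raw counts do not.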

  17. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  18. Whole-genome DNA methylation characteristics in pediatric precursor B cell acute lymphoblastic leukemia (BCP ALL).

    Directory of Open Access Journals (Sweden)

    Radosław Chaber

    In addition to genetic alterations, epigenetic abnormalities have been shown to underlie the pathogenesis of acute lymphoblastic leukemia (ALL), the most common pediatric cancer. The purpose of this study was to characterize the whole genome DNA methylation profile in children with precursor B-cell ALL (BCP ALL) and to compare this profile with methylation observed in normal bone marrow samples. Additional efforts were made to correlate the observed methylation patterns with selected clinical features. We assessed DNA methylation from bone marrow samples obtained from 38 children with BCP ALL at the time of diagnosis, along with 4 samples of normal bone marrow cells as controls, using the Infinium MethylationEPIC BeadChip Array. Patients were diagnosed and stratified into prognosis groups according to the BFM ALL IC 2009 protocol. The analysis of differentially methylated sites across the genome as well as promoter methylation profiles allowed clear separation of the leukemic and control samples into two clusters. 86.6% of the promoter-associated differentially methylated sites were hypermethylated in BCP ALL. Seven sites were found to correlate with the BFM ALL IC 2009 high risk group. Amongst these, one was located within the gene body of the MBP gene and another within the promoter region of the PSMF1 gene. Differentially methylated sites were also identified that were significantly related to subsets of patients with ETV6-RUNX1 fusion and hyperdiploidy. The analyzed translocations and the resulting change in the genes' sequence context do not affect methylation, and methylation does not seem to be a mechanism for regulating expression of the resulting fusion genes.

  19. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication.

    Directory of Open Access Journals (Sweden)

    Li-Jun Ma

    2009-07-01

    Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.

  20. Mosquito-borne Inkoo virus in northern Sweden - isolation and whole genome sequencing.

    Science.gov (United States)

    Lwande, Olivia Wesula; Bucht, Göran; Ahlm, Clas; Ahlm, Kristoffer; Näslund, Jonas; Evander, Magnus

    2017-03-23

    Inkoo virus (INKV) is a lesser-known mosquito-borne virus belonging to Bunyaviridae, genus Orthobunyavirus, California serogroup. Studies indicate that INKV infection is mainly asymptomatic, but can cause mild encephalitis in humans. In northern Europe, the seroprevalence against INKV is high, 41% in Sweden and 51% in Finland. Previously, INKV RNA has been detected in adult Aedes (Ae.) communis, Ae. hexodontus and Ae. punctor mosquitoes and Ae. communis larvae, but there are still gaps in knowledge regarding mosquito vectors and genetic diversity. Therefore, we aimed to determine the occurrence of INKV in its mosquito vector and characterize the isolates. About 125,000 mosquitoes were collected during mosquito-borne virus surveillance in northern Sweden during the summer period of 2015. Of these, 10,000 mosquitoes were processed for virus isolation and detection using cell culture and RT-PCR. Virus isolates were further characterized by whole genome sequencing. Genetic typing of mosquito species was conducted by cytochrome oxidase subunit I (COI) gene amplification and sequencing (genetic barcoding). Several Ae. communis mosquitoes were found positive for INKV RNA and two isolates were obtained. The first complete sequences of the small (S), medium (M), and large (L) segments of INKV in Sweden were obtained. Phylogenetic analysis showed that the INKV genome was most closely related to other INKV isolates from Sweden and Finland. Of the three INKV genome segments, the INKV M segment had the highest frequency of non-synonymous mutations. The overall G/C content of INKV genes was low for the N/NSs genes (43.8-45.5%), the polyprotein (Gn/Gc/NSm) gene (35.6%) and the RNA polymerase gene (33.8%). This may be due to the fact that INKV in most instances utilized A or T in the third codon position. INKV is frequently circulating in northern Sweden and Ae. communis is the key vector. The high mutation rate of the INKV M segment may have consequences for virulence.
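    The INKV abstract links low overall G/C content to the frequent use of A or T at the third codon position. A minimal Python sketch of the two quantities involved, overall GC percentage and third-position GC (often written GC3), assuming an in-frame coding sequence that starts at its first base; the function names are illustrative only and not from the study:

```python
def gc_content(seq):
    """Percentage of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Percentage of G or C at the third position of each complete codon.

    Assumes the coding sequence is in frame starting at its first base;
    a trailing partial codon contributes nothing.
    """
    thirds = cds.upper()[2::3]  # bases at codon positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("need at least one complete codon")
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)
```

    Comparing gc3_content with gc_content for the same gene indicates whether third-position choices pull the gene's composition toward A/T, which is the explanation the abstract proposes.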

  1. Genomic Analysis of the Basal Lineage Fungus Rhizopus oryzae Reveals a Whole-Genome Duplication

    Science.gov (United States)

    Ma, Li-Jun; Ibrahim, Ashraf S.; Skory, Christopher; Grabherr, Manfred G.; Burger, Gertraud; Butler, Margi; Elias, Marek; Idnurm, Alexander; Lang, B. Franz; Sone, Teruo; Abe, Ayumi; Calvo, Sarah E.; Corrochano, Luis M.; Engels, Reinhard; Fu, Jianmin; Hansberg, Wilhelm; Kim, Jung-Mi; Kodira, Chinnappa D.; Koehrsen, Michael J.; Liu, Bo; Miranda-Saavedra, Diego; O'Leary, Sinead; Ortiz-Castellanos, Lucila; Poulter, Russell; Rodriguez-Romero, Julio; Ruiz-Herrera, José; Shen, Yao-Qing; Zeng, Qiandong; Galagan, James; Birren, Bruce W.

    2009-01-01

    Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called “zygomycetes,” R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99–880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin–proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments. PMID:19578406

  2. Application of Whole Genome Sequencing Technology in the Investigation of Genetic Causes of Fetal, Perinatal, and Early Infant Death.

    Science.gov (United States)

    Armes, Jane E; Williams, Mark; Price, Gareth; Wallis, Tristan; Gallagher, Renee; Matsika, Admire; Joy, Christopher; Galea, Melanie; Gardener, Glenn; Leach, Rick; Swagemakers, Sigrid Ma; Tearle, Rick; Stubbs, Andrew; Harraway, James; van der Spek, Peter J; Venter, Deon J

    2017-01-01

    Death in the fetal, perinatal, and early infant age-group has a multitude of causes, a proportion of which is presumed to be genetic. Defining a specific genetic aberration leading to the death is problematic at this young age, due to limited phenotype-genotype correlation inherent in the underdeveloped phenotype, the inability to assess certain phenotypic traits after death, and the problems of dealing with rare disorders. In this study, our aim was to increase the yield of identification of a defined genetic cause of an early death. Therefore, we employed whole genome sequencing and bioinformatic filtering techniques as a comprehensive, unbiased genetic investigation into 16 fetal, perinatal, and early infant deaths, which had undergone a full autopsy. A likely genetic cause was identified in two cases (in genes COL2A1 and RYR1) and a speculative genetic cause in a further six cases (in genes ARHGAP35, BBS7, CASZ1, CRIM1, DHCR7, HADHB, HAPLN3, HSPG2, MYO18B, and SRGAP2). This investigation indicates that whole genome sequencing is a significantly enabling technology when determining genetic causes of early death.

  3. WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers from Whole Genomes.

    Science.gov (United States)

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, N S; Jena, J K; Kushwaha, Basdeo

    2017-09-16

    Mining and characterization of SSR markers from whole genomes provide valuable information about the biological significance of SSR distribution and also facilitate development of markers for genetic analysis. WGS-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java NetBeans and Perl scripts that simplifies the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, ncRNA, core genes, repeats and SSRs from whole genomes, followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic and core gene regions) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139,057 SSRs, out of which 113,703 SSR primer pairs were uniquely amplified in silico onto the Takifugu rubripes (fugu) genome. Of the 113,703 mined SSRs, 81,463 were from coding regions (including 4,286 exonic and 77,177 intronic), 7 from RNA and 267 from core genes of fugu, whereas 105,641 SSRs and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT has been tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr. © The American Genetic Association 2017. All rights reserved.
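
    The abstract does not spell out how SSRs are detected. As a rough illustration only, perfect microsatellites can be found with a backreference regular expression that looks for a 1-6 bp unit repeated at least a minimum number of times. The thresholds in MIN_REPEATS and the helper names are assumptions for this sketch, not WGSSAT's documented parameters or code, and imperfect or compound SSRs are ignored.

```python
import re

# Assumed minimum repeat counts per motif length (1-6 bp); illustrative
# thresholds only, not WGSSAT's documented defaults.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_motif(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples, 0-based and end-exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer of unambiguous bases followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_motif(motif):
                continue  # reported under its shortest repeat unit instead
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)
```

    Motifs that are themselves repeats of a shorter unit (for example AA inside a poly-A run) are skipped, so each SSR is reported once under its shortest repeat unit.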

  4. A bioinformatic approach to understanding antibiotic resistance in intracellular bacteria through whole genome analysis.

    Science.gov (United States)

    Biswas, Silpak; Raoult, Didier; Rolain, Jean-Marc

    2008-09-01

    Intracellular bacteria survive within eukaryotic host cells and are difficult to kill with certain antibiotics. As a result, antibiotic resistance in intracellular bacteria is becoming commonplace in healthcare institutions. Owing to the lack of methods available for transforming these bacteria, we evaluated the mechanisms of resistance using molecular methods and in silico genome analysis. The objective of this review was to understand the molecular mechanisms of antibiotic resistance through in silico comparisons of the genomes of obligate and facultative intracellular bacteria. The available data on in vitro mutants reported for intracellular bacteria were also reviewed. These genomic data were analysed to find natural mutations in known target genes involved in antibiotic resistance and to look for the presence or absence of different resistance determinants. Our analysis revealed the presence of tetracycline resistance protein (Tet) in Bartonella quintana, Francisella tularensis and Brucella ovis; moreover, most of the Francisella strains possessed the blaA gene, AmpG protein and metallo-beta-lactamase family protein. The presence or absence of folP (dihydropteroate synthase) and folA (dihydrofolate reductase) genes in the genome could explain natural resistance to co-trimoxazole. Finally, multiple genes encoding different efflux pumps were studied. This in silico approach was an effective method for understanding the mechanisms of antibiotic resistance in intracellular bacteria. The whole genome sequence analysis will help to predict several important phenotypic characteristics, in particular resistance to different antibiotics. In the future, stable mutants should be obtained through transformation methods in order to demonstrate experimentally the determinants of resistance in intracellular bacteria.

  5. Whole-genome resequencing uncovers molecular signatures of natural and sexual selection in wild bighorn sheep.

    Science.gov (United States)

    Kardos, Marty; Luikart, Gordon; Bunch, Rowan; Dewey, Sarah; Edwards, William; McWilliam, Sean; Stephenson, John; Allendorf, Fred W; Hogg, John T; Kijas, James

    2015-11-01

    The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole-genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50-fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long-term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male-male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa. © 2015 John Wiley & Sons Ltd.
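
    One standard way to read an X-versus-autosome diversity comparison is sketched below under the assumptions of a constant-size Wright-Fisher population with separate sexes, equal mutation rates on X and autosomes, and diversity proportional to effective size. It uses the classical expressions Ne(A) = 4NmNf/(Nm + Nf) and Ne(X) = 9NmNf/(4Nm + 2Nf). Their ratio is 3/4 when breeding males and females are equally numerous and rises toward 9/8 as the number of breeding males shrinks, so an elevated X:A ratio points to a reduced male effective size. The functions are illustrative and are not taken from the study.

```python
def expected_x_to_autosome_ratio(n_males, n_females):
    """Expected Ne(X)/Ne(A) for a Wright-Fisher population with separate sexes."""
    ne_autosome = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_autosome


def male_to_female_ratio(x_to_a):
    """Invert the expected ratio to the implied breeding Nm/Nf under the same model."""
    if not 0.5625 < x_to_a <= 1.125:
        raise ValueError("X/A ratio must lie in (9/16, 9/8] under this model")
    return (9 - 8 * x_to_a) / (16 * x_to_a - 9)
```

    For example, an X:A ratio of 1.0 would correspond to about one breeding male for every seven breeding females under this simple model.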


  6. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Science.gov (United States)

    Onteru, Suneel K; Gorbach, Danielle M; Young, Jennifer M; Garrick, Dorian J; Dekkers, Jack C M; Rothschild, Max F

    2013-01-01

    Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection at an early age of animals with improved feed efficiency. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, single SNP and haplotype analyses using PLINK software. Single SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near […] energy homeostasis (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker-assisted selection programs.
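
    The definition of RFI above translates directly into a regression residual. The sketch below fits expected intake by ordinary least squares on whatever growth and maintenance predictors are supplied (for example ADG and backfat, an assumed choice here; a fuller model might also include covariates such as metabolic body weight and fixed effects, which this sketch omits) and returns observed minus predicted intake. It is not the model used for the ISU-RFI lines.

```python
import numpy as np


def residual_feed_intake(adfi, predictors):
    """RFI = observed intake minus intake predicted by an OLS fit on the predictors.

    adfi: shape (n,) observed average daily feed intake.
    predictors: shape (n, p), e.g. columns for ADG and backfat (assumed choice).
    """
    adfi = np.asarray(adfi, dtype=float)
    X = np.column_stack([np.ones(len(adfi)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, adfi, rcond=None)
    expected = X @ coef       # feed requirement predicted from growth and maintenance
    return adfi - expected    # negative RFI: eats less than predicted (more efficient)
```

    Because the fit includes an intercept, the residuals sum to zero, so animals with negative RFI eat less than predicted for their growth and are the more efficient ones.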

  7. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis

    Directory of Open Access Journals (Sweden)

    Sumi Elsa John

    2015-03-01

    The Kuwaiti native population comprises three distinct genetic subgroups of Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Arabian Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  8. Whole Genome Sequencing-Based Mapping and Candidate Identification of Mutations from Fixed Zebrafish Tissue

    Directory of Open Access Journals (Sweden)

    Nicholas E. Sanchez

    2017-10-01

    As forward genetic screens in zebrafish become more common, the number of mutants that cannot be identified by gross morphology or through transgenic approaches, such as many nervous system defects, has also increased. Screening for these difficult-to-visualize phenotypes demands techniques such as whole-mount in situ hybridization (WISH) or antibody staining, which require tissue fixation. To date, fixed tissue has not been amenable for generating libraries for whole genome sequencing (WGS). Here, we describe a method for using genomic DNA from fixed tissue and a bioinformatics suite for WGS-based mapping of zebrafish mutants. We tested our protocol using two known zebrafish mutant alleles, gpr126^st49 and egr2b^fh227, both of which cause myelin defects. As further proof of concept we mapped a novel mutation, stl64, identified in a zebrafish WISH screen for myelination defects. We linked stl64 to chromosome 1 and identified a candidate nonsense mutation in the F-box and WD repeat domain containing 7 (fbxw7) gene. Importantly, stl64 mutants phenocopy previously described fbxw7^vu56 mutants, and knockdown of fbxw7 in wild-type animals produced similar defects, demonstrating that stl64 disrupts fbxw7. Together, these data show that our mapping protocol can map and identify causative lesions in mutant screens that require tissue fixation for phenotypic analysis.
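
    The abstract does not describe the mapping algorithm in the bioinformatics suite. A common idea behind WGS-based mapping of recessive mutants, assumed here purely for illustration, is that SNPs linked to the causal lesion become homozygous in a pool of phenotypically mutant embryos, while unlinked SNPs stay near an allele frequency of 0.5. The hypothetical functions below score fixed windows by mean homozygosity and report the highest-scoring one.

```python
from collections import defaultdict


def homozygosity_by_window(snps, window_size):
    """Mean per-SNP homozygosity, max(AF, 1 - AF), in fixed windows.

    snps: iterable of (chromosome, position, alt_allele_frequency) measured in a
    pool of mutant embryos (hypothetical input format).
    """
    bins = defaultdict(list)
    for chrom, pos, af in snps:
        bins[(chrom, pos // window_size * window_size)].append(max(af, 1.0 - af))
    return {key: sum(vals) / len(vals) for key, vals in bins.items()}


def candidate_window(scores):
    """Window whose SNPs are closest to homozygous, i.e. most tightly linked."""
    return max(scores, key=scores.get)
```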

  9. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    Science.gov (United States)

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. When compared with 16S-based metagenomics, it offers the advantage of identification of species level taxonomy and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects have been recently conducted or are currently underway utilizing WGS. With the generation of vast amounts of data, the bioinformatics and computational analysis of WGS results become vital for the success of a metagenomics study. However, each step in the WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data with the hope of reducing the requirement on computational time and storage space. Here, we present an overview of the current state of metagenomics through WGS, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  10. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic.

    Science.gov (United States)

    Foley, Samantha B; Rios, Jonathan J; Mgbemena, Victoria E; Robinson, Linda S; Hampel, Heather L; Toland, Amanda E; Durham, Leslie; Ross, Theodora S

    2015-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency […] WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.

  11. Kuwaiti population subgroup of nomadic Bedouin ancestry-Whole genome sequence and analysis.

    Science.gov (United States)

    John, Sumi Elsa; Thareja, Gaurav; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama

    2015-03-01

    The Kuwaiti native population comprises three distinct genetic subgroups of Persian, "city-dwelling" Saudi Arabian tribe, and nomadic "tent-dwelling" Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Arabian Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious 'novel' variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  12. Whole-genome sequencing of quartet families with autism spectrum disorder.

    Science.gov (United States)

    Yuen, Ryan K C; Thiruvahindrapuram, Bhooma; Merico, Daniele; Walker, Susan; Tammimies, Kristiina; Hoang, Ny; Chrysler, Christina; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Liu, Yi; Gazzellone, Matthew J; D'Abate, Lia; Deneault, Eric; Howe, Jennifer L; Liu, Richard S C; Thompson, Ann; Zarrei, Mehdi; Uddin, Mohammed; Marshall, Christian R; Ring, Robert H; Zwaigenbaum, Lonnie; Ray, Peter N; Weksberg, Rosanna; Carter, Melissa T; Fernandez, Bridget A; Roberts, Wendy; Szatmari, Peter; Scherer, Stephen W

    2015-02-01

    Autism spectrum disorder (ASD) is genetically heterogeneous, with evidence for hundreds of susceptibility loci. Previous microarray and exome-sequencing studies have examined portions of the genome in simplex families (parents and one ASD-affected child) having presumed sporadic forms of the disorder. We used whole-genome sequencing (WGS) of 85 quartet families (parents and two ASD-affected siblings), consisting of 170 individuals with ASD, to generate a comprehensive data resource encompassing all classes of genetic variation (including noncoding variants) and accompanying phenotypes, in apparently familial forms of ASD. By examining de novo and rare inherited single-nucleotide and structural variations in genes previously reported to be associated with ASD or other neurodevelopmental disorders, we found that 69.4% of the affected siblings carried different ASD-relevant mutations. These siblings with discordant mutations tended to demonstrate more clinical variability than those who shared a risk variant. Our study emphasizes that substantial genetic heterogeneity exists in ASD, necessitating the use of WGS to delineate all genic and non-genic susceptibility variants in research and in clinical diagnostics.

  13. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Background Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genomes of parental and evolved strains of microorganisms to be rapidly determined. Results We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.
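
    As a much simpler stand-in for the segmentation algorithm mentioned above (which this sketch does not reproduce), amplified regions can be flagged from read depth alone: normalize per-window depth in each strain by its genome-wide median, take the evolved-to-ancestor ratio, and merge consecutive windows whose ratio reaches a threshold. The 1.5 threshold, fixed-width windows and function name are assumptions for illustration.

```python
from statistics import median


def amplified_segments(evolved, ancestor, window_size, min_ratio=1.5):
    """Merge runs of windows whose normalized evolved/ancestor depth ratio >= min_ratio.

    evolved, ancestor: per-window mean read depth for the same windows, same order.
    Assumes both genome-wide medians are non-zero.
    Returns (start_bp, end_bp, mean_ratio) tuples with end exclusive.
    """
    e_med, a_med = median(evolved), median(ancestor)
    ratios = [
        (e / e_med) / (a / a_med) if a > 0 else 0.0
        for e, a in zip(evolved, ancestor)
    ]
    segments, start = [], None
    for i, r in enumerate(ratios + [0.0]):  # trailing sentinel closes an open run
        if r >= min_ratio and start is None:
            start = i
        elif r < min_ratio and start is not None:
            run = ratios[start:i]
            segments.append((start * window_size, i * window_size, sum(run) / len(run)))
            start = None
    return segments
```

    The mean ratio of a merged segment roughly tracks its copy-number gain relative to the ancestor, so a 5× amplification would be expected to show a ratio near 5.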

  14. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation short reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), allowing their comparative analysis together with user samples. MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL and Python, with all major browsers supported.
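
    To make coverage-based profiling concrete, the sketch below turns per-reference alignment totals into a composition vector: mean depth is aligned bases divided by reference length, and relative abundance is each depth divided by the sum of depths. The input format and function name are assumptions; this is not MALINA's code, and real pipelines also handle multi-mapping reads and uneven coverage, which are ignored here.

```python
def relative_abundance(mapped_bases, reference_lengths):
    """Coverage-depth-based composition vector.

    mapped_bases: dict reference_id -> total aligned read bases (hypothetical input).
    reference_lengths: dict reference_id -> reference sequence length in bp.
    """
    depth = {
        ref: mapped_bases.get(ref, 0) / length
        for ref, length in reference_lengths.items()
    }
    total = sum(depth.values())
    if total == 0:
        return {ref: 0.0 for ref in depth}  # nothing aligned: all-zero profile
    return {ref: d / total for ref, d in depth.items()}
```

    Dividing by reference length keeps long genomes from looking more abundant merely because they collect more reads.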

  15. Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalked bacterial species are known for their distinctive ability to live in low-nutrient environments, a characteristic of most heavy metal contaminated sites. Caulobacter crescentus is a model organism for studying cell cycle regulation with well-developed genetics. We have identified the pathways responding to heavy metal toxicity in C. crescentus to provide insights for possible application of Caulobacter to environmental restoration. We exposed C. crescentus cells to four heavy metals (chromium, cadmium, selenium and uranium) and analyzed genome-wide transcriptional activities post exposure using an Affymetrix GeneChip microarray. C. crescentus showed surprisingly high tolerance to uranium, a possible mechanism for which may be formation of extracellular calcium-uranium-phosphate precipitates. The principal response to these metals was protection against oxidative stress (up-regulation of manganese-dependent superoxide dismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxins and DNA repair enzymes responded most strongly to cadmium and chromate. The cadmium and chromium stress response also focused on reducing the intracellular metal concentration, with multiple efflux pumps employed to remove cadmium while a sulfate transporter was down-regulated to reduce non-specific uptake of chromium. Membrane proteins were also up-regulated in response to most of the metals tested. A two-component signal transduction system involved in the uranium response was identified. Several differentially regulated transcripts from regions previously not known to encode proteins were identified, demonstrating the advantage of evaluating the transcriptome using whole genome microarrays.

  16. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used in microarray-based gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type I error and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction
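
    Taking the gene as the unit of testing also changes the multiple-testing burden. Assuming every unordered pair is tested once, G units give G(G - 1)/2 tests, and a Bonferroni correction divides the family-wise α by that count. The helpers below are a hypothetical illustration of that arithmetic only; they do not implement the FRGM itself.

```python
def pairwise_bonferroni(n_units, alpha=0.05):
    """Number of unordered pairs among n_units and the Bonferroni per-test threshold."""
    n_tests = n_units * (n_units - 1) // 2
    return n_tests, alpha / n_tests


def significant_pairs(pair_pvalues, n_units, alpha=0.05):
    """Pairs whose p-value falls below the Bonferroni threshold for all unit pairs.

    pair_pvalues: dict mapping a (unit_a, unit_b) tuple to its p-value (hypothetical input).
    """
    _, threshold = pairwise_bonferroni(n_units, alpha)
    return sorted(pair for pair, p in pair_pvalues.items() if p < threshold)
```

    For example, 20,000 genes give 199,990,000 gene pairs (a per-test threshold of about 2.5e-10 at α = 0.05), whereas one million SNPs would give roughly 5 × 10^11 SNP pairs.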

  17. Structural and functional-annotation of an equine whole genome oligoarray

    Directory of Open Access Journals (Sweden)

    Chowdhary Bhanu

    2009-10-01

    Background The horse genome is sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays, next-generation sequencing for gene expression, and proteomics. However, for researchers to derive value from these functional genomics datasets, they must be able to model these data in biologically relevant ways; to do so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation is delineating and demarcating the genomic elements (such as genes, promoters, and regulatory elements). Functional annotation is assigning function to structural elements. The Gene Ontology (GO) is the de facto standard for functional annotation, and is routinely used as a basis for modelling and hypothesis testing with large functional genomics datasets. Results An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&M University. This 70-mer oligoarray was designed using the approximately 7× assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To assist researchers in determining the biological meaning of data derived from this array, we have structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database) and UniGene. We next provided GO functional annotations for the gene transcripts represented on this array. Overall, we GO annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for this array both before and after we added GO annotation. The additional annotations improved the mean GAQ score 16-fold. These data are publicly available at AgBase http://www.agbase.msstate.edu/. Conclusion Providing

  18. Rapid Bacterial Whole-Genome Sequencing to Enhance Diagnostic and Public Health Microbiology

    Science.gov (United States)

    Reuter, Sandra; Ellington, Matthew J.; Cartwright, Edward J. P.; Köser, Claudio U.; Török, M. Estée; Gouliouris, Theodore; Harris, Simon R.; Brown, Nicholas M.; Holden, Matthew T. G.; Quail, Mike; Parkhill, Julian; Smith, Geoffrey P.; Bentley, Stephen D.; Peacock, Sharon J.

    2014-01-01

    IMPORTANCE The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens. OBJECTIVE To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens. DESIGN, SETTING, AND PARTICIPANTS A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. MAIN OUTCOMES AND MEASURES Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. RESULTS We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of
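
    WGS-based discrimination between outbreak and non-outbreak isolates usually rests on pairwise SNP distances. The sketch below counts differences between aligned core-genome sequences (skipping N and gap characters) and groups isolates by single linkage whenever two lie within a SNP cut-off. The alignment input and the max_snps cut-off are hypothetical; the study's actual thresholds and pipeline are not given in this abstract.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing sites between two aligned sequences, ignoring N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x not in "N-" and y not in "N-")


def cluster_isolates(alignment, max_snps):
    """Single-linkage clusters of isolates within max_snps of each other (union-find).

    alignment: dict isolate_name -> aligned sequence (hypothetical input).
    """
    parent = {name: name for name in alignment}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in alignment:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

    Single linkage means an isolate joins a cluster if it is close to any member, so chains of closely related isolates end up together even when their endpoints differ by more than the cut-off.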

  19. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included.
Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome
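
The marker-selection step described above (several markers tagging each QTL, then removing duplicates and markers in high pairwise linkage disequilibrium) can be sketched as a greedy r² filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are coded as 0/1/2 allele counts, candidates arrive in priority order (for example by association p-value), and the r² cut-off of 0.9 and the marker names are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic marker: r^2 undefined, treated here as unlinked
    return cov * cov / (vx * vy)


def prune_markers(candidates, genotypes, max_r2=0.9):
    """Greedy LD filter over candidates given in priority order.

    Duplicates (the same marker picked for several traits or breeds) are dropped
    first; a marker is then kept only if its r^2 with every marker already kept
    is at most max_r2. The 0.9 default is an assumption, not the study's value.
    """
    kept = []
    for marker in dict.fromkeys(candidates):  # order-preserving de-duplication
        if all(r_squared(genotypes[marker], genotypes[k]) <= max_r2 for k in kept):
            kept.append(marker)
    return kept


# Hypothetical genotypes for six animals.
genotypes = {
    "snpA": [0, 1, 2, 1, 0, 2],
    "snpB": [0, 1, 2, 1, 0, 2],  # perfect proxy for snpA (r^2 = 1)
    "snpC": [2, 0, 1, 1, 2, 0],  # r^2 with snpA is 0.5625
}
print(prune_markers(["snpA", "snpB", "snpA", "snpC"], genotypes))  # ['snpA', 'snpC']
```

Processing candidates in priority order means that, within each block of correlated markers, the most significant one survives.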

  20. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants.

    Science.gov (United States)

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-11-01

    MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Type I MADS-box genes are under-represented in gymnosperms, and the most recent common ancestor (MRCA) of seed plants possessed a minimum of only two Type I MADS-box genes. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11-14 Type II MADS-box genes. In gymnosperms, two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14-16 Type II MADS-box genes. The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS-box genes in the development of gymnosperms.
This study is

  1. Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Tetzschner, Anna M. M.; Iguchi, Atsushi

    2015-01-01

    typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing...... was created as a component of the publicly available Web tools hosted by the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org). All E. coli isolates available with WGS data and conventional serotype information were subjected to WGS-based serotyping employing this specific SerotypeFinder CGE...... tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin...
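
The core of database-driven in silico serotyping is to align an assembly against reference O-antigen and flagellin alleles and to report the antigen of the best qualifying hit. The sketch below assumes the alignment step (for example with BLAST) has already produced hits; the `Hit` record, the identity and coverage thresholds, and the use of fliC as the only flagellin gene are illustrative assumptions, not SerotypeFinder's actual code or defaults.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Hit:
    gene: str         # reference gene the contig aligned to, e.g. "wzx" or "fliC"
    antigen: str      # antigen of that reference allele, e.g. "O157" or "H7"
    identity: float   # percent identity of the alignment
    coverage: float   # percent of the reference allele covered


O_GENES: Set[str] = {"wzx", "wzy", "wzm", "wzt"}  # O-antigen processing genes named above
H_GENES: Set[str] = {"fliC"}  # assumption: the abstract's list of flagellin genes is truncated


def call_antigen(hits: Iterable[Hit], genes: Set[str],
                 min_identity: float = 85.0, min_coverage: float = 60.0) -> Optional[str]:
    """Return the antigen of the best hit on `genes` passing both thresholds, else None."""
    passing = [h for h in hits
               if h.gene in genes and h.identity >= min_identity and h.coverage >= min_coverage]
    if not passing:
        return None
    return max(passing, key=lambda h: (h.identity, h.coverage)).antigen


def serotype(hits: Iterable[Hit]) -> str:
    """Combine O and H calls into an 'O:H' string; '?' marks an untypeable antigen."""
    hits = list(hits)
    o_type = call_antigen(hits, O_GENES) or "O?"
    h_type = call_antigen(hits, H_GENES) or "H?"
    return f"{o_type}:{h_type}"


example_hits = [
    Hit("wzx", "O157", 99.8, 100.0),
    Hit("wzy", "O157", 99.5, 100.0),
    Hit("wzx", "O26", 80.0, 100.0),   # below the identity threshold, ignored
    Hit("fliC", "H7", 99.9, 100.0),
]
print(serotype(example_hits))  # O157:H7
```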

  2. Whole genome sequencing of a dizygotic twin suggests a role for the serotonin receptor HTR7 in autism spectrum disorder.

    Science.gov (United States)

    Helsmoortel, Céline; Swagemakers, Sigrid M A; Vandeweyer, Geert; Stubbs, Andrew P; Palli, Ivo; Mortier, Geert; Kooy, R Frank; van der Spek, Peter J

    2016-12-01

    Whole genome sequencing of a severely affected dizygotic twin with an autism spectrum disorder and intellectual disability revealed a compound heterozygous mutation in the HTR7 gene as the only variation not detected in control databases. Each parent carries one allele of the mutation, which is not present in an unaffected stepsister. The HTR7 gene encodes the 5-HT7 serotonin receptor, which is involved in brain development, synaptic transmission, and plasticity. The paternally inherited p.W60C variant is situated at an evolutionarily conserved nucleotide and is predicted to be damaging by PolyPhen-2. A mutation akin to the maternally inherited p.V286I mutation has been reported to significantly affect the binding characteristics of the receptor. Therefore, the observed sequence alterations provide a first suggestive link between a genetic abnormality in the HTR7 gene and a neurodevelopmental disorder. © 2016 Wiley Periodicals, Inc.

  3. A Minimal Set of Glycolytic Genes Reveals Strong Redundancies in Saccharomyces cerevisiae Central Metabolism.

    Science.gov (United States)

    Solis-Escalante, Daniel; Kuijpers, Niels G A; Barrajon-Simancas, Nuria; van den Broek, Marcel; Pronk, Jack T; Daran, Jean-Marc; Daran-Lapujade, Pascale

    2015-08-01

    As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplication of glycolytic genes is purported to have played an important role leading to S. cerevisiae's current lifestyle favoring fermentative metabolism even in the presence of oxygen and characterized by a high glycolytic capacity. In modern S. cerevisiae strains, the 12 glycolytic reactions leading to the biochemical conversion from glucose to ethanol are encoded by 27 paralogs. In order to experimentally explore the physiological role of this genetic redundancy, a yeast strain with a minimal set of 14 paralogs was constructed (the "minimal glycolysis" [MG] strain). Remarkably, a combination of a quantitative systems approach and semiquantitative analysis in a wide array of growth environments revealed the absence of a phenotypic response to the cumulative deletion of 13 glycolytic paralogs. This observation indicates that duplication of glycolytic genes is not a prerequisite for achieving the high glycolytic fluxes and fermentative capacities that are characteristic of S. cerevisiae and essential for many of its industrial applications and argues against gene dosage effects as a means of fixing minor glycolytic paralogs in the yeast genome. The MG strain was carefully designed and constructed to provide a robust prototrophic platform for quantitative studies and has been made available to the scientific community. Copyright © 2015, Solis-Escalante et al.

  4. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    Science.gov (United States)

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-04

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.
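
The component split described above can be illustrated with a small orchestration sketch. Every class and method name here is hypothetical; the point is only that the CDS controller talks to the genome database, the variant knowledge base, the CDS knowledge base and the EHR through narrow interfaces, so genome data can live outside the EHR. The pharmacogenomic rule in the example (CYP2C19 poor metabolizer status and clopidogrel) is illustrative only.

```python
from typing import Dict, List, Optional, Protocol, Set, Tuple


class GenomeDatabase(Protocol):
    def variants_for(self, patient_id: str) -> Set[str]: ...


class VariantKnowledgeBase(Protocol):
    def interpret(self, variant: str) -> Optional[str]: ...


class CDSKnowledgeBase(Protocol):
    def rules_for(self, order: str) -> Dict[str, str]: ...


class EHR(Protocol):
    def post_alert(self, patient_id: str, message: str) -> None: ...


class CDSController:
    """Orchestrates the independently managed components; stores no genome data itself."""

    def __init__(self, genomes: GenomeDatabase, variant_kb: VariantKnowledgeBase,
                 cds_kb: CDSKnowledgeBase, ehr: EHR) -> None:
        self.genomes = genomes
        self.variant_kb = variant_kb
        self.cds_kb = cds_kb
        self.ehr = ehr

    def on_order(self, patient_id: str, order: str) -> List[str]:
        # 1. Fetch raw variants from the genome database, which lives outside the EHR.
        findings: Set[str] = set()
        for variant in self.genomes.variants_for(patient_id):
            # 2. Translate each variant into a clinical finding via the variant knowledge base.
            finding = self.variant_kb.interpret(variant)
            if finding is not None:
                findings.add(finding)
        # 3. Match findings against the CDS rules for this order and push alerts to the EHR.
        rules = self.cds_kb.rules_for(order)
        alerts = [rules[f] for f in sorted(findings) if f in rules]
        for message in alerts:
            self.ehr.post_alert(patient_id, message)
        return alerts


# Minimal in-memory stand-ins for the four services (all hypothetical).
class InMemoryGenomeDB:
    def __init__(self, data: Dict[str, Set[str]]) -> None:
        self.data = data

    def variants_for(self, patient_id: str) -> Set[str]:
        return self.data.get(patient_id, set())


class InMemoryVariantKB:
    def __init__(self, table: Dict[str, str]) -> None:
        self.table = table

    def interpret(self, variant: str) -> Optional[str]:
        return self.table.get(variant)


class InMemoryCDSKB:
    def __init__(self, rules: Dict[str, Dict[str, str]]) -> None:
        self.rules = rules

    def rules_for(self, order: str) -> Dict[str, str]:
        return self.rules.get(order, {})


class ListEHR:
    def __init__(self) -> None:
        self.alerts: List[Tuple[str, str]] = []

    def post_alert(self, patient_id: str, message: str) -> None:
        self.alerts.append((patient_id, message))


ehr = ListEHR()
controller = CDSController(
    InMemoryGenomeDB({"patient-1": {"CYP2C19*2/*2"}}),
    InMemoryVariantKB({"CYP2C19*2/*2": "CYP2C19 poor metabolizer"}),
    InMemoryCDSKB({"clopidogrel": {"CYP2C19 poor metabolizer":
                                   "Consider an alternative antiplatelet agent."}}),
    ehr,
)
print(controller.on_order("patient-1", "clopidogrel"))
```

Because the controller depends only on the four interfaces, the in-memory stand-ins could be swapped for real services without changing the orchestration logic.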

  5. Whole genome PCR scanning (WGPS) of Coxiella burnetii strains from ruminants.

    Science.gov (United States)

    Sidi-Boumedine, Karim; Adam, Gilbert; Angen, Øysten; Aspán, Anna; Bossers, Alex; Roest, Hendrik-Jan; Prigent, Myriam; Thiéry, Richard; Rousset, Elodie

    2015-01-01

    Coxiella burnetii is the causative agent of Q fever, a zoonosis that spreads from ruminants to humans via the inhalation of aerosols contaminated by livestock's birth products. This study aimed to compare the genomes of strains isolated from ruminants by "Whole Genome PCR Scanning (WGPS)" in order to identify genomic differences. C. burnetii isolated from different ruminant hosts were compared to the Nine Mile reference strain using WGPS. The identified genomic regions of differences (RDs) were confirmed by sequencing. A set of 219 primers for amplification of 10 kbp segments covering the entire genome was obtained. The analyses revealed the presence of: i) conserved genomic regions, ii) genomic polymorphism including insertions and deletions and iii) amplification failures in some cases as well. WGPS, a descriptive approach, allowed the identification and localization of divergent genetic loci from various strains of C. burnetii which consisted of deletions, insertions and possibly genomic rearrangements. It also substantiates the role played by the IS1111 element in the genomic plasticity of C. burnetii. We believe that this approach could be combined with new sequencing technologies, as a selective/directed sequencing approach, particularly when repeated sequences are present in the analysed genomes. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  6. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 Mb to 1.6 Gb, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts and so can be of biological interest. It is therefore desirable to assemble the bonus organisms along with the main genome. This turns the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which most WGS assemblers handle poorly. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present; for example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 kb plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

  7. High Whole-Genome Sequence Diversity of Human Papillomavirus Type 18 Isolates

    Directory of Open Access Journals (Sweden)

    Pascal van der Weele

    2018-02-01

    Background: The most commonly found human papillomavirus (HPV) types in cervical cancer are HPV16 and HPV18. Genome variants of these types have been associated with differential carcinogenic potential. To date, only a handful of studies have described HPV18 whole genome sequencing results. Here we describe HPV18 variant diversity and conservation in persistent infections in a longitudinal retrospective cohort study. Methods: Cervical self-samples were obtained annually over four years and genotyped on the SPF10-DEIA-LiPA25 platform. Clearing and persistent HPV18-positive infections were selected, amplified in two overlapping fragments, and sequenced using 32 sequencing primers. Results: Complete viral genomes were obtained from 25 participants with persistent and 26 participants with clearing HPV18 infections, resulting in 52 unique HPV18 genomes. Sublineage A3 was predominant in this population. The consensus viral genome was completely conserved over time in persistent infections, with one exception, in which different HPV18 variants were identified in follow-up samples. Conclusions: This study identified a diverse set of HPV18 variants. In persistent infections, the consensus viral genome is conserved. The identification of only one HPV18 infection with different major variants in follow-up suggests that this may be a rare event. This dataset adds 52 HPV18 genome variants to GenBank, more than doubling the currently available HPV18 information resource, and all but one variant are unique additions.

  8. The present and future of de novo whole-genome assembly.

    Science.gov (United States)

    Sohn, Jang-Il; Nam, Jin-Wu

    2018-01-01

    Since the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many bright ideas and heuristics have been suggested to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graphs (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, long-read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads, and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome. © The Author 2016. Published by Oxford University Press. All rights reserved.
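
To make the Eulerian de Bruijn formulation concrete: nodes are (k-1)-mers, each distinct k-mer in the reads is an edge from its prefix to its suffix, and an assembly is spelled by a path that uses every edge once (an Eulerian path, found here with Hierholzer's algorithm). This toy sketch assumes error-free reads, a single linear molecule and that an Eulerian path exists; real assemblers add error correction, coverage handling and graph simplification.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the suffixes of the k-mers that start with it."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start where out-degree exceeds in-degree by one; otherwise (a cycle) anywhere.
    start = next((n for n in nodes if len(graph.get(n, [])) - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    adjacency = {u: list(vs) for u, vs in graph.items()}  # consumable copy of the edges
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adjacency.get(u):
            stack.append(adjacency[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the remaining walk
    return path[::-1]


def assemble(reads, k):
    """Spell a sequence from the distinct k-mers of error-free reads."""
    kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGCGT", "GCGTGCA"], k=3))
```

Here the path found spells ATGGCGTGCA, but ATGCGTGGCA uses exactly the same 3-mers, because the 2-mers TG and GC each occur twice. Choosing between such paths is the assembly ambiguity mentioned above, which longer reads help to resolve.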

  9. A synthetic rainbow trout linkage map provides new insights into the salmonid whole genome duplication and the conservation of synteny among teleosts.

    Science.gov (United States)

    Guyomard, René; Boussaha, Mekki; Krieg, Francine; Hervet, Caroline; Quillet, Edwige

    2012-03-16

    Rainbow trout is an economically important fish and a suitable experimental organism in many fields of biology including genome evolution, owing to the occurrence of a salmonid-specific whole-genome duplication (4th WGD). Rainbow trout is among the most studied teleosts and has benefited from substantial efforts to develop genomic resources (e.g., linkage maps). Here, we first generated a synthetic map by merging segregation data files derived from three independent linkage maps. Then, we used it to evaluate genome conservation between rainbow trout and three teleost models, medaka, stickleback and zebrafish, and to further investigate the extent of the 4th WGD in the trout genome. The INRA linkage map was updated by adding 211 new markers. After standardization of marker names, consistency of marker assignment to linkage groups and marker orders was checked across the three different data sets, and only loci showing consistent location over all or almost all of the data sets were kept. This resulted in a synthetic map consisting of 2226 markers and 29 linkage groups spanning over 3600 cM. Blastn searches against medaka, stickleback, and zebrafish genomic databases resulted in 778, 824 and 730 significant hits, respectively, while blastx searches yielded 505, 513 and 510 significant hits. Homology search results revealed that, for most rainbow trout chromosomes, large syntenic regions encompassing nearly whole chromosome arms have been conserved between rainbow trout and its closest models, medaka and stickleback. Large conserved syntenies were also found between the genomes of rainbow trout and the reconstructed teleost ancestor. These syntenies consolidated the known homeologous affinities between rainbow trout chromosomes due to the 4th WGD and suggested new ones. The synthetic map constructed herein further highlights the stability of the teleost genome over long evolutionary time scales.
This map can be easily extended by incorporating new data sets and should

  10. Characterisation of a multidrug-resistant Bacteroides fragilis isolate recovered from blood of a patient in Denmark using whole-genome sequencing

    DEFF Research Database (Denmark)

    Ank, Nina; Sydenham, Thomas V; Iversen, Lene H

    2015-01-01

    Here we describe a patient undergoing extensive abdominal surgery and hyperthermic intraperitoneal chemotherapy due to primary adenocarcinoma in the sigmoid colon with peritoneal carcinomatosis. During hospitalisation the patient suffered from bacteraemia with a multidrug-resistant Bacteroides fragilis isolate. Whole-genome sequencing of the isolate resulted in identification of the nimE, cfiA and ermF genes, corresponding to metronidazole, carbapenem and clindamycin resistance, respectively....

  11. Whole genomic analysis of G2P[4] human Rotaviruses in Mymensingh, north-central Bangladesh

    Directory of Open Access Journals (Sweden)

    Satoru Aida

    2016-09-01

    Rotavirus A (RVA) is a dominant causative agent of acute gastroenteritis in children worldwide. G2P[4] is one of the most common genotypes among human rotavirus (HRV) strains, and has been persistently prevalent in South Asia, including Bangladesh. In the present study, whole genome sequences of a total of 16 G2P[4] HRV strains (8 strains each in 2010 and 2013) detected in Mymensingh, north-central Bangladesh, were determined. These strains had a typical DS-1-like genotype constellation. Most gene segments from the DS-1 genogroup exhibited high sequence identities to each other (>98%), while slight diversity was observed for the VP1, VP3, and NSP4 genes. By phylogenetic analysis, individual RNA segments were classified into one lineage (V) or two to three lineages (V–VI or V–VII). In terms of the lineages (sublineages) of the 11 gene segments, the 16 Bangladeshi strains could be further classified into four clades (A–D) containing 8 lineage constellations, revealing the presence of three clades (A–C) with three lineage constellations in 2010, and a single clade (D) with four constellations in 2013. Therefore, the co-existence of multiple G2P[4] HRV strains with different lineage constellations, and a change in clades over the study period, were demonstrated. Although amino acids in the antigenic regions on VP7 and VP4 were mostly identical to those of global G2P[4] strains after 2000, VP4 of clade D RVAs in 2013 had alanine and proline at positions 88 and 114, respectively, which are novel substitutions compared with recent global G2P[4] strains. The replacement of lineage constellations, associated with unique amino acid changes in the antigenic region of VP4, suggests ongoing genetic evolution and the emergence of new G2P[4] rotavirus strains in Bangladesh.

  12. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus

    Directory of Open Access Journals (Sweden)

    Deschavanne Patrick

    2010-03-01

    Background Numerous cases of horizontal transfers (HTs) have been described for eukaryote genomes, but in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specially designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple and tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequences. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When the origin of these HTs was analyzed by comparing their signatures to a custom database of species signatures, 3 groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges were confirmed, we found very few exchanges between eukaryotic kingdoms. Conclusions We demonstrated that HTs are not negligible in eukaryote genomes, though they are less extensive than in prokaryote genomes; under our stringent conditions, the detected amount is a lower bound. The biological mechanisms underlying these transfers remain to be elucidated, as do the biological functions of the transferred genes.
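
A minimal sketch of signature-based screening, assuming the "genomic signature" is the tetranucleotide frequency profile and that a window is atypical when its Euclidean distance from the genome-wide profile exceeds the mean plus z standard deviations of all window distances. The window size, step and z threshold are arbitrary choices for illustration; the paper's exact signature, distance and cut-off may differ, and this simple global comparison does not attempt to model the intrinsic heterogeneity of eukaryote genomes.

```python
import random
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def signature(seq, k=4):
    """Relative frequencies of all 4**k words (tetranucleotides by default)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def distance(a, b):
    """Euclidean distance between two signature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def atypical_windows(genome, window=5000, step=2500, k=4, z=3.0):
    """Flag windows whose signature departs from the genome-wide one by more than
    mean + z * SD of all window distances. Returns (start, distance) pairs."""
    reference = signature(genome, k)
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    dists = [(s, distance(signature(genome[s:s + window], k), reference)) for s in starts]
    values = [d for _, d in dists]
    cutoff = mean(values) + z * pstdev(values)
    return [(s, d) for s, d in dists if d > cutoff]


# Synthetic test genome: 100 kb of uniform random sequence with a 5 kb GC-only
# block inserted at position 50,000 to mimic a compositionally foreign region.
rng = random.Random(0)
background = "".join(rng.choice("ACGT") for _ in range(100_000))
insert = "".join(rng.choice("GC") for _ in range(5_000))
genome = background[:50_000] + insert + background[50_000:]
print([start for start, _ in atypical_windows(genome)])
```

Windows that only partly overlap the foreign block have intermediate distances, so whether they are flagged depends on z; a real scan would merge adjacent flagged windows into regions.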

  13. Functional diversification of vitamin D receptor paralogs in teleost fish after a whole genome duplication event.

    Science.gov (United States)

    Kollitz, Erin M; Hawkins, Mary Beth; Whitfield, G Kerr; Kullman, Seth W

    2014-12-01

    The diversity and success of teleost fishes (Actinopterygii) have been attributed to three successive rounds of whole-genome duplication (WGD). WGDs provide a source of raw genetic material for evolutionary forces to act upon, resulting in the divergence of genes with altered or novel functions. The retention of multiple gene pairs (paralogs) in teleosts provides a unique opportunity to study how genes diversify and evolve after a WGD. This study examines the hypothesis that vitamin D receptor (VDR) paralogs (VDRα and VDRβ) from two distantly related teleost orders have undergone functional divergence subsequent to the teleost-specific WGD. VDRα and VDRβ paralogs were cloned from the Japanese medaka (Beloniformes) and the zebrafish (Cypriniformes). Initial transactivation studies using 1α,25-dihydroxyvitamin D3 revealed that although VDRα and VDRβ maintain similar ligand potency, the maximum efficacy of VDRβ was significantly attenuated compared with VDRα in both species. Subsequent analyses revealed that VDRα and VDRβ maintain highly similar ligand affinities; however, VDRα demonstrated preferential DNA binding compared with VDRβ. Protein-protein interactions between the VDR paralogs and essential nuclear receptor coactivators were investigated using transactivation and mammalian two-hybrid assays. Our results imply that functional differences between VDRα and VDRβ occurred early in teleost evolution because they are conserved between distantly related species. Our results further suggest that the observed differences may be associated with differential protein-protein interactions between the VDR paralogs and coactivators. We speculate that the observed functional differences are due to subtle ligand-induced conformational differences between the two paralogs, leading to divergent downstream functions.

  14. swDMR: A Sliding Window Approach to Identify Differentially Methylated Regions Based on Whole Genome Bisulfite Sequencing.

    Directory of Open Access Journals (Sweden)

    Zhen Wang

    DNA methylation is a widespread epigenetic modification that plays an essential role in gene expression through transcriptional regulation and chromatin remodeling. The emergence of whole genome bisulfite sequencing (WGBS) represents an important milestone in the detection of DNA methylation. Characterization of differentially methylated regions (DMRs) is also fundamental for further functional analysis. In this study, we present swDMR (http://sourceforge.net/projects/swDMR/) for the comprehensive analysis of DMRs from whole genome methylation profiles by a sliding window approach. It is an integrated tool designed for WGBS data, which not only implements accessible statistical methods to perform hypothesis tests adapted to two or more samples without replicates, but also controls the false discovery rate by multiple-testing correction. Downstream analysis tools are also provided, including cluster, annotation and visualization modules. In summary, based on WGBS data, swDMR can produce rich information on differentially methylated regions. We believe that swDMR, as a convenient and flexible tool, will bring us closer to unveiling the potential functional regions involved in epigenetic regulation.
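
The sliding-window idea can be sketched in a few lines: pool methylated and total read counts for each window in two samples, test the difference, then adjust the window p-values for multiple testing. This is not swDMR's code. Fisher's exact test and the Benjamini-Hochberg procedure are assumed here as one reasonable choice for two samples without replicates, and the window size, step and minimum number of CpGs are hypothetical defaults.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sliding_window_dmrs(sites, window=1000, step=500, min_sites=3, alpha=0.05):
    """sites: (position, methylated_A, total_A, methylated_B, total_B) per CpG.

    Pools read counts per window, tests sample A against sample B with Fisher's
    exact test and keeps windows whose BH-adjusted p-value is below alpha.
    """
    if not sites:
        return []
    tested = []
    last = max(pos for pos, *_ in sites)
    for start in range(0, last + 1, step):
        inside = [s for s in sites if start <= s[0] < start + window]
        if len(inside) < min_sites:
            continue
        ma, ta, mb, tb = (sum(s[j] for s in inside) for j in range(1, 5))
        if ta == 0 or tb == 0:
            continue
        p = fisher_two_sided(ma, ta - ma, mb, tb - mb)
        tested.append((start, start + window, ma / ta, mb / tb, p))
    qvalues = benjamini_hochberg([t[4] for t in tested])
    return [(s, e, fa, fb, q) for (s, e, fa, fb, _), q in zip(tested, qvalues) if q < alpha]


# Hypothetical CpG sites every 10 bp; sample A is hypermethylated at 1000-1490.
sites = [(pos, 9 if 1000 <= pos < 1500 else 5, 10, 1 if 1000 <= pos < 1500 else 5, 10)
         for pos in range(0, 3000, 10)]
for start, end, frac_a, frac_b, q in sliding_window_dmrs(sites):
    print(start, end, round(frac_a, 2), round(frac_b, 2), q)
```

Overlapping significant windows (here 500-1500 and 1000-2000) would normally be merged into a single DMR in a downstream step.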

  15. Examining phylogenetic relationships among gibbon genera using whole genome sequence data using an approximate bayesian computation approach.

    Science.gov (United States)

    Veeramah, Krishna R; Woerner, August E; Johnstone, Laurel; Gut, Ivo; Gut, Marta; Marques-Bonet, Tomas; Carbone, Lucia; Wall, Jeff D; Hammer, Michael F

    2015-05-01

    Gibbons are believed to have diverged from the larger great apes ∼16.8 MYA and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera, Nomascus, Symphalangus, Hoolock, and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mitochondrial DNA (mtDNA), the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing (WGS) to a mean of ∼15× coverage in two individuals from each genus. We developed a coalescent-based approximate Bayesian computation (ABC) method incorporating a model of sequencing error generated by high-coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Instead, our results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10^-9 per site per year, this speciation process occurred ∼5 MYA during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera. Copyright © 2015 by the Genetics Society of America.

  16. Identification of molecular phenotypic descriptors of breast capsular contracture formation using informatics analysis of the whole genome transcriptome.

    Science.gov (United States)

    Kyle, Daniel J T; Harvey, Alison G; Shih, Barbara; Tan, Kian T; Chaudhry, Iskander H; Bayat, Ardeshir

    2013-01-01

    Breast capsular contracture formation following silicone implant augmentation/reconstruction is a common complication that remains poorly understood. The aim of this study was to identify potential biomarkers implicated in breast capsular contracture formation by using, for the first time, whole genome arrays. Biopsy samples were taken from 18 patients (23 breast capsules) with Baker Grade I-II (Control) and Baker Grade III-IV (Contracted). Whole genome microarrays were performed and six significantly dysregulated genes were selected for further validation with quantitative reverse transcriptase polymerase chain reaction and immunohistochemistry. Hematoxylin and eosin staining was also carried out to compare the histological characteristics of control and contracted samples. Microarray results showed that aggrecan, tissue inhibitor of metalloproteinase 4 (TIMP4), and tumor necrosis factor superfamily (ligand) member 11 were significantly down-regulated in contracted capsules, while matrix metallopeptidase 12, serum amyloid A 1, and interleukin 8 (IL8) were significantly up-regulated. The dysregulation of aggrecan, tumor necrosis factor superfamily (ligand) member 11, TIMP4, and IL8 was validated by quantitative reverse transcriptase polymerase chain reaction (p contracture formation. IL8 and TIMP4 may serve as potential key diagnostic, therapeutic, and prognostic biomarkers in capsular contracture formation. © 2013 by the Wound Healing Society.

  17. Functional and evolutionary analysis of Korean bob-tailed native dog using whole-genome sequencing data.

    Science.gov (United States)

    Lee, Daehwan; Lim, Dajeong; Kwon, Daehong; Kim, Juyeon; Lee, Jongin; Sim, Mikang; Choi, Bong-Hwan; Choi, Seog-Gyu; Kim, Jaebum

    2017-12-11

    Rapid and cost-effective production of large-scale genome data through next-generation sequencing has enabled population-level studies of various organisms to identify their genotypic differences and phenotypic consequences. This approach is also used to study indigenous animals of historical and economic value, although they are less studied than model organisms. The objective of this study was to perform functional and evolutionary analysis of the Korean bob-tailed native dog Donggyeong, which has distinctive tail and agility phenotypes, using whole-genome sequencing data together with population and comparative genomics approaches. Based on the uniqueness of non-synonymous single nucleotide polymorphisms obtained from next-generation sequencing data, Donggyeong dog-specific genes/proteins and their functions were identified by comparison with 12 other dog breeds and six other related species. These proteins were further divided into subpopulation-specific ones with different tail lengths, and their protein interaction-level signatures were investigated. Finally, the trajectory by which the protein interactions of subpopulation-specific proteins were shaped during evolution was uncovered. This study expands our knowledge of Korean native dogs. Our results also provide a good example of using whole-genome sequencing data for population-level analysis in closely related species.

  18. GeneSetDB: A comprehensive meta-database, statistical and visualisation framework for gene set analysis.

    Science.gov (United States)

    Araki, Hiromitsu; Knapp, Christoph; Tsai, Peter; Print, Cristin

    2012-01-01

    Most "omics" experiments require comprehensive interpretation of the biological meaning of gene lists. To address this requirement, a number of gene set analysis (GSA) tools have been developed. Although the biological value of GSA is strictly limited by the breadth of the gene sets used, very few methods exist for simultaneously analysing multiple publicly available gene set databases. Therefore, we constructed GeneSetDB (http://genesetdb.auckland.ac.nz/haeremai.html), a comprehensive meta-database, which integrates 26 public databases containing diverse biological information with a particular focus on human disease and pharmacology. GeneSetDB enables users to search for gene sets containing a gene identifier or keyword, generate their own gene sets, or statistically test for enrichment of an uploaded gene list across all gene sets, and visualise gene set enrichment and overlap using a clustered heat map.
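
Enrichment of an uploaded gene list against a gene set is commonly tested with a one-sided hypergeometric test: given N genes in the universe, K of them in the set and n in the query list, the p-value is the probability of seeing at least k overlapping genes by chance. The sketch below assumes that test (the abstract does not name the statistic GeneSetDB uses), and the gene identifiers are placeholders.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n in the query list)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(query, gene_sets, universe):
    """Test each gene set for over-representation in `query`; results sorted by p-value."""
    universe = set(universe)
    query = set(query) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        p = hypergeom_sf(overlap, len(universe), len(members), len(query))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


universe = [f"g{i}" for i in range(20)]
gene_sets = {"set_A": ["g0", "g1", "g2", "g3", "g4"], "set_B": ["g15", "g16", "g17"]}
query = ["g0", "g1", "g2", "g3", "g10"]
for name, overlap, p in enrichment(query, gene_sets, universe):
    print(name, overlap, round(p, 4))
```

With hundreds or thousands of gene sets tested at once, the resulting p-values would also need a multiple-testing correction before interpretation.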

  19. Final Report Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions

    Energy Technology Data Exchange (ETDEWEB)

    M.W. Fields; J.D. Wall; J. Keasling; J. Zhou

    2008-05-15

    experimental results, a set of criteria was suggested for the design of gene-specific and group-specific oligonucleotide probes, and these criteria should provide valuable information for the development of new software and algorithms for microarray-based studies. Secondly, to determine empirically the effect of probe length on signal intensities, microarrays with oligonucleotides of different lengths were used to monitor gene expression at a whole-genome level. To determine which oligonucleotide length is the better alternative to PCR-generated probes, the performance of oligonucleotide probes was systematically compared with that of their PCR-generated counterparts for 96 genes from Shewanella oneidensis MR-1 in terms of overall signal intensity, number of detected genes, specificity, sensitivity and differential gene expression under experimental conditions. Hybridizations conducted at 42 °C, 45 °C, 50 °C and 60 °C indicated that good sensitivity was obtained at 45 °C for oligonucleotide probes in the presence of 50% formamide, conditions under which specific signals were detected by both PCR and oligonucleotide probes. Signal intensities increased with the length of the oligonucleotide probes, and the 70mer oligonucleotide probes produced signal intensities similar to, and detected a similar number of ORFs as, the PCR probes. When hybridized with genomic DNA, the cDNA, 70mer, 60mer and 50mer arrays had detection sensitivities of 5.0, 25, 100 and 100 ng of genomic DNA, respectively, approximately equivalent to 1.9 × 10^6, 9.2 × 10^6, 3.7 × 10^7 and 3.7 × 10^7 copies. To evaluate differential gene expression under experimental conditions, S. oneidensis MR-1 cells were exposed to low or high pH conditions for 30 and 60 min, and the transcriptional profiles detected by oligonucleotide probes (50mer, 60mer and 70mer) were closely correlated with those detected by the PCR probes. The results demonstrated that 70mer

  20. Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture

    Science.gov (United States)

    Seth-Smith, Helena M.B.; Harris, Simon R.; Skilton, Rachel J.; Radebe, Frans M.; Golparian, Daniel; Shipitsyna, Elena; Duy, Pham Thanh; Scott, Paul; Cutcliffe, Lesley T.; O’Neill, Colette; Parmar, Surendra; Pitt, Rachel; Baker, Stephen; Ison, Catherine A.; Marsh, Peter; Jalal, Hamid; Lewis, David A.; Unemo, Magnus; Clarke, Ian N.; Parkhill, Julian; Thomson, Nicholas R.

    2013-01-01

    The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, thus whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern. PMID:23525359

  1. Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.

    Science.gov (United States)

    Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J

    2013-01-01

    Staphylococcus aureus remains a leading cause of hospital-acquired infection, but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. Our aim is to outline the practicalities of whole genome sequencing and to discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences, allows accurate characterisation of transmission events and outbreaks, and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times and the creation of a reliable, standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  2. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    Directory of Open Access Journals (Sweden)

    Shea N Gardner

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory-efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1, including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.

  3. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    Science.gov (United States)

    Gardner, Shea N; Hall, Barry G

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory-efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1, including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
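    The idea behind alignment-free, k-mer-based SNP discovery can be illustrated with a toy sketch. As we understand the kSNP approach, k-mers (odd k) that are identical except at the central base are treated as alleles of one SNP locus. The code below is a simplified illustration under that assumption, not kSNP's implementation: it ignores reverse complements, read errors and memory efficiency, and `find_central_snps` is a hypothetical name.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every k-mer of seq (forward strand only in this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_central_snps(genomes: dict[str, str], k: int) -> dict[str, dict[str, str]]:
    """Return {masked context: {genome: central base}} for variable loci."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    mid = k // 2
    # Map each context (k-mer with its central base masked) to the
    # central bases seen in each genome.
    table = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for km in kmers(seq.upper(), k):
            if set(km) - set("ACGT"):
                continue  # skip k-mers containing ambiguous bases
            context = km[:mid] + "." + km[mid + 1:]
            table[context][name].add(km[mid])
    snps = {}
    for context, per_genome in table.items():
        # Discard contexts that are ambiguous within a single genome.
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {next(iter(bases)) for bases in per_genome.values()}
        if len(alleles) > 1:  # the central base differs between genomes
            snps[context] = {g: next(iter(b)) for g, b in per_genome.items()}
    return snps
```

    Two genomes `AAACAAA` and `AAAGAAA` with k = 7 share the context `AAA.AAA` and yield one SNP locus (C versus G). No reference genome or multiple alignment is needed because loci are defined purely by shared flanking sequence.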

  4. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    Energy Technology Data Exchange (ETDEWEB)

    Shou, S. [Univ. Wisc.-Madison; Kvikstad, E. [Univ. Wisc.-Madison; Kile, A. [Univ. Wisc.-Madison; Severin, J.; Forrest, D. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Hickman, J. W. [Univ. Wisc.-Madison; Mackenzie, C. [University of Texas–Houston Medical School; Choudhary, M. [University of Texas–Houston Medical School; Donohue, T. [Univ. Wisc.-Madison; Kaplan, S. [University of Texas–Houston Medical School; Schwartz, D. C. [Univ. Wisc.-Madison

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.
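    An optical restriction map is essentially an ordered list of fragment sizes, so sequence contigs can be compared with it after an in silico digest that cuts the sequence at the same recognition sites. The sketch below shows such a digest for a linear sequence. The recognition sites and top-strand cut positions (EcoRI G^AATTC, HindIII A^AGCTT, AflII C^TTAAG) are standard; the `digest` function is a hypothetical helper and does not model circular chromosomes, sizing error or map alignment.

```python
import re

# Recognition site and cut offset within the site (top strand).
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "AflII": ("CTTAAG", 1),    # C^TTAAG
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Return ordered fragment lengths from a linear in silico digest."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]
```

    For example, `digest("AAGAATTCAAGAATTCAA", "EcoRI")` cuts after each `G` of the two `GAATTC` sites and returns the ordered fragment sizes `[3, 8, 7]`, which would then be matched against the corresponding segment of the optical map.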

  5. Whole-genome typing and characterization of blaVIM19-harbouring ST383 Klebsiella pneumoniae by PFGE, whole-genome mapping and WGS.

    Science.gov (United States)

    Sabirova, Julia S; Xavier, Basil Britto; Coppens, Jasmine; Zarkotou, Olympia; Lammens, Christine; Janssens, Lore; Burggrave, Ronald; Wagner, Trevor; Goossens, Herman; Malhotra-Kumar, Surbhi

    2016-06-01

    We utilized whole-genome mapping (WGM) and WGS to characterize 12 clinical carbapenem-resistant Klebsiella pneumoniae strains (TGH1-TGH12). All strains were screened for carbapenemase genes by PCR, and typed by MLST, PFGE (XbaI) and WGM (AflII) (OpGen, USA). WGS (Illumina) was performed on TGH8 and TGH10. Reads were de novo assembled and annotated [SPAdes, Rapid Annotation using Subsystem Technology (RAST)]. Contigs were aligned directly, and after in silico AflII restriction, with corresponding WGMs (MapSolver, OpGen; BioNumerics, Applied Maths). All 12 strains were ST383. Of the 12 strains, 11 were carbapenem resistant, 7 harboured blaKPC-2 and 11 harboured blaVIM-19. Varying the parameters for assigning WGM clusters showed that these were comparable to STs and to the eight PFGE types or subtypes (difference of three or more bands). A 95% similarity coefficient assigned all 12 WGMs to a single cluster, whereas a 99% similarity coefficient (or ≥10 unmatched-fragment difference) assigned the 12 WGMs to eight (sub)clusters. Based on a difference of three or more bands between PFGE profiles, the Simpson's diversity indices (SDIs) of WGM (0.94, Jackknife pseudo-values CI: 0.883-0.996) and PFGE (0.93, Jackknife pseudo-values CI: 0.828-1.000) were similar (P = 0.649). However, the discriminatory power of WGM was significantly higher (SDI: 0.94, Jackknife pseudo-values CI: 0.883-0.996) than that of PFGE profiles typed on a difference of seven or more bands (SDI: 0.53, Jackknife pseudo-values CI: 0.212-0.849) (P = 0.007). This study demonstrates the application of WGM to understanding the epidemiology of hospital-associated K. pneumoniae. Utilizing a combination of WGM and WGS, we also present here the first longitudinal genomic characterization of the highly dynamic carbapenem-resistant ST383 K. pneumoniae clone that is rapidly gaining importance in Europe. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial
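    The discriminatory power of a typing method is commonly summarized with Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. The sketch below computes D and a jackknife confidence interval from leave-one-out pseudo-values. It illustrates one standard way to obtain such intervals; the abstract does not give the authors' exact procedure, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def simpsons_index(types) -> float:
    """Hunter-Gaston form of Simpson's index of diversity."""
    types = list(types)
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))


def jackknife_ci(types, z: float = 1.96):
    """Return (estimate, lower, upper) from jackknife pseudo-values."""
    types = list(types)
    n = len(types)
    if n < 3:
        raise ValueError("need at least three isolates for leave-one-out")
    full = simpsons_index(types)
    # Pseudo-value i = n*D - (n - 1)*D with isolate i left out.
    pseudo = [n * full - (n - 1) * simpsons_index(types[:i] + types[i + 1:])
              for i in range(n)]
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    se = sqrt(var / n)
    # The interval is not clipped to [0, 1]; published intervals often are.
    return mean, mean - z * se, mean + z * se
```

    For example, four isolates typed `A, A, B, C` give D = 1 - 2/12 ≈ 0.833. When every isolate has a distinct type, D = 1 and the interval collapses to a single point.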

  6. Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria.

    Science.gov (United States)

    Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming

    2017-05-10

    Bacillus velezensis LS69 was found to exhibit antagonistic activity against a diverse spectrum of pathogenic bacteria. It has one circular chromosome of 3,917,761 bp with 3,643 open reading frames. Genome analysis identified ten gene clusters involved in nonribosomal synthesis of polyketides (macrolactin, bacillaene and difficidin), lipopeptides (surfactin, fengycin, bacilysin and iturin A) and bacteriocins (amylolysin and amylocyclicin). In addition, B. velezensis LS69 was found to contain a series of genes involved in enhancing plant growth and triggering plant immunity. Whole genome sequencing of Bacillus velezensis LS69 will provide a basis for elucidation of its biocontrol mechanisms and facilitate its applications in the future. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected during monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database of the papaya genome sequence. The transgenic construct- and event-specific sequences identified showed the commodity to be a GM papaya developed to resist infection by Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. A Comparison of Whole Genome Sequencing to Multigene Panel Testing in Hypertrophic Cardiomyopathy Patients.

    Science.gov (United States)

    Cirino, Allison L; Lakdawala, Neal K; McDonough, Barbara; Conner, Lauren; Adler, Dale; Weinfeld, Mark; O'Gara, Patrick; Rehm, Heidi L; Machini, Kalotina; Lebo, Matthew; Blout, Carrie; Green, Robert C; MacRae, Calum A; Seidman, Christine E; Ho, Carolyn Y

    2017-10-01

    As DNA sequencing costs decline, genetic testing options have expanded. Whole exome sequencing and whole genome sequencing (WGS) are entering clinical use, posing questions about their incremental value compared with disease-specific multigene panels that have been the cornerstone of genetic testing. Forty-one patients with hypertrophic cardiomyopathy who had undergone targeted hypertrophic cardiomyopathy genetic testing (either multigene panel or familial variant test) were recruited into the MedSeq Project, a clinical trial of WGS. Results from panel genetic testing and WGS were compared. In 20 of 41 participants, panel genetic testing identified variants classified as pathogenic, likely pathogenic, or uncertain significance. WGS identified 19 of these 20 variants, but the variant detection algorithm missed a pathogenic 18 bp duplication in myosin binding protein C (MYBPC3) because of low coverage. In 3 individuals, WGS identified variants in genes implicated in cardiomyopathy but not included in prior panel testing: a pathogenic protein tyrosine phosphatase, non-receptor type 11 (PTPN11) variant and variants of uncertain significance in integrin-linked kinase (ILK) and filamin-C (FLNC). WGS also identified 84 secondary findings (mean=2 per person, range=0-6), which mostly defined carrier status for recessive conditions. WGS detected nearly all variants identified on panel testing, provided 1 new diagnostic finding, and allowed interrogation of posited disease genes. Several variants of uncertain clinical use and numerous secondary genetic findings were also identified. Whereas panel testing and WGS provided similar diagnostic yield, WGS offers the advantage of reanalysis over time to incorporate advances in knowledge, but requires expertise in genomic interpretation to appropriately incorporate WGS into clinical care. URL: https://clinicaltrials.gov. Unique identifier: NCT01736566. © 2017 American Heart Association, Inc.

  9. Comprehensive whole genome sequence analyses yields novel genetic and structural insights for Intellectual Disability.

    Science.gov (United States)

    Zahir, Farah R; Mwenifumbo, Jill C; Chun, Hye-Jung E; Lim, Emilia L; Van Karnebeek, Clara D M; Couse, Madeline; Mungall, Karen L; Lee, Leora; Makela, Nancy; Armstrong, Linlea; Boerkoel, Cornelius F; Langlois, Sylvie L; McGillivray, Barbara M; Jones, Steven J M; Friedman, Jan M; Marra, Marco A

    2017-05-24

    Intellectual Disability (ID) is among the most common global disorders, yet its etiology is unknown in ~30% of patients despite clinical assessment. Whole genome sequencing (WGS) is able to interrogate the entire genome, providing the potential to diagnose idiopathic patients. We conducted WGS on eight children with idiopathic ID and brain structural defects, and on their normal parents, carrying out extensive data analyses using standard and discovery approaches. We verified de novo pathogenic single nucleotide variants (SNVs) in ARID1B c.1595delG and PHF6 c.820C>T, potentially causative de novo two-base indels in SQSTM1 c.115_116delinsTA and UPF1 c.1576_1577delinsA, and de novo SNVs of uncertain significance in CACNB3 c.1289G>A and SPRY4 c.508T>A. We report results from a large secondary control study of 2081 exomes probing the pathogenicity of the above genes. We analyzed structural variation with four different algorithms, including de novo genome assembly. We confirmed a likely contributory 165 kb de novo heterozygous 1q43 microdeletion missed by clinical microarray. The de novo assembly unmasked hidden genome instability that was missed by standard re-alignment-based algorithms. We also interrogated regulatory sequence variation for known and hypothesized ID genes and present useful strategies for WGS data analyses of non-coding variation. This study provides an extensive analysis of WGS in the context of ID, providing genetic and structural insights into ID and yielding diagnoses.

  10. Practical Approaches for Whole-Genome Sequence Analysis of Heart- and Blood-Related Traits.

    Science.gov (United States)

    Morrison, Alanna C; Huang, Zhuoyi; Yu, Bing; Metcalf, Ginger; Liu, Xiaoming; Ballantyne, Christie; Coresh, Josef; Yu, Fuli; Muzny, Donna; Feofanova, Elena; Rustagi, Navin; Gibbs, Richard; Boerwinkle, Eric

    2017-02-02

    Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
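    The sliding-window aggregate approach can be pictured as two steps: tile the genome into overlapping windows, then collapse the rare variants in each window into one score per individual, optionally weighting each variant by a functional prediction. The sketch below covers only those two steps under simple assumptions (a weighted burden score, a minor-allele-frequency cut-off, alternate-allele counts treated as minor); the abstract does not specify the aggregate test statistics, and `Variant`, `sliding_windows` and `window_burden` are hypothetical names. The resulting scores would then be tested for association with a trait, for example by regression.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int              # genomic position
    weight: float         # 1.0 for "all variants equal", or a functional score
    genotypes: list[int]  # alternate-allele count (0/1/2) per individual


def sliding_windows(start: int, end: int, size: int, step: int):
    """Yield half-open (window_start, window_end) intervals."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    pos = start
    while pos < end:
        yield pos, min(pos + size, end)
        pos += step


def window_burden(variants, win_start: int, win_end: int,
                  n_individuals: int, max_maf: float = 0.05) -> list[float]:
    """Weighted count of rare alleles per individual within one window."""
    scores = [0.0] * n_individuals
    for v in variants:
        if not (win_start <= v.pos < win_end):
            continue
        # Alternate-allele frequency; assumes the alternate allele is minor.
        freq = sum(v.genotypes) / (2 * len(v.genotypes))
        if freq > max_maf:
            continue  # keep only low-frequency and rare variation
        for i, g in enumerate(v.genotypes):
            scores[i] += v.weight * g
    return scores
```

    Setting every `weight` to 1.0 corresponds to treating all variants equally, while supplying a predicted-consequence score reproduces, in spirit, the weighted analysis described above.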

  11. Whole-Genome Saliva and Blood DNA Methylation Profiling in Individuals with a Respiratory Allergy.

    Directory of Open Access Journals (Sweden)

    Sabine A S Langie

    The etiology of respiratory allergies (RA) can be partly explained by DNA methylation changes caused by adverse environmental and lifestyle factors experienced early in life. Longitudinal, prospective studies can aid in unravelling the epigenetic mechanisms involved in disease development. High compliance rates can be expected in such studies when data are collected using non-invasive and convenient procedures. Saliva is an attractive biofluid in which to analyze changes in DNA methylation patterns. In a pilot study, we investigated differential methylation in saliva of RA cases (n = 5) compared with healthy controls (n = 5) using the Illumina Methylation 450K BeadChip platform. We evaluated the results against those obtained in mononuclear blood cells from the same individuals. Differences in methylation patterns between saliva and mononuclear blood cells were clearly distinguishable (PAdj0.2), though the methylation status of about 96% of the cg-sites was comparable between peripheral blood mononuclear cells and saliva. When comparing RA cases with healthy controls, the numbers of differentially methylated sites (DMS) in saliva and blood were 485 and 437 (P0.1), respectively, of which 216 were in common. The methylation levels of these sites were significantly correlated between blood and saliva. The absolute levels of methylation in blood and saliva were confirmed for 3 selected DMS in the PM20D1, STK32C, and FGFR2 genes using pyrosequencing analysis. The differential methylation could only be confirmed for DMS in the PM20D1 and STK32C genes in saliva. We show that saliva can be used for genome-wide methylation analysis and that it is possible to identify DMS when comparing RA cases and healthy controls. The results were replicated in blood cells of the same individuals and confirmed by pyrosequencing analysis. This study provides proof-of-concept for the applicability of saliva-based whole-genome methylation analysis in the field of respiratory allergy.

  12. The whole-genome landscape of medulloblastoma subtypes

    NARCIS (Netherlands)

    Northcott, Paul A.; Buchhalter, Ivo; Morrissy, A. Sorana; Hovestadt, Volker; Weischenfeldt, Joachim; Ehrenberger, Tobias; Gröbner, Susanne; Segura-Wang, Maia; Zichner, Thomas; Rudneva, Vasilisa A.; Warnatz, Hans-Jörg; Sidiropoulos, Nikos; Phillips, Aaron H.; Schumacher, Steven; Kleinheinz, Kortine; Waszak, Sebastian M.; Erkek, Serap; Jones, David T. W.; Worst, Barbara C.; Kool, Marcel; Zapatka, Marc; Jäger, Natalie; Chavez, Lukas; Hutter, Barbara; Bieg, Matthias; Paramasivam, Nagarajan; Heinold, Michael; Gu, Zuguang; Ishaque, Naveed; Jäger-Schmidt, Christina; Imbusch, Charles D.; Jugold, Alke; Hübschmann, Daniel; Risch, Thomas; Amstislavskiy, Vyacheslav; Gonzalez, Francisco German Rodriguez; Weber, Ursula D.; Wolf, Stephan; Robinson, Giles W.; Zhou, Xin; Wu, Gang; Finkelstein, David; Liu, Yanling; Cavalli, Florence M. G.; Luu, Betty; Ramaswamy, Vijay; Wu, Xiaochong; Koster, Jan; Ryzhova, Marina; Cho, Yoon-Jae; Pomeroy, Scott L.; Herold-Mende, Christel; Schuhmann, Martin; Ebinger, Martin; Liau, Linda M.; Mora, Jaume; McLendon, Roger E.; Jabado, Nada; Kumabe, Toshihiro; Chuah, Eric; Ma, Yussanne; Moore, Richard A.; Mungall, Andrew J.; Mungall, Karen L.; Thiessen, Nina; Tse, Kane; Wong, Tina; Jones, Steven J. M.; Witt, Olaf; Milde, Till; von Deimling, Andreas; Capper, David; Korshunov, Andrey; Yaspo, Marie-Laure; Kriwacki, Richard; Gajjar, Amar; Zhang, Jinghui; Beroukhim, Rameen; Fraenkel, Ernest; Korbel, Jan O.; Brors, Benedikt; Schlesner, Matthias; Eils, Roland; Marra, Marco A.; Pfister, Stefan M.; Taylor, Michael D.; Lichter, Peter

    2017-01-01

    Current therapies for medulloblastoma, a highly malignant childhood brain tumour, impose debilitating effects on the developing child, and highlight the need for molecularly targeted treatments with reduced toxicity. Previous studies have been unable to identify the full spectrum of driver genes and

  13. The whole-genome landscape of medulloblastoma subtypes

    DEFF Research Database (Denmark)

    Northcott, Paul A.; Buchhalter, Ivo; Morrissy, A. Sorana

    2017-01-01

    Current therapies for medulloblastoma, a highly malignant childhood brain tumour, impose debilitating effects on the developing child, and highlight the need for molecularly targeted treatments with reduced toxicity. Previous studies have been unable to identify the full spectrum of driver genes ...

  14. Diagnostic value of exome and whole genome sequencing in craniosynostosis

    NARCIS (Netherlands)

    K.A. Miller (Kerry A.); S.R.F. Twigg (Stephen); S.J. McGowan (Simon); J.M. Phipps (Julie); A.L. Fenwick (Aimée); D. Johnson (David); S.A. Wall (Steven); P. Noons (Peter); Rees, K.E.M. (Katie E.M.); Tidey, E.A. (Elizabeth A.); Craft, J. (Judith); Taylor, J. (John); Taylor, J.C. (Jenny C.); J.A.C. Goos (Jacqueline); S.M.A. Swagemakers (Sigrid); I.M.J. Mathijssen (Irene); P.J. van der Spek (Peter); H. Lord (Helen); K.J. Lester (Kathryn); Abid, N. (Noina); Cilliers, D. (Deirdre); J.A. Hurst (Jane); J. Morton (Jenny); E. Sweeney (Elizabeth); Weber, A. (Astrid); L.C. Wilson (Louise); A.O.M. Wilkie (Andrew)

    2017-01-01

    Background: Craniosynostosis, the premature fusion of one or more cranial sutures, occurs in ~1 in 2250 births, either in isolation or as part of a syndrome. Mutations in at least 57 genes have been associated with craniosynostosis, but only a minority of these are included in routine

  15. Whole-genome analyses of speciation events in pathogenic Brucellae

    Energy Technology Data Exchange (ETDEWEB)

    Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Comerci, Diego J. [Universidad Nacional de General San Martin; Tolmasky, Marcelo E. [California State University; Larimer, Frank W [ORNL; Malfatti, Stephanie [Lawrence Livermore National Laboratory (LLNL); Vergez, Lisa [Lawrence Livermore National Laboratory (LLNL); Aguero, Fernan [Universidad Nacional de General San Martin; Land, Miriam L [ORNL; Ugalde, Rodolfo A. [Universidad Nacional de General San Martin; Garcia, Emilio [Lawrence Livermore National Laboratory (LLNL)

    2005-12-01

    Despite their high DNA identity and a proposal to group classical Brucella species as biovars of Brucella melitensis, the commonly recognized Brucella species can be distinguished by distinct biochemical and fatty acid characters, as well as by a marked host range (e.g., Brucella suis for swine, B. melitensis for sheep and goats, and Brucella abortus for cattle). Here we present the genome of B. abortus 2308, the virulent prototype biovar 1 strain, and its comparison to the two other human pathogenic Brucella species and to B. abortus field isolate 9-941. The global distribution of pseudogenes, deletions, and insertions supports previous indications that B. abortus and B. melitensis share a common ancestor that diverged from B. suis. With the exception of a dozen genes, the genetic complements of both B. abortus strains are identical, whereas the three species differ in gene content and pseudogenes. The pattern of species-specific gene inactivations affecting transcriptional regulators and outer membrane proteins suggests that these inactivations may play an important role in the establishment of host specificity and may have been a primary driver of speciation in the genus Brucella. Despite being nonmotile, the brucellae contain flagellum gene clusters and display species-specific flagellar gene inactivations, which lead to the putative generation of different versions of flagellum-derived structures and may contribute to differences in host specificity and virulence. Metabolic changes such as the lack of complete metabolic pathways for the synthesis of numerous compounds (e.g., glycogen, biotin, NAD, and choline) are consistent with adaptation of brucellae to an intracellular life-style.

  16. Whole genome sequence of Treponema pallidum ssp. pallidum, strain Mexico A, suggests recombination between yaws and syphilis strains.

    Directory of Open Access Journals (Sweden)

    Helena Pětrošová

    Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, and Treponema pallidum ssp. pertenue (TPE), the causative agent of yaws, are closely related spirochetes causing diseases with distinct clinical manifestations. The TPA Mexico A strain was isolated in 1953 from a man with primary syphilis living in Mexico. Attempts to cultivate the TPA Mexico A strain under in vitro conditions have revealed lower growth potential than in other tested TPA strains. The complete genome sequence of the TPA Mexico A strain was determined using the Illumina sequencing technique. The genome sequence assembly was verified using the whole genome fingerprinting technique, and the final sequence was annotated. The genome size of the Mexico A strain was determined to be 1,140,038 bp with 1,035 predicted ORFs. The Mexico A genome sequence was compared with the whole genome sequences of three TPA strains (Nichols, SS14 and Chicago) and three TPE strains (CDC-2, Samoa D and Gauthier). No large rearrangements were found in the Mexico A genome, and the identified nucleotide changes occurred most frequently in genes encoding putative virulence factors. Nevertheless, the genome of the Mexico A strain revealed two genes, TPAMA_0326 (tp92) and TPAMA_0488 (mcp2-1), which combine TPA- and TPE-specific nucleotide sequences. Both genes were found to be under positive selection within TPA strains and also between TPA and TPE strains. The observed mosaic character of the TPAMA_0326 and TPAMA_0488 loci is likely a result of inter-strain recombination between TPA and TPE strains during simultaneous infection of a single host, suggesting horizontal gene transfer between treponemal subspecies.

  17. Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data.

    Science.gov (United States)

    Joensen, Katrine G; Tetzschner, Anna M M; Iguchi, Atsushi; Aarestrup, Frank M; Scheutz, Flemming

    2015-08-01

    Accurate and rapid typing of pathogens is essential for effective surveillance and outbreak detection. Conventional serotyping of Escherichia coli is a delicate, laborious, time-consuming, and expensive procedure. With whole-genome sequencing (WGS) becoming cheaper, it has vast potential in routine typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing was created as a component of the publicly available Web tools hosted by the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org). All E. coli isolates available with WGS data and conventional serotype information were subjected to WGS-based serotyping employing this specific SerotypeFinder CGE tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin genes fliC, flkA, fllA, flmA, and flnA were detected in 569 and 508 genome sequences, respectively. SerotypeFinder for WGS-based O and H typing predicted 560 of 569 O types and 504 of 508 H types, consistent with conventional serotyping. In combination with other available WGS typing tools, E. coli serotyping can be performed solely from WGS data, providing faster and cheaper typing than current routine procedures and making WGS typing a superior alternative to conventional typing strategies. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
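    The O and H typing step amounts to looking up which O-antigen processing gene (wzx, wzy, wzm, wzt) and flagellin gene (fliC, flkA, fllA, flmA, flnA) alleles are present in a genome. A minimal sketch, assuming BLAST-style hits with percent identity and coverage have already been computed; the allele names, the tiny allele-to-serotype table, and the 85% identity / 60% coverage cut-offs are hypothetical and are not taken from the SerotypeFinder database or its defaults.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

@dataclass
class Hit:
    allele: str      # hypothetical database allele name, e.g. "wzx_O157"
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference allele length covered

# Hypothetical allele-to-serotype table; a real database holds many alleles per type.
ALLELE_TO_TYPE: Dict[str, str] = {
    "wzx_O157": "O157", "wzy_O157": "O157",
    "wzx_O104": "O104", "wzy_O104": "O104",
    "fliC_H7": "H7", "fliC_H4": "H4",
}
O_GENES = ("wzx", "wzy", "wzm", "wzt")
H_GENES = ("fliC", "flkA", "fllA", "flmA", "flnA")

def call_type(hits: List[Hit], genes: Sequence[str],
              min_id: float = 85.0, min_cov: float = 60.0) -> Optional[str]:
    """Serotype of the best passing hit among the given gene families (cut-offs are assumptions)."""
    passing = [h for h in hits
               if h.allele.split("_")[0] in genes
               and h.allele in ALLELE_TO_TYPE
               and h.identity >= min_id and h.coverage >= min_cov]
    if not passing:
        return None  # untypeable with these cut-offs
    best = max(passing, key=lambda h: (h.identity, h.coverage))
    return ALLELE_TO_TYPE[best.allele]

def serotype(hits: List[Hit]) -> str:
    """Combine O and H calls into an 'O:H' string, using '?' for an untyped antigen."""
    o_type = call_type(hits, O_GENES) or "O?"
    h_type = call_type(hits, H_GENES) or "H?"
    return f"{o_type}:{h_type}"

# The wzy_O104 hit falls below the assumed identity cut-off and is ignored.
hits = [Hit("wzx_O157", 99.8, 100.0), Hit("wzy_O104", 82.0, 100.0), Hit("fliC_H7", 99.5, 98.0)]
print(serotype(hits))  # -> O157:H7
```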

  18. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  19. A randomization test for controlling population stratification in whole-genome association studies.

    Science.gov (United States)

    Kimmel, Gad; Jordan, Michael I; Halperin, Eran; Shamir, Ron; Karp, Richard M

    2007-11-01

    Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.
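    The key property of the randomization test, conditioning on the genotype matrix, can be illustrated by shuffling only the phenotype labels so that every SNP column, and hence the LD structure, stays intact, then comparing the observed maximum association score with its permutation distribution. This is a simplified sketch, not the authors' method: stratification is handled here by permuting labels within known subpopulations, whereas the published test samples phenotype assignments under a population-structure model. The score (absolute difference in mean allele count between cases and controls) and the toy data are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence

def snp_score(genotypes: Sequence[int], phenotypes: Sequence[int]) -> float:
    """|mean allele count in cases - mean allele count in controls| for one SNP (genotypes 0/1/2)."""
    cases = [g for g, y in zip(genotypes, phenotypes) if y == 1]
    controls = [g for g, y in zip(genotypes, phenotypes) if y == 0]
    if not cases or not controls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_score(snps: List[List[int]], phenotypes: Sequence[int]) -> float:
    """Maximum score over SNPs; each inner list holds one SNP's genotypes for all individuals.
    The genotype matrix is never modified, so LD between SNPs is preserved under permutation."""
    return max(snp_score(col, phenotypes) for col in snps)

def stratified_permutation_p(snps, phenotypes, strata, n_perm=1000, seed=0) -> float:
    """Genome-wide p-value for the maximum score, shuffling phenotypes only within each stratum."""
    rng = random.Random(seed)
    observed = max_score(snps, phenotypes)
    groups: Dict[str, List[int]] = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    exceed = 0
    for _ in range(n_perm):
        permuted = list(phenotypes)
        for members in groups.values():
            shuffled = [phenotypes[i] for i in members]
            rng.shuffle(shuffled)
            for i, y in zip(members, shuffled):
                permuted[i] = y
        if max_score(snps, permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction so the p-value is never 0

# Toy data: the case/control allele-count difference is driven entirely by stratum membership.
strata = ["A"] * 6 + ["B"] * 6
phenotypes = [1, 1, 1, 1, 0, 0] + [1, 1, 0, 0, 0, 0]
confounded_snp = [2] * 6 + [0] * 6
print(stratified_permutation_p([confounded_snp], phenotypes, strata, n_perm=200))  # -> 1.0
```

    A naive permutation across all individuals would flag the confounded SNP; shuffling within strata keeps each stratum's case count fixed, so a purely stratification-driven signal is never exceeded by chance alone.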

  20. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    Developments in sequencing methods have made it possible to perform whole genome de novo sequencing of species without large commercial interest. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a first step towards generating the whole genome sequences of these species. The continuation of establishing the genome...

  1. Whole Genome Analysis of a Wine Yeast Strain

    Science.gov (United States)

    Hauser, Nicole C.; Fellenberg, Kurt; Gil, Rosario; Bastuck, Sonja; Hoheisel, Jörg D.

    2001-01-01

    Saccharomyces cerevisiae strains frequently exhibit rather specific phenotypic features needed for adaptation to a special environment. Wine yeast strains are able to ferment musts, for example, while other industrial or laboratory strains fail to do so. The genetic differences that characterize wine yeast strains are poorly understood, however. As a first search for genetic differences between wine and laboratory strains, we performed DNA-array analyses on the typical wine yeast strain T73 and the standard laboratory background S288c. Our analysis shows that even under normal conditions (logarithmic growth in YPD medium), the two strains have expression patterns that differ significantly in more than 40 genes. Subsequent studies indicated that these differences correlate with small changes in promoter regions or variations in gene copy number. Plotting copy numbers vs. transcript levels produced patterns that were specific for the individual strains and could be used to characterize unknown samples. PMID:18628902

  2. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Directory of Open Access Journals (Sweden)

    Suneel K Onteru

    Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection, at an early age, of animals with improved feed efficiency. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, and single-SNP and haplotype analyses using PLINK software. Single-SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR correction. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near (<0.05 Mb) genes that regulate insulin release and leptin functions. The Bayesian approach identified associations of genomic regions containing insulin release genes (e.g., GLP1R, CDKAL, SGMS1) with RFI and ADFI, of regions with energy homeostasis genes (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker
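    Because RFI is defined as observed feed intake minus the intake expected from growth and maintenance, it is commonly computed as the residual of a linear regression. A minimal sketch, assuming ADFI is regressed on an intercept, ADG and metabolic body weight (MBW, body weight raised to the power 0.75) only; the ISU-RFI selection model may include other covariates and fixed effects, and the values in the usage example are hypothetical.

```python
from typing import List, Sequence

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residual_feed_intake(adfi: Sequence[float], adg: Sequence[float],
                         mbw: Sequence[float]) -> List[float]:
    """RFI = observed ADFI - ADFI predicted by OLS on intercept + ADG + MBW (assumed model)."""
    rows = [[1.0, g, w] for g, w in zip(adg, mbw)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, adfi)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # normal equations (X'X) beta = X'y
    return [y - sum(bj * xj for bj, xj in zip(beta, r)) for r, y in zip(rows, adfi)]

# Hypothetical pigs: ADFI and ADG in kg/day, MBW in kg^0.75.
adfi = [2.1, 2.4, 2.2, 2.7, 2.5]
adg = [0.6, 0.7, 0.8, 0.9, 1.0]
mbw = [20.0, 24.0, 21.0, 26.0, 23.0]
print([round(r, 3) for r in residual_feed_intake(adfi, adg, mbw)])
```

    Negative values mark animals that eat less than predicted for their growth and body size, which is the low-RFI (more efficient) direction selected for in the ISU-RFI lines.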

  3. Whole genome analysis of selected human and animal rotaviruses identified in Uganda from 2012 to 2014 reveals complex genome reassortment events between human, bovine, caprine and porcine strains.

    Science.gov (United States)

    Bwogi, Josephine; Jere, Khuzwayo C; Karamagi, Charles; Byarugaba, Denis K; Namuwulya, Prossy; Baliraine, Frederick N; Desselberger, Ulrich; Iturriza-Gomara, Miren

    2017-01-01

    Rotaviruses of species A (RVA) are a common cause of diarrhoea in children and the young of various other mammals and birds worldwide. To investigate possible interspecies transmission of RVAs, whole genomes of 18 human and 6 domestic animal RVA strains identified in Uganda between 2012 and 2014 were sequenced using the Illumina HiSeq platform. The backbone of the human RVA strains had either a Wa- or a DS-1-like genetic constellation. One human strain was a Wa-like mono-reassortant containing a DS-1-like VP2 gene of possible animal origin. All eleven genes of one bovine RVA strain were closely related to those of human RVAs. One caprine strain had a mixed genotype backbone, suggesting that it emerged from multiple reassortment events involving different host species. The porcine RVA strains had mixed genotype backbones, suggesting multiple reassortment events with strains of human and bovine origin. Overall, whole genome characterisation of rotaviruses found in domestic animals in Uganda strongly suggested the presence of human-to-animal RVA transmission, with concomitant circulation of multi-reassortant strains potentially derived from complex interspecies transmission events. However, whole genome data from the human RVA strains causing moderate and severe diarrhoea in under-fives in Uganda indicated that they were primarily transmitted from person to person.
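    The Wa-like and DS-1-like labels refer to the genotypes of the nine backbone segments, written in the order VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5 (I-R-C-M-A-N-T-E-H): I1-R1-C1-M1-A1-N1-T1-E1-H1 for Wa-like and I2-R2-C2-M2-A2-N2-T2-E2-H2 for DS-1-like strains. A minimal sketch of how a backbone might be classified, assuming segment genotypes have already been assigned; the mono-reassortant rule (exactly one segment differing from a reference constellation) and the mixed example strain are illustrative assumptions.

```python
from typing import Dict

# Backbone segments in constellation order, with the reference Wa-like and DS-1-like genotypes.
BACKBONE = ["VP6", "VP1", "VP2", "VP3", "NSP1", "NSP2", "NSP3", "NSP4", "NSP5"]
WA_LIKE = dict(zip(BACKBONE, ["I1", "R1", "C1", "M1", "A1", "N1", "T1", "E1", "H1"]))
DS1_LIKE = dict(zip(BACKBONE, ["I2", "R2", "C2", "M2", "A2", "N2", "T2", "E2", "H2"]))

def classify_backbone(genotypes: Dict[str, str]) -> str:
    """Wa-like, DS-1-like, a mono-reassortant of either, or a mixed backbone."""
    for name, reference in (("Wa-like", WA_LIKE), ("DS-1-like", DS1_LIKE)):
        foreign = [seg for seg in BACKBONE if genotypes.get(seg) != reference[seg]]
        if not foreign:
            return name
        if len(foreign) == 1:  # assumed rule: exactly one segment from another constellation
            seg = foreign[0]
            return f"{name} mono-reassortant ({seg}: {genotypes.get(seg, '?')})"
    return "mixed backbone"

print(classify_backbone(dict(WA_LIKE, VP2="C2")))  # -> Wa-like mono-reassortant (VP2: C2)
# Hypothetical strain carrying several genotypes outside both reference constellations.
print(classify_backbone(dict(DS1_LIKE, NSP1="A3", NSP3="T6", NSP5="H3")))  # -> mixed backbone
```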

  4. Extremely low-coverage whole genome sequencing in South Asians captures population genomics information.

    Science.gov (United States)

    Rustagi, Navin; Zhou, Anbo; Watkins, W Scott; Gedvilaite, Erika; Wang, Shuoguo; Ramesh, Naveen; Muzny, Donna; Gibbs, Richard A; Jorde, Lynn B; Yu, Fuli; Xing, Jinchuan

    2017-05-22

    The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
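    Why ~1.6x coverage can still capture common variants follows from pooling reads across the cohort. Under a Poisson coverage model, a single individual at mean depth λ has at least one read at a given site with probability 1 - e^(-λ), about 0.80 at 1.6x; across N individuals, the number of alternate-allele reads at a site with allele frequency f is approximately Poisson with mean N·λ·f. The sketch assumes error-free reads and random sampling of haplotypes, it is not the SNPTools or GATK calling model, and the requirement of two alternate reads is an arbitrary illustrative threshold.

```python
import math

def prob_covered(mean_depth: float) -> float:
    """Poisson approximation: probability that a site receives at least one read."""
    return 1.0 - math.exp(-mean_depth)

def prob_alt_seen(n_samples: int, mean_depth: float, maf: float, min_alt_reads: int = 2) -> float:
    """Probability that the pooled reads at a site include >= min_alt_reads alternate-allele reads.

    Idealised: error-free reads, each read samples a random haplotype, so the pooled
    alternate-read count is Poisson with mean n_samples * mean_depth * maf.
    """
    lam = n_samples * mean_depth * maf
    p_fewer = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_alt_reads))
    return 1.0 - p_fewer

print(round(prob_covered(1.6), 3))  # -> 0.798
for maf in (0.10, 0.01, 0.005):
    print(maf, round(prob_alt_seen(185, 1.6, maf), 3))
```

    Under these idealised assumptions, a variant at 10% frequency is almost always represented by several reads in a 185-sample cohort even though any one individual is often uncovered, while detection drops steeply for rare alleles.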

  5. Whole-Genome Expression Analysis and Signal Pathway Screening of Synovium-Derived Mesenchymal Stromal Cells in Rheumatoid Arthritis

    Directory of Open Access Journals (Sweden)

    Jingyi Hou

    2016-01-01

    Synovium-derived mesenchymal stromal cells (SMSCs) may play an important role in the pathogenesis of rheumatoid arthritis (RA) and show promise for therapeutic applications in RA. In this study, a whole-genome microarray analysis was used to detect differential gene expression in SMSCs from RA patients and healthy donors (HDs). Our results showed that there were 4828 differentially expressed genes in the RA group compared to the HD group; 3117 genes were upregulated, and 1711 genes were downregulated. A Gene Ontology analysis showed significantly enriched terms of differentially expressed genes in the biological process, cellular component, and molecular function domains. A Kyoto Encyclopedia of Genes and Genomes analysis showed that the MAPK signaling and rheumatoid arthritis pathways were upregulated and that the p53 signaling pathway was downregulated in RA SMSCs. Quantitative real-time polymerase chain reaction was applied to verify the expression changes of selected genes among those mentioned above, and a western blot analysis was used to determine the expression levels of p53, p-JNK, p-ERK, and p-p38. Our study found that differentially expressed genes in the MAPK signaling, rheumatoid arthritis, and p53 signaling pathways may help to explain the pathogenic mechanism of RA and may inform therapeutic applications of SMSCs in RA.
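    Pathway and Gene Ontology enrichment is often assessed with a one-sided hypergeometric (Fisher's exact) test: given the number of genes measured, the size of a pathway, and the number of differentially expressed genes, how surprising is the observed overlap? The abstract does not state which test was used, so this is a generic sketch; the pathway numbers in the usage example are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb

def enrichment_p(total_genes: int, pathway_genes: int, de_genes: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    X counts pathway members among `de_genes` genes drawn without replacement from
    `total_genes` measured genes, `pathway_genes` of which belong to the pathway.
    """
    upper = min(pathway_genes, de_genes)
    ways = sum(comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
               for k in range(overlap, upper + 1))
    return ways / comb(total_genes, de_genes)  # exact integer counts, one final division

print(round(enrichment_p(10, 5, 5, 5), 5))  # -> 0.00397
# Hypothetical pathway: 250 of 20,000 measured genes, 90 of them among 4,828 DE genes.
print(enrichment_p(20000, 250, 4828, 90))
```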

  6. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant

    Science.gov (United States)

    Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

    2017-01-01

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by screening against a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604 bp. The coverage ranged from 41X to 187X. cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated with attenuated virulence in all ST121 isolates. PMID:29071246
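    Discriminatory power of typing methods is commonly summarised with the Hunter-Gaston (Simpson's) index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. The abstract does not say which index or significance test was used, so the sketch below is illustrative only; the type assignments are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Sequence

def simpsons_diversity(types: Sequence[str]) -> float:
    """Hunter-Gaston index: probability that two isolates drawn without replacement
    are assigned to different types by the typing method."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same / (n * (n - 1))

# Hypothetical assignments of the same 10 isolates by two methods.
ribotype = ["R1"] * 4 + ["R2"] * 3 + ["R3"] * 3
cgmlst = ["C1"] * 4 + ["C2"] * 3 + ["C3"] * 2 + ["C4"]
print(round(simpsons_diversity(ribotype), 3))  # -> 0.733
print(round(simpsons_diversity(cgmlst), 3))    # -> 0.778
```

    Splitting one ribotype cluster into a pair and a singleton, as cgMLST did for two ST14 isolates, raises D; whether such a gain is significant requires a separate test, such as comparing confidence intervals.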

  7. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant.

    Science.gov (United States)

    Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

    2017-08-16

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by screening against a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604 bp. The coverage ranged from 41X to 187X. cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated with attenuated virulence in all ST121 isolates.
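    The cgMLST comparison in the abstract above rests on counting allele differences across core-genome loci and grouping isolates whose profiles are close; an isolate farther than the chosen cut-off from all others becomes a singleton. A minimal sketch, assuming allele profiles are already called; the four-locus profiles and the threshold are hypothetical (real schemes use far more loci), and ignoring loci missing in either isolate is one of several conventions.

```python
from itertools import combinations
from typing import Dict, List, Optional

Profile = Dict[str, Optional[int]]  # locus name -> allele number (None = not called)

def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both isolates whose allele numbers differ."""
    return sum(1 for locus in a.keys() & b.keys()
               if a[locus] is not None and b[locus] is not None and a[locus] != b[locus])

def single_linkage_clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Group isolates connected by chains of pairwise distances <= threshold (union-find)."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(profiles, 2):
        if allele_distance(profiles[x], profiles[y]) <= threshold:
            parent[find(x)] = find(y)
    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical four-locus profiles.
profiles = {
    "iso1": {"l1": 1, "l2": 1, "l3": 1, "l4": 1},
    "iso2": {"l1": 1, "l2": 1, "l3": 2, "l4": 1},
    "iso3": {"l1": 1, "l2": 1, "l3": 2, "l4": None},
    "iso4": {"l1": 5, "l2": 6, "l3": 7, "l4": 8},
}
print(single_linkage_clusters(profiles, threshold=1))  # -> [['iso1', 'iso2', 'iso3'], ['iso4']]
```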

  8. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant

    Directory of Open Access Journals (Sweden)

    Federica Palma

    2017-09-01

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by screening against a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604 bp. The coverage ranged from 41X to 187X. cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated with attenuated virulence in all ST121 isolates.

  9. A whole genome screen for HIV restriction factors

    Directory of Open Access Journals (Sweden)

    Liu Li

    2011-11-01

    Background Upon cellular entry, retroviruses must avoid innate restriction factors produced by the host cell. For human immunodeficiency virus (HIV), the human restriction factors APOBEC3 (apolipoprotein B mRNA-editing enzyme), p21 and tetherin are well characterised. Results To identify intrinsic resistance factors to HIV-1 replication, we screened 19,121 human genes and identified 114 factors with significant inhibition of infection. Those with a known function are involved in a broad spectrum of cellular processes, including receptor signalling, vesicle trafficking, transcription, apoptosis, cross-nuclear membrane transport, meiosis, DNA damage repair, ubiquitination and RNA processing. We focused on the PAF1 complex, which has previously been implicated in gene transcription, cell cycle control and mRNA surveillance. Knockdown of all members of the PAF1 family of proteins enhanced HIV-1 reverse transcription and integration of provirus. Over-expression of PAF1 in host cells renders them refractory to HIV-1. Simian immunodeficiency viruses and HIV-2 are also restricted in PAF1-expressing cells. PAF1 is expressed in primary monocytes, macrophages and T-lymphocytes, and we demonstrate strong activity in MonoMac1, a monocyte cell line. Conclusions We propose that the PAF1 complex (PAF1c) establishes an anti-viral state to prevent infection by incoming retroviruses. This previously unrecognised mechanism of restriction could have implications for invasion of cells by any pathogen.
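    Knockdown screens of this kind are often scored by comparing each gene's infection readout with the screen-wide distribution, and robust z-scores (median and median absolute deviation) limit the influence of the hits themselves on that comparison. The abstract does not describe the hit-calling statistic, so this is an assumption-laden sketch: the readout values, gene names and the z ≥ 3 cut-off are all hypothetical.

```python
from statistics import median
from typing import Dict, List

def robust_z(values: Dict[str, float]) -> Dict[str, float]:
    """Robust z-scores: (x - median) / (1.4826 * MAD); 1.4826 scales the MAD to a normal SD."""
    xs = list(values.values())
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("median absolute deviation is zero; robust z-scores are undefined")
    return {gene: (x - med) / scale for gene, x in values.items()}

def restriction_hits(infection: Dict[str, float], z_cut: float = 3.0) -> List[str]:
    """Genes whose knockdown raises the infection readout far above the screen median."""
    z = robust_z(infection)
    return sorted(gene for gene, score in z.items() if score >= z_cut)

# Hypothetical readouts (e.g. log2 infection relative to a non-targeting control siRNA).
infection = {"gene_A": 0.1, "gene_B": -0.2, "gene_C": 0.0, "gene_D": 0.15,
             "gene_E": -0.1, "gene_F": 0.05, "candidate_X": 2.5}
print(restriction_hits(infection))  # -> ['candidate_X']
```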

  10. The whole-genome landscape of medulloblastoma subtypes.

    Science.gov (United States)

    Northcott, Paul A; Buchhalter, Ivo; Morrissy, A Sorana; Hovestadt, Volker; Weischenfeldt, Joachim; Ehrenberger, Tobias; Gröbner, Susanne; Segura-Wang, Maia; Zichner, Thomas; Rudneva, Vasilisa A; Warnatz, Hans-Jörg; Sidiropoulos, Nikos; Phillips, Aaron H; Schumacher, Steven; Kleinheinz, Kortine; Waszak, Sebastian M; Erkek, Serap; Jones, David T W; Worst, Barbara C; Kool, Marcel; Zapatka, Marc; Jäger, Natalie; Chavez, Lukas; Hutter, Barbara; Bieg, Matthias; Paramasivam, Nagarajan; Heinold, Michael; Gu, Zuguang; Ishaque, Naveed; Jäger-Schmidt, Christina; Imbusch, Charles D; Jugold, Alke; Hübschmann, Daniel; Risch, Thomas; Amstislavskiy, Vyacheslav; Gonzalez, Francisco German Rodriguez; Weber, Ursula D; Wolf, Stephan; Robinson, Giles W; Zhou, Xin; Wu, Gang; Finkelstein, David; Liu, Yanling; Cavalli, Florence M G; Luu, Betty; Ramaswamy, Vijay; Wu, Xiaochong; Koster, Jan; Ryzhova, Marina; Cho, Yoon-Jae; Pomeroy, Scott L; Herold-Mende, Christel; Schuhmann, Martin; Ebinger, Martin; Liau, Linda M; Mora, Jaume; McLendon, Roger E; Jabado, Nada; Kumabe, Toshihiro; Chuah, Eric; Ma, Yussanne; Moore, Richard A; Mungall, Andrew J; Mungall, Karen L; Thiessen, Nina; Tse, Kane; Wong, Tina; Jones, Steven J M; Witt, Olaf; Milde, Till; Von Deimling, Andreas; Capper, David; Korshunov, Andrey; Yaspo, Marie-Laure; Kriwacki, Richard; Gajjar, Amar; Zhang, Jinghui; Beroukhim, Rameen; Fraenkel, Ernest; Korbel, Jan O; Brors, Benedikt; Schlesner, Matthias; Eils, Roland; Marra, Marco A; Pfister, Stefan M; Taylor, Michael D; Lichter, Peter

    2017-07-19

    Current therapies for medulloblastoma, a highly malignant childhood brain tumour, impose debilitating effects on the developing child, and highlight the need for molecularly targeted treatments with reduced toxicity. Previous studies have been unable to identify the full spectrum of driver genes and molecular processes that operate in medulloblastoma subgroups. Here we analyse the somatic landscape across 491 sequenced medulloblastoma samples and the molecular heterogeneity among 1,256 epigenetically analysed cases, and identify subgroup-specific driver alterations that include previously undiscovered actionable targets. Driver mutations were confidently assigned to most patients belonging to Group 3 and Group 4 medulloblastoma subgroups, greatly enhancing previous knowledge. New molecular subtypes were differentially enriched for specific driver events, including hotspot in-frame insertions that target KBTBD4 and 'enhancer hijacking' events that activate PRDM6. Thus, the application of integrative genomics to an extensive cohort of clinical samples derived from a single childhood cancer entity revealed a series of cancer genes and biologically relevant subtype diversity that represent attractive therapeutic targets for the treatment of patients with medulloblastoma.

  11. Flexible positions, managed hopes: the promissory bioeconomy of a whole genome sequencing cancer study.

    Science.gov (United States)

    Haase, Rachel; Michie, Marsha; Skinner, Debra

    2015-04-01

    Genomic research has rapidly expanded its scope and ambition over the past decade, promoted by both public and private sectors as having the potential to revolutionize clinical medicine. This promissory bioeconomy of genomic research and technology is generated by, and in turn generates, the hopes and expectations shared by investors, researchers and clinicians, patients, and the general public alike. Examinations of such bioeconomies have often focused on the public discourse, media representations, and capital investments that fuel these "regimes of hope," but also crucial are the more intimate contexts of small-scale medical research, and the private hopes, dreams, and disappointments of those involved. Here we examine one local site of production in a university-based clinical research project that sought to identify novel cancer predisposition genes through whole genome sequencing in individuals at high risk for cancer. In-depth interviews with 24 adults who donated samples to the study revealed an ability to shift flexibly between positioning themselves as research participants on the one hand, and as patients or as family members of patients, on the other. Similarly, interviews with members of the research team highlighted the dual nature of their positions as researchers and as clinicians. For both parties, this dual positioning shaped their investment in the project and valuing of its possible outcomes. In their narratives, all parties shifted between these different relational positions as they managed hopes and expectations for the research project. We suggest that this flexibility facilitated study implementation and participation in the face of potential and probable disappointment on one or more fronts, and acted as a key element in the resilience of this local promissory bioeconomy. We conclude that these multiple dimensions of relationality and positionality are inherent and essential in the creation of any complex economy, "bio" or otherwise.

  12. Quantification of trace-level DNA by real-time whole genome amplification.

    Directory of Open Access Journals (Sweden)

    Min-Jung Kang

    Quantification of trace amounts of DNA is a challenge in analytical applications where the concentration of a target DNA is very low or only limited amounts of samples are available for analysis. PCR-based methods, including real-time PCR, are highly sensitive and widely used for quantification of low-level DNA samples. However, ordinary PCR methods require at least one copy of a specific gene sequence for amplification and may not work for a sub-genomic amount of DNA. We propose a real-time whole genome amplification method adopting degenerate oligonucleotide primed PCR (DOP-PCR) for quantification of sub-genomic amounts of DNA. This approach enabled quantification of sub-picogram amounts of DNA independently of their sequences. When the method was applied to human placental DNA whose amount was accurately determined by inductively coupled plasma-optical emission spectroscopy (ICP-OES), accurate and stable quantification was obtained for DNA samples ranging from 80 fg to 8 ng. In blind tests of laboratory-prepared DNA samples, measurement accuracies of 7.4%, -2.1%, and -13.9% with analytical precisions around 15% were achieved for 400-pg, 4-pg, and 400-fg DNA samples, respectively. A similar quantification capability was also observed for other DNA species from calf, E. coli, and lambda phage. Therefore, when provided with an appropriate standard DNA, the proposed real-time DOP-PCR method can be used as a universal method for quantification of trace amounts of DNA.
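    Real-time quantification usually maps threshold cycle (Ct) values onto template amounts through a standard curve built from a reference DNA of known amount, the role played here by the ICP-OES-calibrated placental DNA. A minimal sketch, assuming Ct is linear in log10(amount) over the calibrated range; the Ct values are synthetic, the paper's actual calibration may differ, and the accuracy measure is assumed to be the signed relative deviation from the reference value.

```python
import math
from typing import List, Tuple

def fit_standard_curve(amounts: List[float], cts: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = intercept + slope * log10(amount)."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def quantify(ct: float, intercept: float, slope: float) -> float:
    """Invert the standard curve: estimated template amount for an unknown sample's Ct."""
    return 10 ** ((ct - intercept) / slope)

def relative_error(measured: float, true: float) -> float:
    """Signed percentage deviation of a measurement from the reference value."""
    return 100.0 * (measured - true) / true

# Synthetic standards spanning 80 fg to 8 ng (in pg) with perfectly linear, made-up Ct values.
standards_pg = [0.08, 0.8, 8.0, 80.0, 800.0, 8000.0]
cts = [32.0 - 3.3 * math.log10(a) for a in standards_pg]
intercept, slope = fit_standard_curve(standards_pg, cts)
unknown_ct = 32.0 - 3.3 * math.log10(2.5)
print(round(quantify(unknown_ct, intercept, slope), 6))  # -> 2.5
print(relative_error(95.0, 100.0))                        # -> -5.0
```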

  13. Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences.

    Science.gov (United States)

    Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire

    2017-02-01

    Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England are dependent on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.7%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.
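    The kmer identification step can be illustrated by decomposing a genome into overlapping substrings of length k and assigning it to the reference with the greatest k-mer overlap. This is a simplified sketch, not the Public Health England pipeline: it uses Jaccard similarity against two short hypothetical reference sequences, whereas real tools use large reference k-mer databases built from whole genomes and much longer k-mers.

```python
from typing import Dict, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All distinct overlapping substrings of length k (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def identify(query: str, references: Dict[str, str], k: int = 5) -> str:
    """Name of the reference sharing the largest fraction of k-mers with the query."""
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(ref, k)) for name, ref in references.items()}
    return max(scores, key=scores.get)

# Hypothetical toy references standing in for species-level reference genomes.
refs = {
    "reference_A": "ATGCGTACGTTAGCCGATCGATCGGATCCA",
    "reference_B": "TTGACCGGTAAGCTTGCAGGCCTAGGACTT",
}
query = "ATGCGTACGTTAGCCTATCGATCGGATCCA"  # reference_A with one substitution
print(identify(query, refs))  # -> reference_A
```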

  14. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations.

    Science.gov (United States)

    Pengelly, Reuben J; Tapper, William; Gibson, Jane; Knut, Marcin; Tearle, Rick; Collins, Andrew; Ennis, Sarah

    2015-09-03

    An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
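    LD maps are built from pairwise statistics between SNPs, such as the disequilibrium coefficient D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)p_B(1 - p_B)). The sketch computes both from phased haplotypes; the haplotypes are toy data, and the study's LD maps may be expressed in a different metric.

```python
from typing import List, Tuple

def ld_stats(haplotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """(D, r^2) for two biallelic SNPs from phased haplotypes coded 0/1 at each SNP."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n       # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n       # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0      # r^2 undefined for monomorphic SNPs; report 0
    return d, r2

print(ld_stats([(1, 1), (1, 1), (0, 0), (0, 0)]))  # -> (0.25, 1.0)
print(ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)]))  # -> (0.0, 0.0)
```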

  15. Whole genome sequencing of an ethnic Pathan (Pakhtun) from the north-west of Pakistan.

    Science.gov (United States)

    Ilyas, Muhammad; Kim, Jong-Soo; Cooper, Jesse; Shin, Young-Ah; Kim, Hak-Min; Cho, Yun Sung; Hwang, Seungwoo; Kim, Hyunho; Moon, Jaewoo; Chung, Oksung; Jun, JeHoon; Rastogi, Achal; Song, Sanghoon; Ko, Junsu; Manica, Andrea; Rahman, Ziaur; Husnain, Tayyab; Bhak, Jong

    2015-03-12

    Pakistan covers a key geographic area in human history: it is both part of the Indus River region, one of the cradles of civilization, and a link between Western Eurasia and Eastern Asia. The region is inhabited by a number of distinct ethnic groups, the largest being the Punjabi, Pathan (Pakhtuns), Sindhi, and Baloch. We sequenced the first genome of an ethnic Pathan male to 29.7-fold coverage using the Illumina HiSeq2000 platform. Comparison with the human reference genome identified a total of 3.8 million single nucleotide variations (SNVs) and 0.5 million small indels. Among the SNVs, 129,441 were novel, and 10,315 nonsynonymous SNVs were found in 5,344 genes. SNVs were annotated for health consequences and high-risk diseases, as well as for possible influences on drug efficacy. We confirmed that the Pathan genome presented here is representative of this ethnic group by comparing it with a panel of Central Asians from the HGDP-CEPH panels typed for ~650k SNPs. The mtDNA (H2) and Y-chromosome (L1) haplogroups of this individual were also typical of his geographic region of origin. Finally, we reconstructed the demographic history using PSMC (pairwise sequentially Markovian coalescent), which shows a recent increase in effective population size consistent with the admixture between European and Asian lineages expected in this geographic region. We present a whole-genome sequence and analyses of an ethnic Pathan from the north-west province of Pakistan. This genome is a useful resource for understanding genetic variation and human migration across the Asian continent.

  16. Assessing the Quality of Whole Genome Alignments in Bacteria

    Directory of Open Access Journals (Sweden)

    Firas Swidan

    2009-01-01

    … biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Although various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures designed to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that is conserved. We demonstrate the effectiveness of the two measures by applying them to the output of genome comparison tools on 41 pairs of bacterial species. Although the two measures differ in nature, they yield consistent results and reveal subtle differences among the mapping tools.
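    The abstract does not define exactly how the second measure combines segment count and conserved percentage. As a hypothetical sketch of just those two inputs, the Python function below (name and interval format are assumptions, not the paper's) counts the segments on one genome and computes the percentage of that genome they cover, merging overlapping segments so no base is counted twice. It would be called once per genome in the pair.

```python
def segment_stats(segments, genome_length):
    """Return (segment count, % of the genome covered) for one genome.

    segments: list of half-open (start, end) intervals on that genome,
    as some whole-genome aligner might report them (hypothetical input).
    Overlapping or adjacent segments are merged before measuring coverage.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Gap before this segment: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent segment: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return len(segments), 100.0 * covered / genome_length


# Three segments; the first two overlap, so 250 of 1,000 bases are covered.
print(segment_stats([(0, 100), (50, 150), (300, 400)], 1000))  # (3, 25.0)
```

    Merging before summing matters because aligners can report overlapping matches, which would otherwise inflate the conserved percentage.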

  17. Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing.

    Science.gov (United States)

    Zhang, Tifu; Gu, Minfeng; Liu, Yuhe; Lv, Yuanda; Zhou, Ling; Lu, Haiyan; Liang, Shuaiqiang; Bao, Huabin; Zhao, Han

    2017-09-05

    Quinoa (Chenopodium quinoa Willd.) is a nutritionally balanced crop, but its breeding improvement has been limited by a lack of information on its genetics and genomics. It is therefore necessary to characterize genomic variation, population structure, and genetic diversity, and to develop novel Insertion/Deletion (InDel) markers for quinoa through whole-genome re-sequencing. We re-sequenced 11 quinoa accessions at a coverage depth of approximately 7× to 23× of the quinoa genome. Based on the 1453-megabase (Mb) assembly of the reference accession Riobamba, 8,441,022 filtered bi-allelic single nucleotide polymorphisms (SNPs) and 842,783 filtered InDels were identified, with estimated densities of 5.81 SNPs and 0.58 InDels per kilobase (kb). From these genomic InDel variations, 85 dimorphic InDel markers were newly developed and validated. Together with 62 previously reported simple sequence repeat (SSR) markers, a total of 147 markers were used to genotype 129 quinoa accessions. Combined STRUCTURE, phylogenetic tree, and principal component analysis (PCA) results classified the accessions into two major groups: Andean highland (composed of northern and southern highland subgroups) and Chilean coastal. Genetic diversity decreased from the Chilean coastal group to the Andean highland group, and gene flow between the two highland subgroups was more frequent than between either subgroup and the Chilean coastal group. Analysis of molecular variance (AMOVA) attributed the majority of the variation (approximately 70%) to differences between the groups. This was consistent with a highly significant FST value (0.705) between the groups, indicating strong genetic differentiation between the Andean highland and Chilean coastal types of quinoa. Moreover, a core set of 16 quinoa germplasms that capture all 362 alleles was

  18. In search of rare variants: Preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes

    Science.gov (United States)

    Vrieze, Scott I.; Malone, Stephen M.; Vaidyanathan, Uma; Kwong, Alan; Kang, Hyun Min; Zhan, Xiaowei; Flickinger, Matthew; Irons, Daniel; Jun, Goo; Locke, Adam E.; Pistis, Giorgio; Porcu, Eleonora; Levy, Shawn; Myers, Richard M.; Oetting, William; McGue, Matt; Abecasis, Goncalo; Iacono, William G.

    2014-01-01

    Whole genome sequencing was completed on 1,325 individuals from 602 families, identifying 27 million autosomal variants. Genetic association tests were conducted for those individuals who had been assessed for one or more of 17 endophenotypes (N range = 802–1,185). No significant associations were found. These 27 million variants were then imputed into the full sample of individuals with psychophysiological data (N range = 3,088–4,469) and again tested for associations with the 17 endophenotypes. No association was signif