WorldWideScience

Sample records for integrated microbial genomes

  1. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that allows to provide information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). IMG platform contains both draft and complete genomes, sequenced by Joint Genome Institute and other public and available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of IMG system and case study for pangenome analysis.

  2. IMG: the integrated microbial genomes database and comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  3. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620

  4. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  5. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    KAUST Repository

    Alam, Intikhab; Antunes, André ; Kamau, Allan; Ba Alawi, Wail; Kalkatawi, Manal M.; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

  6. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    KAUST Repository

    Alam, Intikhab

    2013-12-06

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

  7. G-InforBIO: integrated system for microbial genomics

    Directory of Open Access Journals (Sweden)

    Abe Takashi

    2006-08-01

    Full Text Available Abstract Background Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data, therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. Results The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data encoded as eXtensible Markup Language documents as formatted text documents, including annotations and sequences, from DNA Data Bank of Japan and GenBank encoded as flat files. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or tab-deliminated text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. Conclusion The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site.

  8. IMG 4 version of the integrated microbial genomes comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  9. IMG 4 version of the integrated microbial genomes comparative analysis system

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chen, I-Min A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Palaniappan, Krishna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chu, Ken [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Szeto, Ernest [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Pillay, Manoj [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Ratner, Anna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Huang, Jinghua [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Woyke, Tanja [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Huntemann, Marcel [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Anderson, Iain [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Billis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Varghese, Neha [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Mavromatis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Pati, Amrita [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Ivanova, Natalia N. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  10. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    Science.gov (United States)

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

    2013-01-01

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  11. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    Directory of Open Access Journals (Sweden)

    Intikhab Alam

    Full Text Available The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.We developed a data warehouse system (INDIGO that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments.We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  12. The Integrated Microbial Genomes (IMG) System: An Expanding Comparative Analysis Resource

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2009-09-13

    The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at .

  13. INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

    Science.gov (United States)

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    Background The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID

  14. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    Science.gov (United States)

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.

  15. MicroScope in 2017: an expanding and evolving integrated resource for community expertise of microbial genomes.

    Science.gov (United States)

    Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine

    2017-01-04

    The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. IMGMD: A platform for the integration and standardisation of In silico Microbial Genome-scale Metabolic Models.

    Science.gov (United States)

    Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming

    2017-04-07

    Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.

  17. Rumen microbial genomics

    International Nuclear Information System (INIS)

    Morrison, M.; Nelson, K.E.

    2005-01-01

    Improving microbial degradation of plant cell wall polysaccharides remains one of the highest priority goals for all livestock enterprises, including the cattle herds and draught animals of developing countries. The North American Consortium for Genomics of Fibrolytic Ruminal Bacteria was created to promote the sequencing and comparative analysis of rumen microbial genomes, offering the potential to fully assess the genetic potential in a functional and comparative fashion. It has been found that the Fibrobacter succinogenes genome encodes many more endoglucanases and cellodextrinases than previously isolated, and several new processive endoglucanases have been identified by genome and proteomic analysis of Ruminococcus albus, in addition to a variety of strategies for its adhesion to fibre. The ramifications of acquiring genome sequence data for rumen microorganisms are profound, including the potential to elucidate and overcome the biochemical, ecological or physiological processes that are rate limiting for ruminal fibre degradation. (author)

  18. Microbial Genomes Multiply

    Science.gov (United States)

    Doolittle, Russell F.

    2002-01-01

    The publication of the first complete sequence of a bacterial genome in 1995 was a signal event, underscored by the fact that the article has been cited more than 2,100 times during the intervening seven years. It was a marvelous technical achievement, made possible by automatic DNA-sequencing machines. The feat is the more impressive in that complete genome sequencing has now been adopted in many different laboratories around the world. Four years ago in these columns I examined the situation after a dozen microbial genomes had been completed. Now, with upwards of 60 microbial genome sequences determined and twice that many in progress, it seems reasonable to assess just what is being learned. Are new concepts emerging about how cells work? Have there been practical benefits in the fields of medicine and agriculture? Is it feasible to determine the genomic sequence of every bacterial species on Earth? The answers to these questions maybe Yes, Perhaps, and No, respectively.

  19. MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data

    Science.gov (United States)

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269

  20. Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles.

    Science.gov (United States)

    Stepanauskas, Ramunas; Fergusson, Elizabeth A; Brown, Joseph; Poulton, Nicole J; Tupper, Ben; Labonté, Jessica M; Becraft, Eric D; Brown, Julia M; Pachiadaki, Maria G; Povilaitis, Tadas; Thompson, Brian P; Mascena, Corianna J; Bellows, Wendy K; Lubys, Arvydas

    2017-07-20

    Microbial single-cell genomics can be used to provide insights into the metabolic potential, interactions, and evolution of uncultured microorganisms. Here we present WGA-X, a method based on multiple displacement amplification of DNA that utilizes a thermostable mutant of the phi29 polymerase. WGA-X enhances genome recovery from individual microbial cells and viral particles while maintaining ease of use and scalability. The greatest improvements are observed when amplifying high G+C content templates, such as those belonging to the predominant bacteria in agricultural soils. By integrating WGA-X with calibrated index-cell sorting and high-throughput genomic sequencing, we are able to analyze genomic sequences and cell sizes of hundreds of individual, uncultured bacteria, archaea, protists, and viral particles, obtained directly from marine and soil samples, in a single experiment. This approach may find diverse applications in microbiology and in biomedical and forensic studies of humans and other multicellular organisms.Single-cell genomics can be used to study uncultured microorganisms. Here, Stepanauskas et al. present a method combining improved multiple displacement amplification and FACS, to obtain genomic sequences and cell size information from uncultivated microbial cells and viral particles in environmental samples.

  1. Visualization for genomics: the Microbial Genome Viewer.

    NARCIS (Netherlands)

    Kerkhoven, R.; Enckevort, F.H.J. van; Boekhorst, J.; Molenaar, D; Siezen, R.J.

    2004-01-01

    SUMMARY: A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a

  2. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data.

    Science.gov (United States)

    Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David

    2017-09-12

    The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way providing practical guidance and help to the nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and original method recently developed (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to help maintaining accurate resources. © The Author 2017. Published by Oxford University Press.

  3. Visualization for genomics: the Microbial Genome Viewer.

    Science.gov (United States)

    Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J

    2004-07-22

    A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV

  4. Theory of microbial genome evolution

    Science.gov (United States)

    Koonin, Eugene

    Bacteria and archaea have small genomes tightly packed with protein-coding genes. This compactness is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. By fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. Thus, the number of genes in prokaryotic genomes seems to reflect the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias. New genes acquired by microbial genomes, on average, appear to be adaptive. Evolution of bacterial and archaeal genomes involves extensive horizontal gene transfer and gene loss. Many microbes have open pangenomes, where each newly sequenced genome contains more than 10% `ORFans', genes without detectable homologues in other species. A simple, steady-state evolutionary model reveals two sharply distinct classes of microbial genes, one of which (ORFans) is characterized by effectively instantaneous gene replacement, whereas the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of at least a billion distinct genes in the prokaryotic genomic universe.

  5. Microbial genomes: Blueprints for life

    Energy Technology Data Exchange (ETDEWEB)

    Relman, David A.; Strauss, Evelyn

    2000-12-31

    Complete microbial genome sequences hold the promise of profound new insights into microbial pathogenesis, evolution, diagnostics, and therapeutics. From these insights will come a new foundation for understanding the evolution of single-celled life, as well as the evolution of more complex life forms. This report is an in-depth analysis of scientific issues that provides recommendations and will be widely disseminated to the scientific community, federal agencies, industry and the public.

  6. [Advances in microbial genome reduction and modification].

    Science.gov (United States)

    Wang, Jianli; Wang, Xiaoyuan

    2013-08-01

    Microbial genome reduction and modification are important strategies for constructing cellular chassis used for synthetic biology. This article summarized the essential genes and the methods to identify them in microorganisms, compared various strategies for microbial genome reduction, and analyzed the characteristics of some microorganisms with the minimized genome. This review shows the important role of genome reduction in constructing cellular chassis.

  7. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  8. Development and Use of Integrated Microarray-Based Genomic Technologies for Assessing Microbial Community Composition and Dynamics

    Energy Technology Data Exchange (ETDEWEB)

    J. Zhou; S.-K. Rhee; C. Schadt; T. Gentry; Z. He; X. Li; X. Liu; J. Liebich; S.C. Chong; L. Wu

    2004-03-17

    To effectively monitor microbial populations involved in various important processes, a 50-mer-based oligonucleotide microarray was developed based on known genes and pathways involved in: biodegradation, metal resistance and reduction, denitrification, nitrification, nitrogen fixation, methane oxidation, methanogenesis, carbon polymer decomposition, and sulfate reduction. This array contains approximately 2000 unique and group-specific probes with <85% similarity to their non-target sequences. Based on artificial probes, our results showed that at hybridization conditions of 50 C and 50% formamide, the 50-mer microarray hybridization can differentiate sequences having <88% similarity. Specificity tests with representative pure cultures indicated that the designed probes on the arrays appeared to be specific to their corresponding target genes. Detection limits were about 5-10ng genomic DNA in the absence of background DNA, and 50-100ng ({approx}1.3{sup o} 10{sup 7} cells) in the presence background DNA. Strong linear relationships between signal intensity and target DNA and RNA concentration were observed (r{sup 2} = 0.95-0.99). Application of this microarray to naphthalene-amended enrichments and soil microcosms demonstrated that composition of the microflora varied depending on incubation conditions. While the naphthalene-degrading genes from Rhodococcus-type microorganisms were dominant in enrichments, the genes involved in naphthalene degradation from Gram-negative microorganisms such as Ralstonia, Comamonas, and Burkholderia were most abundant in the soil microcosms (as well as those for polyaromatic hydrocarbon and nitrotoluene degradation). Although naphthalene degradation is widely known and studied in Pseudomonas, Pseudomonas genes were not detected in either system. Real-time PCR analysis of 4 representative genes was consistent with microarray-based quantification (r{sup 2} = 0.95). Currently, we are also applying this microarray to the study of several

  9. Genome engineering for microbial natural product discovery.

    Science.gov (United States)

    Choi, Si-Sun; Katsuyama, Yohei; Bai, Linquan; Deng, Zixin; Ohnishi, Yasuo; Kim, Eung-Soo

    2018-03-03

    The discovery and development of microbial natural products (MNPs) have played pivotal roles in the fields of human medicine and its related biotechnology sectors over the past several decades. The post-genomic era has witnessed the development of microbial genome mining approaches to isolate previously unsuspected MNP biosynthetic gene clusters (BGCs) hidden in the genome, followed by various BGC awakening techniques to visualize compound production. Additional microbial genome engineering techniques have allowed higher MNP production titers, which could complement a traditional culture-based MNP chasing approach. Here, we describe recent developments in the MNP research paradigm, including microbial genome mining, NP BGC activation, and NP overproducing cell factory design. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  11. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  12. Safeguarding genome integrity

    DEFF Research Database (Denmark)

    Sørensen, Claus Storgaard; Syljuåsen, Randi G

    2012-01-01

    Mechanisms that preserve genome integrity are highly important during the normal life cycle of human cells. Loss of genome protective mechanisms can lead to the development of diseases such as cancer. Checkpoint kinases function in the cellular surveillance pathways that help cells to cope with D...

  13. Genome-reconstruction for eukaryotes from complex natural microbial communities.

    Science.gov (United States)

    West, Patrick T; Probst, Alexander J; Grigoriev, Igor V; Thomas, Brian C; Banfield, Jillian F

    2018-04-01

    Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k -mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of addition of organic carbon on a geyser-associated microbial community and detected a substantial change of the community metabolism, with selection against almost all candidate phyla bacteria and archaea and for eukaryotes. Near complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities. © 2018 West et al.; Published by Cold Spring Harbor Laboratory Press.

  14. Microbial genome analysis: the COG approach.

    Science.gov (United States)

    Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2017-09-14

    For the past 20 years, the Clusters of Orthologous Genes (COG) database had been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  15. Fourteenth-Sixteenth Microbial Genomics Conference-2006-2008

    Energy Technology Data Exchange (ETDEWEB)

    Miller, Jeffrey H

    2011-04-18

    The concept of an annual meeting on the E. coli genome was formulated at the Banbury Center Conference on the Genome of E. coli in October, 1991. The first meeting was held on September 10-14, 1992 at the University of Wisconsin, and this was followed by a yearly series of meetings, and by an expansion to include The fourteenth meeting took place September 24-28, 2006 at Lake Arrowhead, CA, the fifteenth September 16-20, 2007 at the University of Maryland, College Park, MD, and the sixteenth September 14-18, 2008 at Lake Arrowhead. The full program for the 16th meeting is attached. There have been rapid and exciting advances in microbial genomics that now make possible comparing large data sets of sequences from a wide variety of microbial genomes, and from whole microbial communities. Examining the “microbiomes”, the living microbial communities in different host organisms opens up many possibilities for understanding the landscape presented to pathogenic microorganisms. For quite some time there has been a shifting emphasis from pure sequence data to trying to understand how to use that information to solve biological problems. Towards this end new technologies are being developed and improved. Using genetics, functional genomics, and proteomics has been the recent focus of many different laboratories. A key element is the integration of different aspects of microbiology, sequencing technology, analysis techniques, and bioinformatics. The goal of these conference is to provide a regular forum for these interactions to occur. While there have been a number of genome conferences, what distinguishes the Microbial Genomics Conference is its emphasis on bringing together biology and genetics with sequencing and bioinformatics. Also, this conference is the longest continuing meeting, now established as a major regular annual meeting. In addition to its coverage of microbial genomes and biodiversity, the meetings also highlight microbial communities and the use of

  16. Microbial species delineation using whole genome sequences.

    Science.gov (United States)

    Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita

    2015-08-18

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Genome Surfing As Driver of Microbial Genomic Diversity.

    Science.gov (United States)

    Choudoir, Mallory J; Panke-Buisse, Kevin; Andam, Cheryl P; Buckley, Daniel H

    2017-08-01

    Historical changes in population size, such as those caused by demographic range expansions, can produce nonadaptive changes in genomic diversity through mechanisms such as gene surfing. We propose that demographic range expansion of a microbial population capable of horizontal gene exchange can result in genome surfing, a mechanism that can cause widespread increase in the pan-genome frequency of genes acquired by horizontal gene exchange. We explain that patterns of genetic diversity within Streptomyces are consistent with genome surfing, and we describe several predictions for testing this hypothesis both in Streptomyces and in other microorganisms. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Using Microbial Genome Annotation as a Foundation for Collaborative Student Research

    Science.gov (United States)

    Reed, Kelynne E.; Richardson, John M.

    2013-01-01

    We used the Integrated Microbial Genomes Annotation Collaboration Toolkit as a framework to incorporate microbial genomics research into a microbiology and biochemistry course in a way that promoted student learning of bioinformatics and research skills and emphasized teamwork and collaboration as evidenced through multiple assessment mechanisms.…

  19. Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.

    Science.gov (United States)

    Xu, Jianping

    2006-06-01

    genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses coupled to the application and further development of high throughput technologies are accelerating the pace of discovery in microbial ecology.

  20. Pre-genomic, genomic and post-genomic study of microbial communities involved in bioenergy.

    Science.gov (United States)

    Rittmann, Bruce E; Krajmalnik-Brown, Rosa; Halden, Rolf U

    2008-08-01

    Microorganisms can produce renewable energy in large quantities and without damaging the environment or disrupting food supply. The microbial communities must be robust and self-stabilizing, and their essential syntrophies must be managed. Pre-genomic, genomic and post-genomic tools can provide crucial information about the structure and function of these microbial communities. Applying these tools will help accelerate the rate at which microbial bioenergy processes move from intriguing science to real-world practice.

  1. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    Science.gov (United States)

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for ongoing sequencing genomes. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information include possible GIs, coding/non-coding sequences and functional analysis from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore http://bioinfo.hku.hk/genochore.html, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis associated to related organisms for comparison.

  3. Statistical Methods in Integrative Genomics

    Science.gov (United States)

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  4. VirSorter: mining viral signal from microbial genomic data

    Science.gov (United States)

    Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.

    2015-01-01

    Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the i

  5. VirSorter: mining viral signal from microbial genomic data

    Directory of Open Access Journals (Sweden)

    Simon Roux

    2015-05-01

    Full Text Available Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome, new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages. Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made

  6. Big data or bust: realizing the microbial genomics revolution.

    Science.gov (United States)

    Raza, Sobia; Luheshi, Leila

    2016-02-01

    Pathogen genomics has the potential to transform the clinical and public health management of infectious diseases through improved diagnosis, detection and tracking of antimicrobial resistance and outbreak control. However, the wide-ranging benefits of this technology can only fully be realized through the timely collation, integration and sharing of genomic and clinical/epidemiological metadata by all those involved in the delivery of genomic-informed services. As part of our review on bringing pathogen genomics into 'health-service' practice, we undertook extensive stakeholder consultation to examine the factors integral to achieving effective data sharing and integration. Infrastructure tailored to the needs of clinical users, as well as practical support and policies to facilitate the timely and responsible sharing of data with relevant health authorities and beyond, are all essential. We propose a tiered data sharing and integration model to maximize the immediate and longer term utility of microbial genomics in healthcare. Realizing this model at the scale and sophistication necessary to support national and international infection management services is not uncomplicated. Yet the establishment of a clear data strategy is paramount if failures in containing disease spread due to inadequate knowledge sharing are to be averted, and substantial progress made in tackling the dangers posed by infectious diseases.

  7. Community genomics among stratified microbial assemblages in the ocean's interior

    DEFF Research Database (Denmark)

    DeLong, Edward F; Preston, Christina M; Mincer, Tracy

    2006-01-01

    Microbial life predominates in the ocean, yet little is known about its genomic variability, especially along the depth continuum. We report here genomic analyses of planktonic microbial communities in the North Pacific Subtropical Gyre, from the ocean's surface to near-sea floor depths. Sequence......, and host-viral interactions. Comparative genomic analyses of stratified microbial communities have the potential to provide significant insight into higher-order community organization and dynamics....

  8. Genome-Based Microbial Taxonomy Coming of Age.

    Science.gov (United States)

    Hugenholtz, Philip; Skarshewski, Adam; Parks, Donovan H

    2016-06-01

    Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships mean that the existing highly incomplete tree is riddled with taxonomic errors. Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.

  9. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    Science.gov (United States)

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  10. MicroScope: a platform for microbial genome annotation and comparative genomics.

    Science.gov (United States)

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of

  11. Genome-scale biological models for industrial microbial systems.

    Science.gov (United States)

    Xu, Nan; Ye, Chao; Liu, Liming

    2018-04-01

    The primary aims and challenges associated with microbial fermentation include achieving faster cell growth, higher productivity, and more robust production processes. Genome-scale biological models, predicting the formation of an interaction among genetic materials, enzymes, and metabolites, constitute a systematic and comprehensive platform to analyze and optimize the microbial growth and production of biological products. Genome-scale biological models can help optimize microbial growth-associated traits by simulating biomass formation, predicting growth rates, and identifying the requirements for cell growth. With regard to microbial product biosynthesis, genome-scale biological models can be used to design product biosynthetic pathways, accelerate production efficiency, and reduce metabolic side effects, leading to improved production performance. The present review discusses the development of microbial genome-scale biological models since their emergence and emphasizes their pertinent application in improving industrial microbial fermentation of biological products.

  12. Integrated Approach to Reconstruction of Microbial Regulatory Networks

    Energy Technology Data Exchange (ETDEWEB)

    Rodionov, Dmitry A [Sanford-Burnham Medical Research Institute; Novichkov, Pavel S [Lawrence Berkeley National Laboratory

    2013-11-04

    This project had the goal(s) of development of integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done in Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) RegPredict web-platform for TRN inference and regulon reconstruction in microbial genomes, and (2) RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were: Develop integrated platform for genome-scale regulon reconstruction; Infer regulatory annotations in several groups of bacteria and building of reference collections of microbial regulons; and Develop KnowledgeBase on microbial transcriptional regulation.

  13. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.

    Science.gov (United States)

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic study. Reflecting the huge diversity of microbial world, the number of microbial genome projects now becomes several thousands. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides the conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, efficient incremental updating procedure can create extended ortholog table by adding additional genomes to the default ortholog table generated from the representative set of genomes. Combining with the functionalities of the dynamic orthology calculation of any specified set of organisms, MBGD is an efficient and flexible tool for exploring the microbial genome diversity.

  14. The Modern Synthesis in the Light of Microbial Genomics.

    Science.gov (United States)

    Booth, Austin; Mariscal, Carlos; Doolittle, W Ford

    2016-09-08

    We review the theoretical implications of findings in genomics for evolutionary biology since the Modern Synthesis. We examine the ways in which microbial genomics has influenced our understanding of the last universal common ancestor, the tree of life, species, lineages, and evolutionary transitions. We conclude by advocating a piecemeal toolkit approach to evolutionary biology, in lieu of any grand unified theory updated to include microbial genomics.

  15. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  17. Genomic Prospecting for Microbial Biodiesel Production

    Energy Technology Data Exchange (ETDEWEB)

    Lykidis, Athanasios; Lykidis, Athanasios; Ivanova, Natalia

    2008-03-20

    Biodiesel is defined as fatty acid mono-alkylesters and is produced from triacylglycerols. In the current article we provide an overview of the structure, diversity and regulation of the metabolic pathways leading to intracellular fatty acid and triacylglycerol accumulation in three types of organisms (bacteria, algae and fungi) of potential biotechnological interest and discuss possible intervention points to increase the cellular lipid content. The key steps that regulate carbon allocation and distribution in lipids include the formation of malonyl-CoA, the synthesis of fatty acids and their attachment onto the glycerol backbone, and the formation of triacylglycerols. The lipid biosynthetic genes and pathways are largely known for select model organisms. Comparative genomics allows the examination of these pathways in organisms of biotechnological interest and reveals the evolution of divergent and yet uncharacterized regulatory mechanisms. Utilization of microbial systems for triacylglycerol and fatty acid production is in its infancy; however, genomic information and technologies combined with synthetic biology concepts provide the opportunity to further exploit microbes for the competitive production of biodiesel.

  18. Ten years of maintaining and expanding a microbial genome and metagenome analysis system.

    Science.gov (United States)

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2015-11-01

    Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Microbial genome-wide association studies: lessons from human GWAS.

    Science.gov (United States)

    Power, Robert A; Parkhill, Julian; de Oliveira, Tulio

    2017-01-01

    The reduced costs of sequencing have led to whole-genome sequences for a large number of microorganisms, enabling the application of microbial genome-wide association studies (GWAS). Given the successes of human GWAS in understanding disease aetiology and identifying potential drug targets, microbial GWAS are likely to further advance our understanding of infectious diseases. These advances include insights into pressing global health problems, such as antibiotic resistance and disease transmission. In this Review, we outline the methodologies of GWAS, the current state of the field of microbial GWAS, and how lessons from human GWAS can direct the future of the field.

  20. Millstone: software for multiplex microbial genome analysis and engineering.

    Science.gov (United States)

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  1. Impact of genomics on microbial food safety

    NARCIS (Netherlands)

    Abee, T.; Schaik, van W.; Siezen, R.J.

    2004-01-01

    Genome sequences are now available for many of the microbes that cause food-borne diseases. The information contained in pathogen genome sequences, together with the development of themed and whole-genome DNA microarrays and improved proteomics techniques, might provide tools for the rapid detection

  2. Integrating genomics into evolutionary medicine.

    Science.gov (United States)

    Rodríguez, Juan Antonio; Marigorta, Urko M; Navarro, Arcadi

    2014-12-01

    The application of the principles of evolutionary biology into medicine was suggested long ago and is already providing insight into the ultimate causes of disease. However, a full systematic integration of medical genomics and evolutionary medicine is still missing. Here, we briefly review some cases where the combination of the two fields has proven profitable and highlight two of the main issues hindering the development of evolutionary genomic medicine as a mature field, namely the dissociation between fitness and health and the still considerable difficulties in predicting phenotypes from genotypes. We use publicly available data to illustrate both problems and conclude that new approaches are needed for evolutionary genomic medicine to overcome these obstacles. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Microbial minimalism: genome reduction in bacterial pathogens.

    Science.gov (United States)

    Moran, Nancy A

    2002-03-08

    When bacterial lineages make the transition from free-living or facultatively parasitic life cycles to permanent associations with hosts, they undergo a major loss of genes and DNA. Complete genome sequences are providing an understanding of how extreme genome reduction affects evolutionary directions and metabolic capabilities of obligate pathogens and symbionts.

  4. Practical Approaches for Detecting Selection in Microbial Genomes

    OpenAIRE

    Hedge, Jessica; Wilson, Daniel J.

    2016-01-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers th...

  5. Puzzling sequences: studying microbial genomes from 'Ötzi'

    International Nuclear Information System (INIS)

    Rattei, T.

    2012-01-01

    Ancient remains, and mummies in particular, are of central value for archaeological research. The Tyrolean iceman “Ötzi” was conserved in a glacier of the Ötztal Alps about 5000 years ago. Aside from morphological and phenotypical classification, the determination of DNA sequences and the subsequent genome analyses have been first applied to mitochondrial DNA and then been extended to genomic DNA. Typically also ancient microbial DNA is sequenced. These sequences allow the identification of pathogens as well as studying the evolution of microorganisms. The talk will explain the metagenomic aspects of the “Ötzi” genome project and discuss the first results. (author)

  6. Microbial ecology and genomics: A crossroads of opportunity

    Energy Technology Data Exchange (ETDEWEB)

    Stahl, David A. [University of Washington; Tiedje, James M. [Michigan State University

    2002-08-30

    Microbes have dominated life on Earth for most of its 4.5 billionyear history. They are the foundation of the biosphere, controlling the biogeochemical cycles and affecting geology, hydrology, and local and global climates. All life is completely dependent upon them. Humans cannot survive without the rich diversity of microbes, but most microbial species can survive without humans. Extraordinary advances in molecular technology have fostered an explosion of information in microbial biology. It is now known that microbial species in culture poorly represent their natural diversity—which dwarfs conventions established for the visible world. This was revealed over the last decade using newer molecular tools to explore environmental diversity and has sparked an explosive growth in microbial ecology and technologies that may profit from the bounty of natural biochemical diversity. Several colloquia and meetings have helped formulate policy recommendations to enable sustained research programs in these areas. One such colloquium organized by the American Academy of Microbiology (“The Microbial World: Foundation of the Biosphere,” 1997) made two key recommendations: (1) develop a more complete inventory of living organisms and the interagency cooperation needed to accomplish this goal, and (2) develop strategies to harvest this remarkable biological diversity for the benefit of science, technology, and society. Complete genome sequence information was identified as an essential part of strategy development, and the recommendation was made to sequence the genome of at least one species of each of the major divisions of microbial life.

  7. Comparative genomic analysis by microbial COGs self-attraction rate.

    Science.gov (United States)

    Santoni, Daniele; Romano-Spica, Vincenzo

    2009-06-21

    Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. On the contrary genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. Self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.

  8. Improved bacteriophage genome data is necessary for integrating viral and bacterial ecology.

    Science.gov (United States)

    Bibby, Kyle

    2014-02-01

    The recent rise in "omics"-enabled approaches has lead to improved understanding in many areas of microbial ecology. However, despite the importance that viruses play in a broad microbial ecology context, viral ecology remains largely not integrated into high-throughput microbial ecology studies. A fundamental hindrance to the integration of viral ecology into omics-enabled microbial ecology studies is the lack of suitable reference bacteriophage genomes in reference databases-currently, only 0.001% of bacteriophage diversity is represented in genome sequence databases. This commentary serves to highlight this issue and to promote bacteriophage genome sequencing as a valuable scientific undertaking to both better understand bacteriophage diversity and move towards a more holistic view of microbial ecology.

  9. Genomics Portals: integrative web-platform for mining genomics data.

    Science.gov (United States)

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  10. Genomics Portals: integrative web-platform for mining genomics data

    Directory of Open Access Journals (Sweden)

    Ghosh Krishnendu

    2010-01-01

    Full Text Available Abstract Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc, and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  11. Practical Approaches for Detecting Selection in Microbial Genomes.

    Directory of Open Access Journals (Sweden)

    Jessica Hedge

    2016-02-01

    Full Text Available Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  12. Practical Approaches for Detecting Selection in Microbial Genomes.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2016-02-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  13. phiGENOME: an integrative navigation throughout bacteriophage genomes.

    Science.gov (United States)

    Stano, Matej; Klucar, Lubos

    2011-11-01

    phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Colloquium and Report on Systems Microbiology: Beyond Microbial Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Merry R. Buckley

    2004-12-13

    The American Academy of Microbiology convened a colloquium June 4-6, 2004 to confer about the scientific promise of systems microbiology. Participants discussed the power of applying a systems approach to the study of biology and to microbiology in particular, specifics about current research efforts, technical bottlenecks, requirements for data acquisition and maintenance, educational needs, and communication issues surrounding the field. A number of recommendations were made for removing barriers to progress in systems microbiology and for improving opportunities in education and collaboration. Systems biology, as a concept, is not new, but the recent explosion of genomic sequences and related data has revived interest in the field. Systems microbiology, a subset of systems biology, represents a different approach to investigating biological systems. It attempts to examine the emergent properties of microorganisms that arise from the interplay of genes, proteins, other macromolecules, small molecules, organelles, and the environment. It is these interactions, often nonlinear, that lead to the emergent properties of biological systems that are generally not tractable by traditional approaches. As a complement to the long-standing trend toward reductionism, systems microbiology seeks to treat the organism or community as a whole, integrating fundamental biological knowledge with genomics, metabolomics, and other data to create an integrated picture of how a microbial cell or community operates. Systems microbiology promises not only to shed light on the activities of microbes, but will also provide biology the tools and approaches necessary for achieving a better understanding of life and ecosystems. Microorganisms are ideal candidates for systems biology research because they are relatively easy to manipulate and because they play critical roles in health, environment, agriculture, and energy production. Potential applications of systems microbiology research

  15. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Science.gov (United States)

    Budinich, Marko; Bourdon, Jérémie; Larhlimi, Abdelhalim; Eveillard, Damien

    2017-01-01

    Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity) that take place at the ecosystem scale.

  16. A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.

    Directory of Open Access Journals (Sweden)

    Marko Budinich

    Full Text Available Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA and multi-objective flux variability analysis (MO-FVA. Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity that take place at the ecosystem scale.

  17. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  18. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  19. Recent Advances in Microbial Single Cell Genomics Technology and Applications

    Science.gov (United States)

    Stepanauskas, R.

    2016-02-01

    Single cell genomics is increasingly utilized as a powerful tool to decipher the metabolic potential, evolutionary histories and in situ interactions of environmental microorganisms. This transformative technology recovers extensive information from cultivation-unbiased samples of individual, unicellular organisms. Thus, it does not require data binning into arbitrary phylogenetic or functional groups and therefore is highly compatible with agent-based modeling approaches. I will present several technological advances in this field, which significantly improve genomic data recovery from individual cells and provide direct linkages between cell's genomic and phenotypic properties. I will also demonstrate how these new technical capabilities help understanding the metabolic potential and viral infections of the "microbial dark matter" inhabiting aquatic and subsurface environments.

  20. Use of genome-scale microbial models for metabolic engineering

    DEFF Research Database (Denmark)

    Patil, Kiran Raosaheb; Åkesson, M.; Nielsen, Jens

    2004-01-01

    Metabolic engineering serves as an integrated approach to design new cell factories by providing rational design procedures and valuable mathematical and experimental tools. Mathematical models have an important role for phenotypic analysis, but can also be used for the design of optimal metaboli...... network structures. The major challenge for metabolic engineering in the post-genomic era is to broaden its design methodologies to incorporate genome-scale biological data. Genome-scale stoichiometric models of microorganisms represent a first step in this direction....

  1. Microbial genome-enabled insights into plant-microorganism interactions.

    Science.gov (United States)

    Guttman, David S; McHardy, Alice C; Schulze-Lefert, Paul

    2014-12-01

    Advances in genome-based studies on plant-associated microorganisms have transformed our understanding of many plant pathogens and are beginning to greatly widen our knowledge of plant interactions with mutualistic and commensal microorganisms. Pathogenomics has revealed how pathogenic microorganisms adapt to particular hosts, subvert innate immune responses and change host range, as well as how new pathogen species emerge. Similarly, culture-independent community profiling methods, coupled with metagenomic and metatranscriptomic studies, have provided the first insights into the emerging field of research on plant-associated microbial communities. Together, these approaches have the potential to bridge the gap between plant microbial ecology and plant pathology, which have traditionally been two distinct research fields.

  2. Use of Modern Chemical Protein Synthesis and Advanced Fluorescent Assay Techniques to Experimentally Validate the Functional Annotation of Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kent, Stephen [University of Chicago

    2012-07-20

    The objective of this research program was to prototype methods for the chemical synthesis of predicted protein molecules in annotated microbial genomes. High throughput chemical methods were to be used to make large numbers of predicted proteins and protein domains, based on microbial genome sequences. Microscale chemical synthesis methods for the parallel preparation of peptide-thioester building blocks were developed; these peptide segments are used for the parallel chemical synthesis of proteins and protein domains. Ultimately, it is envisaged that these synthetic molecules would be ‘printed’ in spatially addressable arrays. The unique ability of total synthesis to precision label protein molecules with dyes and with chemical or biochemical ‘tags’ can be used to facilitate novel assay technologies adapted from state-of-the art single molecule fluorescence detection techniques. In the future, in conjunction with modern laboratory automation this integrated set of techniques will enable high throughput experimental validation of the functional annotation of microbial genomes.

  3. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics.

  4. The Genome-Scale Integrated Networks in Microorganisms

    Directory of Open Access Journals (Sweden)

    Tong Hao

    2018-02-01

    Full Text Available The genome-scale cellular network has become a necessary tool in the systematic analysis of microbes. In a cell, there are several layers (i.e., types of the molecular networks, for example, genome-scale metabolic network (GMN, transcriptional regulatory network (TRN, and signal transduction network (STN. It has been realized that the limitation and inaccuracy of the prediction exist just using only a single-layer network. Therefore, the integrated network constructed based on the networks of the three types attracts more interests. The function of a biological process in living cells is usually performed by the interaction of biological components. Therefore, it is necessary to integrate and analyze all the related components at the systems level for the comprehensively and correctly realizing the physiological function in living organisms. In this review, we discussed three representative genome-scale cellular networks: GMN, TRN, and STN, representing different levels (i.e., metabolism, gene regulation, and cellular signaling of a cell’s activities. Furthermore, we discussed the integration of the networks of the three types. With more understanding on the complexity of microbial cells, the development of integrated network has become an inevitable trend in analyzing genome-scale cellular networks of microorganisms.

  5. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    Full Text Available The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL. This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  6. Streamlining genomes: toward the generation of simplified and stabilized microbial systems

    NARCIS (Netherlands)

    Leprince, A.; Passel, van M.W.J.; Martins Dos Santos, V.A.P.

    2012-01-01

    At the junction between systems and synthetic biology, genome streamlining provides a solid foundation both for increased understanding of cellular circuitry, and for the tailoring of microbial chassis towards innovative biotechnological applications. Iterative genomic deletions (targeted and

  7. Assembling networks of microbial genomes using linear programming.

    Science.gov (United States)

    Holloway, Catherine; Beiko, Robert G

    2010-11-20

    Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem. We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas. The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more-intensive phylogenomic analysis.

  8. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR

    DEFF Research Database (Denmark)

    Jakočiūnas, Tadas; Jensen, Emil D.; Jensen, Michael Krogh

    2018-01-01

    and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking......Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome...... engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. Cas...

  9. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  10. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR.

    Science.gov (United States)

    Jakočiūnas, Tadas; Jensen, Emil D; Jensen, Michael K; Keasling, Jay D

    2018-01-01

    Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. CasEMBLR capitalizes on the CRISPR/Cas9 technology to generate double-strand breaks in genomic loci, thus prompting native homologous recombination (HR) machinery to integrate exogenously derived homology templates. As proof-of-principle for microbial cell factory development, CasEMBLR was used for one-step assembly and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking out two genes. This new method complements and improves the field of genome engineering in S. cerevisiae by providing a more flexible platform for rapid and precise strain building.

  11. Genome-wide association study of Arabidopsis thaliana leaf microbial community.

    Science.gov (United States)

    Horton, Matthew W; Bodenhausen, Natacha; Beilsmith, Kathleen; Meng, Dazhe; Muegge, Brian D; Subramanian, Sathish; Vetter, M Madlen; Vilhjálmsson, Bjarni J; Nordborg, Magnus; Gordon, Jeffrey I; Bergelson, Joy

    2014-11-10

    Identifying the factors that influence the outcome of host-microbial interactions is critical to protecting biodiversity, minimizing agricultural losses and improving human health. A few genes that determine symbiosis or resistance to infectious disease have been identified in model species, but a comprehensive examination of how a host genotype influences the structure of its microbial community is lacking. Here we report the results of a field experiment with the model plant Arabidopsis thaliana to identify the fungi and bacteria that colonize its leaves and the host loci that influence the microbe numbers. The composition of this community differs among accessions of A. thaliana. Genome-wide association studies (GWAS) suggest that plant loci responsible for defense and cell wall integrity affect variation in this community. Furthermore, species richness in the bacterial community is shaped by host genetic variation, notably at loci that also influence the reproduction of viruses, trichome branching and morphogenesis.

  12. Microbial taxonomy in the post-genomic era: Rebuilding from scratch?

    Energy Technology Data Exchange (ETDEWEB)

    Thompson, Cristiane C. [Univ. of Rio de Janeiro (UFRJ) (Brazil); Amaral, Gilda R. [Univ. of Rio de Janeiro (UFRJ) (Brazil); Campeão, Mariana [Univ. of Rio de Janeiro (UFRJ) (Brazil); Edwards, Robert A. [Univ. of Rio de Janeiro (UFRJ) (Brazil); San Diego State Univ., CA (United States); Argonne National Lab. (ANL), Argonne, IL (United States); Polz, Martin F. [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States); Dutilh, Bas E. [Univ. of Rio de Janeiro (UFRJ) (Brazil); Radbould Univ., Nijmegen (Netherlands); Ussery, David W. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Sawabe, Tomoo [Hokkaido Univ., Hakodate (Japan); Swings, Jean [Univ. of Rio de Janeiro (UFRJ) (Brazil); Ghent Univ. (Belgium); Thompson, Fabiano L. [Univ. of Rio de Janeiro (UFRJ) (Brazil); Advanced Systems Laboratory Production Management COPPE / UFRJ, Rio de Janeiro (Brazil)

    2014-12-23

    Microbial taxonomy should provide adequate descriptions of bacterial, archaeal, and eukaryotic microbial diversity in ecological, clinical, and industrial environments. We re-evaluated the prokaryote species twice. It is time to revisit polyphasic taxonomy, its principles, and its practice, including its underlying pragmatic species concept. We will be able to realize an old dream of our predecessor taxonomists and build a genomic-based microbial taxonomy, using standardized and automated curation of high-quality complete genome sequences as the new gold standard.

  13. Transcription as a Threat to Genome Integrity.

    Science.gov (United States)

    Gaillard, Hélène; Aguilera, Andrés

    2016-06-02

    Genomes undergo different types of sporadic alterations, including DNA damage, point mutations, and genome rearrangements, that constitute the basis for evolution. However, these changes may occur at high levels as a result of cell pathology and trigger genome instability, a hallmark of cancer and a number of genetic diseases. In the last two decades, evidence has accumulated that transcription constitutes an important natural source of DNA metabolic errors that can compromise the integrity of the genome. Transcription can create the conditions for high levels of mutations and recombination by its ability to open the DNA structure and remodel chromatin, making it more accessible to DNA insulting agents, and by its ability to become a barrier to DNA replication. Here we review the molecular basis of such events from a mechanistic perspective with particular emphasis on the role of transcription as a genome instability determinant.

  14. Genome-driven evolutionary game theory helps understand the rise of metabolic interdependencies in microbial communities.

    Science.gov (United States)

    Zomorrodi, Ali R; Segrè, Daniel

    2017-11-16

    Metabolite exchanges in microbial communities give rise to ecological interactions that govern ecosystem diversity and stability. It is unclear, however, how the rise of these interactions varies across metabolites and organisms. Here we address this question by integrating genome-scale models of metabolism with evolutionary game theory. Specifically, we use microbial fitness values estimated by metabolic models to infer evolutionarily stable interactions in multi-species microbial "games". We first validate our approach using a well-characterized yeast cheater-cooperator system. We next perform over 80,000 in silico experiments to infer how metabolic interdependencies mediated by amino acid leakage in Escherichia coli vary across 189 amino acid pairs. While most pairs display shared patterns of inter-species interactions, multiple deviations are caused by pleiotropy and epistasis in metabolism. Furthermore, simulated invasion experiments reveal possible paths to obligate cross-feeding. Our study provides genomically driven insight into the rise of ecological interactions, with implications for microbiome research and synthetic ecology.

  15. Mapping genomic features to functional traits through microbial whole genome sequences.

    Science.gov (United States)

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. Increasing availability of whole genome sequences provide the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacteria genomes belonging to different functional traits were grouped to Cluster of Orthologs (COGs), and were used as features. Then, TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.

  16. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.

    Science.gov (United States)

    Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S

    2016-01-15

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  17. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    Directory of Open Access Journals (Sweden)

    Michael Strong

    2009-12-01

    Full Text Available As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php.

  18. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  19. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    Science.gov (United States)

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Sequencing Single Cell Microbial Genomes with Microfluidic Amplifications Tools (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Quake, Steve

    2011-10-12

    Stanford University's Steve Quake on "Sequencing Single Cell Microbial Genomes with Microfluidic Amplification Tools" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  1. Integrating genomics into undergraduate nursing education.

    Science.gov (United States)

    Daack-Hirsch, Sandra; Dieter, Carla; Quinn Griffin, Mary T

    2011-09-01

    To prepare the next generation of nurses, faculty are now faced with the challenge of incorporating genomics into curricula. Here we discuss how to meet this challenge. Steps to initiate curricular changes to include genomics are presented along with a discussion on creating a genomic curriculum thread versus a standalone course. Ideas for use of print material and technology on genomic topics are also presented. Information is based on review of the literature and curriculum change efforts by the authors. In recognition of advances in genomics, the nursing profession is increasing an emphasis on the integration of genomics into professional practice and educational standards. Incorporating genomics into nurses' practices begins with changes in our undergraduate curricula. Information given in didactic courses should be reinforced in clinical practica, and Internet-based tools such as WebQuest, Second Life, and wikis offer attractive, up-to-date platforms to deliver this now crucial content. To provide information that may assist faculty to prepare the next generation of nurses to practice using genomics. © 2011 Sigma Theta Tau International.

  2. MICROBIAL BIOFILMS AS INTEGRATIVE SENSORS OF ENVIRONMENTAL QUALITY

    Science.gov (United States)

    Snyder, Richard A., Michael A. Lewis, Andreas Nocker and Joe E. Lepo. In press. Microbial Biofilms as Integrative Sensors of Environmental Quality. In: Estuarine Indicators Workshop Proceedings. CRC Press, Boca Raton, FL. 34 p. (ERL,GB 1198). Microbial biofilms are comple...

  3. Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes

    DEFF Research Database (Denmark)

    Bohlin, Jon; Brynildsrud, Ola Brønstad; Vesth, Tammi Camilla

    2013-01-01

    frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias...... purifying selection than genomes with higher AAUB. Conclusion: Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We...

  4. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    Science.gov (United States)

    Direito, Susana O L; Zaura, Egija; Little, Miranda; Ehrenfreund, Pascale; Röling, Wilfred F M

    2014-03-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplification (MDA)] and one new primer-free method [primase-based whole genome amplification (pWGA)] were compared using a polymerase chain reaction (PCR)-based method as control. Pyrosequencing of an environmental sample and principal component analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates and, to a lesser extent, for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars. © 2013 Society for Applied Microbiology and John Wiley & Sons Ltd.

  5. GAPIT: genome association and prediction integrated tool.

    Science.gov (United States)

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  6. GENOME-BASED MODELING AND DESIGN OF METABOLIC INTERACTIONS IN MICROBIAL COMMUNITIES

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Mahadevan

    2012-10-01

    With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  7. Integrated Genome-Based Studies of Shewanella Echophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Margrethe H. Serres

    2012-06-29

    Shewanella oneidensis MR-1 is a motile, facultative {gamma}-Proteobacterium with remarkable respiratory versatility; it can utilize a range of organic and inorganic compounds as terminal electronacceptors for anaerobic metabolism. The ability to effectively reduce nitrate, S0, polyvalent metals andradionuclides has established MR-1 as an important model dissimilatory metal-reducing microorganism for genome-based investigations of biogeochemical transformation of metals and radionuclides that are of concern to the U.S. Department of Energy (DOE) sites nationwide. Metal-reducing bacteria such as Shewanella also have a highly developed capacity for extracellular transfer of respiratory electrons to solid phase Fe and Mn oxides as well as directly to anode surfaces in microbial fuel cells. More broadly, Shewanellae are recognized free-living microorganisms and members of microbial communities involved in the decomposition of organic matter and the cycling of elements in aquatic and sedimentary systems. To function and compete in environments that are subject to spatial and temporal environmental change, Shewanella must be able to sense and respond to such changes and therefore require relatively robust sensing and regulation systems. The overall goal of this project is to apply the tools of genomics, leveraging the availability of genome sequence for 18 additional strains of Shewanella, to better understand the ecophysiology and speciation of respiratory-versatile members of this important genus. To understand these systems we propose to use genome-based approaches to investigate Shewanella as a system of integrated networks; first describing key cellular subsystems - those involved in signal transduction, regulation, and metabolism - then building towards understanding the function of whole cells and, eventually, cells within populations. As a general approach, this project will employ complimentary "top-down" - bioinformatics-based genome functional predictions, high

  8. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    Science.gov (United States)

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  9. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments.

    Science.gov (United States)

    Hall, Barry G

    2014-01-01

    SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP , a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. It then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.

  10. Host Genome Influence on Gut Microbial Composition and Microbial Prediction of Complex Traits in Pigs.

    Science.gov (United States)

    Camarinha-Silva, Amelia; Maushammer, Maria; Wellmann, Robin; Vital, Marius; Preuss, Siegfried; Bennewitz, Jörn

    2017-07-01

    The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60 K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing technology. Eight from 49 investigated bacteria genera showed a significant narrow sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source of complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases, and, consequently, for preventative and personalized medicine. Copyright © 2017 by the Genetics Society of America.

  11. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    NARCIS (Netherlands)

    Direito, S.; Zaura, E.; Little, M.; Ehrenfreund, P.; Roling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement

  12. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification

    NARCIS (Netherlands)

    Direito, S.O.L.; Zaura, E.; Little, M.; Ehrenfreund, P.; Röling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement

  13. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    OpenAIRE

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete d...

  14. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

    Science.gov (United States)

    Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M

    2014-01-01

    Whole-genome sequences are now available for many microbial species and clades, however existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

  15. Microbial genome mining for accelerated natural products discovery: is a renaissance in the making?

    Science.gov (United States)

    Bachmann, Brian O; Van Lanen, Steven G; Baltz, Richard H

    2014-02-01

    Microbial genome mining is a rapidly developing approach to discover new and novel secondary metabolites for drug discovery. Many advances have been made in the past decade to facilitate genome mining, and these are reviewed in this Special Issue of the Journal of Industrial Microbiology and Biotechnology. In this Introductory Review, we discuss the concept of genome mining and why it is important for the revitalization of natural product discovery; what microbes show the most promise for focused genome mining; how microbial genomes can be mined; how genome mining can be leveraged with other technologies; how progress on genome mining can be accelerated; and who should fund future progress in this promising field. We direct interested readers to more focused reviews on the individual topics in this Special Issue for more detailed summaries on the current state-of-the-art.

  16. Integrated hydrogen production process from cellulose by combining dark fermentation, microbial fuel cells, and a microbial electrolysis cell

    KAUST Repository

    Wang, Aijie; Sun, Dan; Cao, Guangli; Wang, Haoyu; Ren, Nanqi; Wu, Wei-Min; Logan, Bruce E.

    2011-01-01

    Hydrogen gas production from cellulose was investigated using an integrated hydrogen production process consisting of a dark fermentation reactor and microbial fuel cells (MFCs) as power sources for a microbial electrolysis cell (MEC). Two MFCs

  17. Genomic integrity and the ageing brain.

    Science.gov (United States)

    Chow, Hei-man; Herrup, Karl

    2015-11-01

    DNA damage is correlated with and may drive the ageing process. Neurons in the brain are postmitotic and are excluded from many forms of DNA repair; therefore, neurons are vulnerable to various neurodegenerative diseases. The challenges facing the field are to understand how and when neuronal DNA damage accumulates, how this loss of genomic integrity might serve as a 'time keeper' of nerve cell ageing and why this process manifests itself as different diseases in different individuals.

  18. Reducing assembly complexity of microbial genomes with single-molecule sequencing

    Science.gov (United States)

    Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...

  19. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    NARCIS (Netherlands)

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is

  20. Meta genome-wide network from functional linkages of genes in human gut microbial ecosystems.

    Science.gov (United States)

    Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue

    2013-03-01

    The human gut microbial ecosystem (HGME) exerts an important influence on the human health. In recent researches, meta-genomics provided deep insights into the HGME in terms of gene contents, metabolic processes and genome constitutions of meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes regardless of their genome origins when considering the co-evolution properties of genes. By analyzing these coupled genes, we showed some basic properties of HGME significantly associated with each other, and further constructed a protein interaction map of human gut meta-genome to discover some functional modules that may relate with essential metabolic processes. Compared with other studies, our method provides a new idea to extract basic function elements from meta-genome systems and investigate complex microbial environment by associating its biological traits with co-evolutionary fingerprints encoded in it.

  1. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    Science.gov (United States)

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.

  2. Variant Review with the Integrative Genomics Viewer.

    Science.gov (United States)

    Robinson, James T; Thorvaldsdóttir, Helga; Wenger, Aaron M; Zehir, Ahmet; Mesirov, Jill P

    2017-11-01

    Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV's variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org Cancer Res; 77(21); e31-34. ©2017 AACR . ©2017 American Association for Cancer Research.

  3. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

    OpenAIRE

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S.; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M.; Tettelin, Herv?; White, Owen; Angiuoli, Samuel V.; Mahurkar, Anup; Fricke, W. Florian

    2017-01-01

    Background The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. ...

  4. High-throughput automated microfluidic sample preparation for accurate microbial genomics.

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C

    2017-01-27

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.

  5. Population Genomics of Infectious and Integrated Wolbachia pipientis Genomes in Drosophila ananassae

    Science.gov (United States)

    Choi, Jae Young; Bubnell, Jaclyn E.; Aquadro, Charles F.

    2015-01-01

    Coevolution between Drosophila and its endosymbiont Wolbachia pipientis has many intriguing aspects. For example, Drosophila ananassae hosts two forms of W. pipientis genomes: One being the infectious bacterial genome and the other integrated into the host nuclear genome. Here, we characterize the infectious and integrated genomes of W. pipientis infecting D. ananassae (wAna), by genome sequencing 15 strains of D. ananassae that have either the infectious or integrated wAna genomes. Results indicate evolutionarily stable maternal transmission for the infectious wAna genome suggesting a relatively long-term coevolution with its host. In contrast, the integrated wAna genome showed pseudogene-like characteristics accumulating many variants that are predicted to have deleterious effects if present in an infectious bacterial genome. Phylogenomic analysis of sequence variation together with genotyping by polymerase chain reaction of large structural variations indicated several wAna variants among the eight infectious wAna genomes. In contrast, only a single wAna variant was found among the seven integrated wAna genomes examined in lines from Africa, south Asia, and south Pacific islands suggesting that the integration occurred once from a single infectious wAna genome and then spread geographically. Further analysis revealed that for all D. ananassae we examined with the integrated wAna genomes, the majority of the integrated wAna genomic regions is represented in at least two copies suggesting a double integration or single integration followed by an integrated genome duplication. The possible evolutionary mechanism underlying the widespread geographical presence of the duplicate integration of the wAna genome is an intriguing question remaining to be answered. PMID:26254486

  6. Integrative Genomics Viewer (IGV) | Informatics Technology for Cancer Research (ITCR)

    Science.gov (United States)

    The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

  7. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

    Science.gov (United States)

    Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.

    2015-01-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477

  8. Integrating microbial diversity in soil carbon dynamic models parameters

    Science.gov (United States)

    Louis, Benjamin; Menasseri-Aubry, Safya; Leterme, Philippe; Maron, Pierre-Alain; Viaud, Valérie

    2015-04-01

    Faced with the numerous concerns about soil carbon dynamic, a large quantity of carbon dynamic models has been developed during the last century. These models are mainly in the form of deterministic compartment models with carbon fluxes between compartments represented by ordinary differential equations. Nowadays, lots of them consider the microbial biomass as a compartment of the soil organic matter (carbon quantity). But the amount of microbial carbon is rarely used in the differential equations of the models as a limiting factor. Additionally, microbial diversity and community composition are mostly missing, although last advances in soil microbial analytical methods during the two past decades have shown that these characteristics play also a significant role in soil carbon dynamic. As soil microorganisms are essential drivers of soil carbon dynamic, the question about explicitly integrating their role have become a key issue in soil carbon dynamic models development. Some interesting attempts can be found and are dominated by the incorporation of several compartments of different groups of microbial biomass in terms of functional traits and/or biogeochemical compositions to integrate microbial diversity. However, these models are basically heuristic models in the sense that they are used to test hypotheses through simulations. They have rarely been confronted to real data and thus cannot be used to predict realistic situations. The objective of this work was to empirically integrate microbial diversity in a simple model of carbon dynamic through statistical modelling of the model parameters. This work is based on available experimental results coming from a French National Research Agency program called DIMIMOS. Briefly, 13C-labelled wheat residue has been incorporated into soils with different pedological characteristics and land use history. Then, the soils have been incubated during 104 days and labelled and non-labelled CO2 fluxes have been measured at ten

  9. Rise of Microbial Culturomics: Noncontiguous Finished Genome Sequence and Description of Beduini massiliensis gen. nov., sp. nov.

    Science.gov (United States)

    Mourembou, Gaël; Yasir, Muhammad; Azhar, Esam Ibraheem; Lagier, Jean Christophe; Bibi, Fehmida; Jiman-Fatani, Asif Ahmad; Helmy, Nayel; Robert, Catherine; Rathored, Jaishriram; Fournier, Pierre-Edouard; Raoult, Didier; Million, Matthieu

    2015-12-01

    Microbial culturomics is a new field of omics sciences that examines the bacterial diversity of human gut coupled with a taxono-genomic strategy. Using microbial culturomics, we report here for the first time a novel Gram negative, catalase- and oxidase-negative, strict anaerobic bacilli named Beduini massiliensis gen. nov., sp nov. strain GM1 (= CSUR P1440 = DSM 100188), isolated from the stools of a female nomadic Bedouin from Saudi Arabia. With a length of 2,850,586 bp, the Beduini massiliensis genome exhibits a G + C content of 35.9%, and contains 2819 genes (2744 protein-coding and 75 RNA genes including 57 tRNA and 18 rRNA genes). It is composed of 6 scaffolds (composed of 6 contigs). A total of 1859 genes (67.75%) were assigned a putative function (by COGs or by NR blast). At least 1457 (53%) orthologous proteins were not shared with the closest phylogenetic species. 274 genes (10.0%) were identified as ORFans. These results show that microbial culturomics can dramatically improve the characterization of the human microbiota repertoire, deciphering new bacterial species and new genes. Further studies will clarify the geographic specificity and the putative role of these new microbes and their related functional genetic content in health and disease. Microbial culturomics is an emerging frontier of omics systems sciences and integrative biology and thus, warrants further consideration as part of the postgenomics methodology toolbox.

  10. Direct coupling of a genome-scale microbial in silico model and a groundwater reactive transport model

    International Nuclear Information System (INIS)

    Fang, Yilin; Scheibe, Timothy D.; Mahadevan, Radhakrishnan; Garg, Srinath; Long, Philip E.; Lovley, Derek R.

    2011-01-01

    The activity of microorganisms often plays an important role in dynamic natural attenuation or engineered bioremediation of subsurface contaminants, such as chlorinated solvents, metals, and radionuclides. To evaluate and/or design bioremediated systems, quantitative reactive transport models are needed. State-of-the-art reactive transport models often ignore the microbial effects or simulate the microbial effects with static growth yield and constant reaction rate parameters over simulated conditions, while in reality microorganisms can dynamically modify their functionality (such as utilization of alternative respiratory pathways) in response to spatial and temporal variations in environmental conditions. Constraint-based genome-scale microbial in silico models, using genomic data and multiple-pathway reaction networks, have been shown to be able to simulate transient metabolism of some well studied microorganisms and identify growth rate, substrate uptake rates, and byproduct rates under different growth conditions. These rates can be identified and used to replace specific microbially-mediated reaction rates in a reactive transport model using local geochemical conditions as constraints. We previously demonstrated the potential utility of integrating a constraint based microbial metabolism model with a reactive transport simulator as applied to bioremediation of uranium in groundwater. However, that work relied on an indirect coupling approach that was effective for initial demonstration but may not be extensible to more complex problems that are of significant interest (e.g., communities of microbial species, multiple constraining variables). Here, we extend that work by presenting and demonstrating a method of directly integrating a reactive transport model (FORTRAN code) with constraint-based in silico models solved with IBM ILOG CPLEX linear optimizer base system (C library). The models were integrated with BABEL, a language interoperability tool. The

  11. Direct coupling of a genome-scale microbial in silico model and a groundwater reactive transport model.

    Science.gov (United States)

    Fang, Yilin; Scheibe, Timothy D; Mahadevan, Radhakrishnan; Garg, Srinath; Long, Philip E; Lovley, Derek R

    2011-03-25

    The activity of microorganisms often plays an important role in dynamic natural attenuation or engineered bioremediation of subsurface contaminants, such as chlorinated solvents, metals, and radionuclides. To evaluate and/or design bioremediated systems, quantitative reactive transport models are needed. State-of-the-art reactive transport models often ignore the microbial effects or simulate the microbial effects with static growth yield and constant reaction rate parameters over simulated conditions, while in reality microorganisms can dynamically modify their functionality (such as utilization of alternative respiratory pathways) in response to spatial and temporal variations in environmental conditions. Constraint-based genome-scale microbial in silico models, using genomic data and multiple-pathway reaction networks, have been shown to be able to simulate transient metabolism of some well studied microorganisms and identify growth rate, substrate uptake rates, and byproduct rates under different growth conditions. These rates can be identified and used to replace specific microbially-mediated reaction rates in a reactive transport model using local geochemical conditions as constraints. We previously demonstrated the potential utility of integrating a constraint-based microbial metabolism model with a reactive transport simulator as applied to bioremediation of uranium in groundwater. However, that work relied on an indirect coupling approach that was effective for initial demonstration but may not be extensible to more complex problems that are of significant interest (e.g., communities of microbial species and multiple constraining variables). Here, we extend that work by presenting and demonstrating a method of directly integrating a reactive transport model (FORTRAN code) with constraint-based in silico models solved with IBM ILOG CPLEX linear optimizer base system (C library). The models were integrated with BABEL, a language interoperability tool. The

  12. Direct coupling of a genome-scale microbial in silico model and a groundwater reactive transport model

    Science.gov (United States)

    Fang, Yilin; Scheibe, Timothy D.; Mahadevan, Radhakrishnan; Garg, Srinath; Long, Philip E.; Lovley, Derek R.

    2011-03-01

    The activity of microorganisms often plays an important role in dynamic natural attenuation or engineered bioremediation of subsurface contaminants, such as chlorinated solvents, metals, and radionuclides. To evaluate and/or design bioremediated systems, quantitative reactive transport models are needed. State-of-the-art reactive transport models often ignore the microbial effects or simulate the microbial effects with static growth yield and constant reaction rate parameters over simulated conditions, while in reality microorganisms can dynamically modify their functionality (such as utilization of alternative respiratory pathways) in response to spatial and temporal variations in environmental conditions. Constraint-based genome-scale microbial in silico models, using genomic data and multiple-pathway reaction networks, have been shown to be able to simulate transient metabolism of some well studied microorganisms and identify growth rate, substrate uptake rates, and byproduct rates under different growth conditions. These rates can be identified and used to replace specific microbially-mediated reaction rates in a reactive transport model using local geochemical conditions as constraints. We previously demonstrated the potential utility of integrating a constraint-based microbial metabolism model with a reactive transport simulator as applied to bioremediation of uranium in groundwater. However, that work relied on an indirect coupling approach that was effective for initial demonstration but may not be extensible to more complex problems that are of significant interest (e.g., communities of microbial species and multiple constraining variables). Here, we extend that work by presenting and demonstrating a method of directly integrating a reactive transport model (FORTRAN code) with constraint-based in silico models solved with IBM ILOG CPLEX linear optimizer base system (C library). The models were integrated with BABEL, a language interoperability tool. The

  13. Microbial genome sequencing using optical mapping and Illumina sequencing

    Science.gov (United States)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  14. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society

  15. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  16. Approaches for in silico finishing of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Frederico Schmitt Kremer

    Full Text Available Abstract The introduction of next-generation sequencing (NGS had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases tools that are available to facilitate genome finishing.

  17. Approaches for in silico finishing of microbial genome sequences.

    Science.gov (United States)

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  18. Selective whole genome amplification for resequencing target microbial species from complex natural samples.

    Science.gov (United States)

    Leichty, Aaron R; Brisson, Dustin

    2014-10-01

    Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which are ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10(5)-fold amplification of the target genomes with genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.

  19. MycoCosm, an Integrated Fungal Genomics Resource

    Energy Technology Data Exchange (ETDEWEB)

    Shabalov, Igor; Grigoriev, Igor

    2012-03-16

    MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from JGI Fungal Genomics Program, its users, and integration with external resources used by fungal community.

  20. Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities.

    Science.gov (United States)

    Mahadevan, Radhakrishnan; Henson, Michael A

    2012-01-01

    Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.

  1. Rapid prototyping of microbial cell factories via genome-scale engineering.

    Science.gov (United States)

    Si, Tong; Xiao, Han; Zhao, Huimin

    2015-11-15

    Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Rapid Prototyping of Microbial Cell Factories via Genome-scale Engineering

    Science.gov (United States)

    Si, Tong; Xiao, Han; Zhao, Huimin

    2014-01-01

    Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. PMID:25450192

  3. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  4. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    when applied to the manually curated training set. Applying this method to the data representing around a quarter of the fraction space for water soluble proteins in D. vulgaris, we obtained 854 reliable pair wise interactions. Further, we have developed algorithms to analyze and assign significance to protein interaction data from bait pull-down experiments and integrate these data with other systems biology data through associative biclustering in a parallel computing environment. We will 'fill-in' missing information in these interaction data using a 'Transitive Closure' algorithm and subsequently use 'Between Commonality Decomposition' algorithm to discover complexes within these large graphs of protein interactions. To characterize the metabolic activities of proteins and their complexes we are developing algorithms to deconvolute pure mass spectra, estimate chemical formula for m/z values, and fit isotopic fine structure to metabolomics data. We have discovered that in comparison to isotopic pattern fitting methods restricting the chemical formula by these two dimensions actually facilitates unique solutions for chemical formula generators. To understand how microbial functions are regulated we have developed complementary algorithms for reconstructing gene regulatory networks (GRNs). Whereas the network inference algorithms cMonkey and Inferelator developed enable de novo reconstruction of predictive models for GRNs from diverse systems biology data, the RegPrecise and RegPredict framework developed uses evolutionary comparisons of genomes from closely related organisms to reconstruct conserved regulons. We have integrated the two complementary algorithms to rapidly generate comprehensive models for gene regulation of understudied organisms. Our preliminary analyses of these reconstructed GRNs have revealed novel regulatory mechanisms and cis-regulatory motifs, as well asothers that are conserved across species. Finally, we are

  5. Microbial comparative pan-genomics using binomial mixture models

    DEFF Research Database (Denmark)

    Ussery, David; Snipen, L; Almøy, T

    2009-01-01

    The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter...... approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. RESULTS: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection...... probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Echerichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely...

  6. Microbial comparative pan-genomics using binomial mixture models

    Directory of Open Access Journals (Sweden)

    Ussery David W

    2009-08-01

    Full Text Available Abstract Background The size of the core- and pan-genome of bacterial species is a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made, using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. Results We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families in Buchnera aphidicola to large (around 43000 gene families in Escherichia coli. Results for Echerichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. Conclusion Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find is always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.

  7. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    Science.gov (United States)

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  8. Automated ensemble assembly and validation of microbial genomes

    Science.gov (United States)

    2014-01-01

    Background The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. Results To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Conclusions Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to

  9. Genome-centric resolution of microbial diversity, metabolism and interactions in anaerobic digestion.

    Science.gov (United States)

    Vanwonterghem, Inka; Jensen, Paul D; Rabaey, Korneel; Tyson, Gene W

    2016-09-01

    Our understanding of the complex interconnected processes performed by microbial communities is hindered by our inability to culture the vast majority of microorganisms. Metagenomics provides a way to bypass this cultivation bottleneck and recent advances in this field now allow us to recover a growing number of genomes representing previously uncultured populations from increasingly complex environments. In this study, a temporal genome-centric metagenomic analysis was performed of lab-scale anaerobic digesters that host complex microbial communities fulfilling a series of interlinked metabolic processes to enable the conversion of cellulose to methane. In total, 101 population genomes that were moderate to near-complete were recovered based primarily on differential coverage binning. These populations span 19 phyla, represent mostly novel species and expand the genomic coverage of several rare phyla. Classification into functional guilds based on their metabolic potential revealed metabolic networks with a high level of functional redundancy as well as niche specialization, and allowed us to identify potential roles such as hydrolytic specialists for several rare, uncultured populations. Genome-centric analyses of complex microbial communities across diverse environments provide the key to understanding the phylogenetic and metabolic diversity of these interactive communities. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  10. Calibration and analysis of genome-based models for microbial ecology.

    Science.gov (United States)

    Louca, Stilianos; Doebeli, Michael

    2015-10-16

    Microbial ecosystem modeling is complicated by the large number of unknown parameters and the lack of appropriate calibration tools. Here we present a novel computational framework for modeling microbial ecosystems, which combines genome-based model construction with statistical analysis and calibration to experimental data. Using this framework, we examined the dynamics of a community of Escherichia coli strains that emerged in laboratory evolution experiments, during which an ancestral strain diversified into two coexisting ecotypes. We constructed a microbial community model comprising the ancestral and the evolved strains, which we calibrated using separate monoculture experiments. Simulations reproduced the successional dynamics in the evolution experiments, and pathway activation patterns observed in microarray transcript profiles. Our approach yielded detailed insights into the metabolic processes that drove bacterial diversification, involving acetate cross-feeding and competition for organic carbon and oxygen. Our framework provides a missing link towards a data-driven mechanistic microbial ecology.

  11. Integration of microbial biopesticides in greenhouse floriculture: The Canadian experience.

    Science.gov (United States)

    Brownbridge, Michael; Buitenhuis, Rose

    2017-11-28

    Historically, greenhouse floriculture has relied on synthetic insecticides to meet its pest control needs. But, growers are increasingly faced with the loss or failure of synthetic chemical pesticides, declining access to new chemistries, stricter environmental/health and safety regulations, and the need to produce plants in a manner that meets the 'sustainability' demands of a consumer driven market. In Canada, reports of thrips resistance to spinosad (Success™) within 6-12 months of its registration prompted a radical change in pest management philosophy and approach. Faced with a lack of registered chemical alternatives, growers turned to biological control out of necessity. Biological control now forms the foundation for pest management programs in Canadian floriculture greenhouses. Success in a biocontrol program is rarely achieved through the use of a single agent, though. Rather, it is realized through the concurrent use of biological, cultural and other strategies within an integrated plant production system. Microbial insecticides can play a critical supporting role in biologically-based integrated pest management (IPM) programs. They have unique modes of action and are active against a range of challenging pests. As commercial microbial insecticides have come to market, research to generate efficacy data has assisted their registration in Canada, and the development and adaptation of integrated programs has promoted uptake by floriculture growers. This review documents some of the work done to integrate microbial insecticides into chrysanthemum and poinsettia production systems, outlines current use practices, and identifies opportunities to improve efficacy in Canadian floriculture crops. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. CMG-biotools, a free workbench for basic comparative microbial genomics.

    Science.gov (United States)

    Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

    2013-01-01

    Today, there are more than a hundred times as many sequenced prokaryotic genomes than were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly shows that users with limited computational experience can perform complicated analysis without much training.

  13. Comparative analyses identified species-specific functional roles in oral microbial genomes

    Science.gov (United States)

    Chen, Tsute; Gajare, Prasad; Olsen, Ingar; Dewhirst, Floyd E.

    2017-01-01

    ABSTRACT The advent of next generation sequencing is producing more genomic sequences for various strains of many human oral microbial species and allows for insightful functional comparisons at both intra- and inter-species levels. This study performed in-silico functional comparisons for currently available genomic sequences of major species associated with periodontitis including Aggregatibacter actinomycetemcomitans (AA), Porphyromonas gingivalis (PG), Treponema denticola (TD), and Tannerella forsythia (TF), as well as several cariogenic and commensal streptococcal species. Complete or draft sequences were annotated with the RAST to infer structured functional subsystems for each genome. The subsystems profiles were clustered to groups of functions with similar patterns. Functional enrichment and depletion were evaluated based on hypergeometric distribution to identify subsystems that are unique or missing between two groups of genomes. Unique or missing metabolic pathways and biological functions were identified in different species. For example, components involved in flagellar motility were found only in the motile species TD, as expected, with few exceptions scattered in several streptococcal species, likely associated with chemotaxis. Transposable elements were only found in the two Bacteroidales species PG and TF, and half of the AA genomes. Genes involved in CRISPR were prevalent in most oral species. Furthermore, prophage related subsystems were also commonly found in most species except for PG and Streptococcus mutans, in which very few genomes contain prophage components. Comparisons between pathogenic (P) and nonpathogenic (NP) genomes also identified genes potentially important for virulence. Two such comparisons were performed between AA (P) and several A. aphrophilus (NP) strains, and between S. mutans + S. sobrinus (P) and other oral streptococcal species (NP). This comparative genomics approach can be readily used to identify functions unique to

  14. Genomic Analysis of Complex Microbial Communities in Wounds

    Science.gov (United States)

    2012-01-01

    Permutation Multivariate Analysis of Variance ( PerMANOVA ). We used PerMANOVA to test the null-hypothesis of no... permutation -based version of the multivariate analysis of variance (MANOVA). PerMANOVA uses the distances between samples to partition variance and...coli. Antibiotics, bacteria, community analysis , diabetes, pyrosequencing, wound, wound therapy, 16S rRNA gene Genomic Analysis of Complex

  15. A system-level model for the microbial regulatory genome.

    Science.gov (United States)

    Brooks, Aaron N; Reiss, David J; Allard, Antoine; Wu, Wei-Ju; Salvanha, Diego M; Plaisier, Christopher L; Chandrasekaran, Sriram; Pan, Min; Kaur, Amardeep; Baliga, Nitin S

    2014-07-15

    Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes. © 2014 The Authors. Published under the terms of the CC BY 4.0 license.

  16. Integrating cancer genomic data into electronic health records

    Directory of Open Access Journals (Sweden)

    Jeremy L. Warner

    2016-10-01

    Full Text Available Abstract The rise of genomically targeted therapies and immunotherapy has revolutionized the practice of oncology in the last 10–15 years. At the same time, new technologies and the electronic health record (EHR in particular have permeated the oncology clinic. Initially designed as billing and clinical documentation systems, EHR systems have not anticipated the complexity and variety of genomic information that needs to be reviewed, interpreted, and acted upon on a daily basis. Improved integration of cancer genomic data with EHR systems will help guide clinician decision making, support secondary uses, and ultimately improve patient care within oncology clinics. Some of the key factors relating to the challenge of integrating cancer genomic data into EHRs include: the bioinformatics pipelines that translate raw genomic data into meaningful, actionable results; the role of human curation in the interpretation of variant calls; and the need for consistent standards with regard to genomic and clinical data. Several emerging paradigms for integration are discussed in this review, including: non-standardized efforts between individual institutions and genomic testing laboratories; “middleware” products that portray genomic information, albeit outside of the clinical workflow; and application programming interfaces that have the potential to work within clinical workflow. The critical need for clinical-genomic knowledge bases, which can be independent or integrated into the aforementioned solutions, is also discussed.

  17. From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

    Science.gov (United States)

    Garza, Daniel R; Dutilh, Bas E

    2015-11-01

    Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.

  18. Microbial and viral chitinases: Attractive biopesticides for integrated pest management.

    Science.gov (United States)

    Berini, Francesca; Katz, Chen; Gruzdev, Nady; Casartelli, Morena; Tettamanti, Gianluca; Marinelli, Flavia

    2018-01-04

    The negative impact of the massive use of synthetic pesticides on the environment and on human health has stimulated the search for environment-friendly practices for controlling plant diseases and pests. Among them, biocontrol, which relies on using beneficial organisms or their products (bioactive molecules and/or hydrolytic enzymes), holds the greatest promise and is considered a pillar of integrated pest management. Chitinases are particularly attractive to this purpose since they have fungicidal, insecticidal, and nematicidal activities. Here, current knowledge on the biopesticidal action of microbial and viral chitinases is reviewed, together with a critical analysis of their future development as biopesticides. Copyright © 2018 Elsevier Inc. All rights reserved.

  19. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    Full Text Available The explosive development of genomics technologies including microarrays and next generation sequencing (NGS has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding on the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.

  20. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

    Science.gov (United States)

    Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

    2013-10-01

    Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

  1. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

  2. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only similar to 3800 genes, and that a similar discrepancy exists for almost all published genomes....

  3. Integrated proteomic and genomic analysis of colorectal cancer

    Science.gov (United States)

    Investigators who analyzed 95 human colorectal tumor samples have determined how gene alterations identified in previous analyses of the same samples are expressed at the protein level. The integration of proteomic and genomic data, or proteogenomics, pro

  4. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Jizhong [Univ. of Oklahoma, Norman, OK (United States); He, Zhili [Univ. of Oklahoma, Norman, OK (United States)

    2014-04-08

    As a part of the Shewanella Federation project, we have used integrated genomic, proteomic and computational technologies to study various aspects of energy metabolism of two Shewanella strains from a systems-level perspective.

  5. Rhizome of life, catastrophes, sequence exchanges, gene creations and giant viruses: How microbial genomics challenges Darwin

    Directory of Open Access Journals (Sweden)

    Vicky eMerhej

    2012-08-01

    Full Text Available Darwin’s theory about the evolution of species has been the object of considerable dispute. In this review, we have described seven key principles in Darwin’s book The Origin of Species and tried to present how genomics challenge each of these concepts and improve our knowledge about evolution. Darwin believed that species evolution consists on a positive directional selection ensuring the survival of the fittest. The most developed state of the species is characterized by increasing complexity. Darwin proposed the theory of descent with modification according to which all species evolve from a single common ancestor through a gradual process of small modification of their vertical inheritance. Finally, the process of evolution can be depicted in the form of a tree. However, microbial genomics showed that evolution is better described as the biological changes over time." The mode of change is not unidirectional and does not necessarily favors advantageous mutations to increase fitness it is rather subject to random selection as a result of catastrophic stochastic processes. Complexity is not necessarily the completion of development: several complex organisms have gone extinct and many microbes including bacteria with intracellular lifestyle have streamlined highly effective genomes. Genomes evolve through large events of gene deletions, duplications, insertions and genomes rearrangements rather than a gradual adaptative process. Genomes are dynamic and chimeric entities with gene repertoires that result from vertical and horizontal acquisitions as well as de novo gene creation. The chimeric character of microbial genomes excludes the possibility of finding a single common ancestor for all the genes recorded currently. Genomes are collections of genes with different evolutionary histories that cannot be represented by a single tree of life. A forest, a network or a rhizome of life may be more accurate to represent evolutionary relationships

  6. Viral dark matter and virus–host interactions resolved from publicly available microbial genomes

    Science.gov (United States)

    Roux, Simon; Hallam, Steven J; Woyke, Tanja; Sullivan, Matthew B

    2015-01-01

    The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes. DOI: http://dx.doi.org/10.7554/eLife.08490.001 PMID:26200428

  7. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, M; Jensen, L J; Brunak, S

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribut......In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes....

  8. Complete genome sequence of Defluviimonas alba cai42T, a microbial exopolysaccharides producer.

    Science.gov (United States)

    Zhao, Jie-Yu; Geng, Shuang; Xu, Lian; Hu, Bing; Sun, Ji-Quan; Nie, Yong; Tang, Yue-Qin; Wu, Xiao-Lei

    2016-12-10

    Defluviimonas alba cai42 T , isolated from the oil-production water in Xinjiang Oilfield in China, has a strong ability to produce exopolysaccharides (EPS). We hereby present its complete genome sequence information which consists of a circular chromosome and three plasmids. The strain characteristically contains various genes encoding for enzymes involved in EPS biosynthesis, modification, and export. According to the genomic and physiochemical data, it is predicted that the strain has the potential to be utilized in industrial production of microbial EPS. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. The MICROBE Project, A Report from the Interagency Working Group on Microbial Genomics

    Science.gov (United States)

    2001-01-01

    functional genomics tools (gene chips, technologies, etc.), comparative genomics, proteomics tools, novel culture techniques, in situ analyses, and...interested in supporting microarray/chip development for gene expression analysis for agricultural microbes, bioinformatics, and proteomics , and the...including one fungus ) in various stages of progress. The closely integrated Natural and Accelerated Bioremediation Research Program in the Office of

  10. Analyzing the genomic variation of microbial cell factories in the era of “New Biotechnology”

    DEFF Research Database (Denmark)

    Herrgard, Markus; Panagiotou, Gianni

    2012-01-01

    The application of genome-scale technologies, both experimental and in silico, to industrial biotechnology has allowed improving the conversion of biomass-derived feedstocks to chemicals, materials and fuels through microbial fermentation. In particular, due to rapidly decreasing costs and its...... technologies for finding the underlying molecular mechanisms for (a) improved carbon source utilization, (b) increased product formation, and (c) stress tolerance. We also discuss the strengths and weaknesses of different strategies for mapping industrially relevant genotype-to-phenotype links including...

  11. Use of an uncertainty analysis for genome-scale models as a prediction tool for microbial growth processes in subsurface environments.

    Science.gov (United States)

    Klier, Christine

    2012-03-06

    The integration of genome-scale, constraint-based models of microbial cell function into simulations of contaminant transport and fate in complex groundwater systems is a promising approach to help characterize the metabolic activities of microorganisms in natural environments. In constraint-based modeling, the specific uptake flux rates of external metabolites are usually determined by Michaelis-Menten kinetic theory. However, extensive data sets based on experimentally measured values are not always available. In this study, a genome-scale model of Pseudomonas putida was used to study the key issue of uncertainty arising from the parametrization of the influx of two growth-limiting substrates: oxygen and toluene. The results showed that simulated growth rates are highly sensitive to substrate affinity constants and that uncertainties in specific substrate uptake rates have a significant influence on the variability of simulated microbial growth. Michaelis-Menten kinetic theory does not, therefore, seem to be appropriate for descriptions of substrate uptake processes in the genome-scale model of P. putida. Microbial growth rates of P. putida in subsurface environments can only be accurately predicted if the processes of complex substrate transport and microbial uptake regulation are sufficiently understood in natural environments and if data-driven uptake flux constraints can be applied.

  12. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae : Implications for the microbial "pan-genome"

    NARCIS (Netherlands)

    Tettelin, H; Masignani, [No Value; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and

  13. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    Science.gov (United States)

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in genomics projects, while eliminating the need for on-site computational resources and expertise.

  14. A catalogue of 136 microbial draft genomes from Red Sea metagenomes

    KAUST Repository

    Haroon, Mohamed; Thompson, Luke R.; Parks, Donovan H.; Hugenholtz, Philip; Stingl, Ulrich

    2016-01-01

    Earth is expected to continue warming and the Red Sea is a model environment for understanding the effects of global warming on ocean microbiomes due to its unusually high temperature, salinity and solar irradiance. However, most microbial diversity analyses of the Red Sea have been limited to cultured representatives and single marker gene analyses, hence neglecting the substantial uncultured majority. Here, we report 136 microbial genomes (completion minus contamination is ≥50%) assembled from 45 metagenomes from eight stations spanning the Red Sea and taken from multiple depths between 10 to 500 m. Phylogenomic analysis showed that most of the retrieved genomes belong to seven different phyla of known marine microbes, but more than half representing currently uncultured species. The open-access data presented here is the largest number of Red Sea representative microbial genomes reported in a single study and will help facilitate future studies in understanding the physiology of these microorganisms and how they have adapted to the relatively harsh conditions of the Red Sea.

  15. A catalogue of 136 microbial draft genomes from Red Sea metagenomes

    KAUST Repository

    Haroon, Mohamed

    2016-07-05

    Earth is expected to continue warming and the Red Sea is a model environment for understanding the effects of global warming on ocean microbiomes due to its unusually high temperature, salinity and solar irradiance. However, most microbial diversity analyses of the Red Sea have been limited to cultured representatives and single marker gene analyses, hence neglecting the substantial uncultured majority. Here, we report 136 microbial genomes (completion minus contamination is ≥50%) assembled from 45 metagenomes from eight stations spanning the Red Sea and taken from multiple depths between 10 to 500 m. Phylogenomic analysis showed that most of the retrieved genomes belong to seven different phyla of known marine microbes, but more than half representing currently uncultured species. The open-access data presented here is the largest number of Red Sea representative microbial genomes reported in a single study and will help facilitate future studies in understanding the physiology of these microorganisms and how they have adapted to the relatively harsh conditions of the Red Sea.

  16. A catalogue of 136 microbial draft genomes from Red Sea metagenomes.

    Science.gov (United States)

    Haroon, Mohamed F; Thompson, Luke R; Parks, Donovan H; Hugenholtz, Philip; Stingl, Ulrich

    2016-07-05

    Earth is expected to continue warming and the Red Sea is a model environment for understanding the effects of global warming on ocean microbiomes due to its unusually high temperature, salinity and solar irradiance. However, most microbial diversity analyses of the Red Sea have been limited to cultured representatives and single marker gene analyses, hence neglecting the substantial uncultured majority. Here, we report 136 microbial genomes (completion minus contamination is ≥50%) assembled from 45 metagenomes from eight stations spanning the Red Sea and taken from multiple depths between 10 to 500 m. Phylogenomic analysis showed that most of the retrieved genomes belong to seven different phyla of known marine microbes, but more than half representing currently uncultured species. The open-access data presented here is the largest number of Red Sea representative microbial genomes reported in a single study and will help facilitate future studies in understanding the physiology of these microorganisms and how they have adapted to the relatively harsh conditions of the Red Sea.

  17. Mess management in microbial ecology: Rhetorical processes of disciplinary integration

    Science.gov (United States)

    McCracken, Christopher W.

    As interdisciplinary work becomes more common in the sciences, research into the rhetorical processes mediating disciplinary integration becomes more vital. This dissertation, which takes as its subject the integration of microbiology and ecology, combines a postplural approach to rhetoric of science research with Victor Turner's "social drama" analysis and a third-generation activity theory methodological framework to identify conceptual and practical conflicts in interdisciplinary work and describe how, through visual and verbal communication, scientists negotiate these conflicts. First, to understand the conflicting disciplinary principles that might impede integration, the author conducts a Turnerian analysis of a disciplinary conflict that took place in the 1960s and 70s, during which American ecologists and biologists debated whether they should participate in the International Biological Program (IBP). Participation in the IBP ultimately contributed to the emergence of ecology as a discipline distinct from biology, and Turnerian social drama analysis of the debate surrounding participation lays bare the conflicting principles separating biology and ecology. Second, to answer the question of how these conflicting principles are negotiated in practice, the author reports on a yearlong qualitative study of scientists working in a microbial ecology laboratory. Focusing specifically on two case studies from this fieldwork that illustrate the key concept of textually mediated disciplinary integration, the author's analysis demonstrates how scientific objects emerge in differently situated practices, and how these objects manage to cohere despite their multiplicity through textually mediated rhetorical processes of calibration and alignment.

  18. Evaluation of an integrated continuous stirred microbial electrochemical reactor: Wastewater treatment, energy recovery and microbial community.

    Science.gov (United States)

    Wang, Haiman; Qu, Youpeng; Li, Da; Zhou, Xiangtong; Feng, Yujie

    2015-11-01

    A continuous stirred microbial electrochemical reactor (CSMER) was developed by integrating anaerobic digestion (AD) and microbial electrochemical system (MES). The system was capable of treating high strength artificial wastewater and simultaneously recovering electric and methane energy. Maximum power density of 583±9, 562±7, 533±10 and 572±6 mW m(-2) were obtained by each cell in a four-independent circuit mode operation at an OLR of 12 kg COD m(-3) d(-1). COD removal and energy recovery efficiency were 87.1% and 32.1%, which were 1.6 and 2.5 times higher than that of a continuous stirred tank reactor (CSTR). Larger amount of Deltaproteobacteria (5.3%) and hydrogenotrophic methanogens (47%) can account for the better performance of CSMER, since syntrophic associations among them provided more degradation pathways compared to the CSTR. Results demonstrate the CSMER holds great promise for efficient wastewater treatment and energy recovery. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. INE: a rice genome database with an integrated map view.

    Science.gov (United States)

    Sakata, K; Antonio, B A; Mukai, Y; Nagasaki, H; Sakai, Y; Makino, K; Sasaki, T

    2000-01-01

    The Rice Genome Research Program (RGP) launched a large-scale rice genome sequencing in 1998 aimed at decoding all genetic information in rice. A new genome database called INE (INtegrated rice genome Explorer) has been developed in order to integrate all the genomic information that has been accumulated so far and to correlate these data with the genome sequence. A web interface based on Java applet provides a rapid viewing capability in the database. The first operational version of the database has been completed which includes a genetic map, a physical map using YAC (Yeast Artificial Chromosome) clones and PAC (P1-derived Artificial Chromosome) contigs. These maps are displayed graphically so that the positional relationships among the mapped markers on each chromosome can be easily resolved. INE incorporates the sequences and annotations of the PAC contig. A site on low quality information ensures that all submitted sequence data comply with the standard for accuracy. As a repository of rice genome sequence, INE will also serve as a common database of all sequence data obtained by collaborating members of the International Rice Genome Sequencing Project (IRGSP). The database can be accessed at http://www. dna.affrc.go.jp:82/giot/INE. html or its mirror site at http://www.staff.or.jp/giot/INE.html

  20. How agricultural management shapes soil microbial communities: patterns emerging from genetic and genomic studies

    Science.gov (United States)

    Daly, Amanda; Grandy, A. Stuart

    2016-04-01

    Agriculture is a predominant land use and thus a large influence on global carbon (C) and nitrogen (N) balances, climate, and human health. If we are to produce food, fiber, and fuel sustainably we must maximize agricultural yield while minimizing negative environmental consequences, goals towards which we have made great strides through agronomic advances. However, most agronomic strategies have been designed with a view of soil as a black box, largely ignoring the way management is mediated by soil biota. Because soil microbes play a central role in many of the processes that deliver nutrients to crops and support their health and productivity, agricultural management strategies targeted to exploit or support microbial activity should deliver additional benefits. To do this we must determine how microbial community structure and function are shaped by agricultural practices, but until recently our characterizations of soil microbial communities in agricultural soils have been largely limited to broad taxonomic classes due to methodological constraints. With advances in high-throughput genetic and genomic sequencing techniques, better taxonomic resolution now enables us to determine how agricultural management affects specific microbes and, in turn, nutrient cycling outcomes. Here we unite findings from published research that includes genetic or genomic data about microbial community structure (e.g. 454, Illumina, clone libraries, qPCR) in soils under agricultural management regimes that differ in type and extent of tillage, cropping selections and rotations, inclusion of cover crops, organic amendments, and/or synthetic fertilizer application. We delineate patterns linking agricultural management to microbial diversity, biomass, C- and N-content, and abundance of microbial taxa; furthermore, where available, we compare patterns in microbial communities to patterns in soil extracellular enzyme activities, catabolic profiles, inorganic nitrogen pools, and nitrogen

  1. Using integrated environmental modeling to automate a process-based Quantitative Microbial Risk Assessment

    Science.gov (United States)

    Integrated Environmental Modeling (IEM) organizes multidisciplinary knowledge that explains and predicts environmental-system response to stressors. A Quantitative Microbial Risk Assessment (QMRA) is an approach integrating a range of disparate data (fate/transport, exposure, an...

  2. Using Integrated Environmental Modeling to Automate a Process-Based Quantitative Microbial Risk Assessment (presentation)

    Science.gov (United States)

    Integrated Environmental Modeling (IEM) organizes multidisciplinary knowledge that explains and predicts environmental-system response to stressors. A Quantitative Microbial Risk Assessment (QMRA) is an approach integrating a range of disparate data (fate/transport, exposure, and...

  3. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    Science.gov (United States)

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus , Escherichia , and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  4. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Genome-scale modelling of microbial metabolism with temporal and spatial resolution.

    Science.gov (United States)

    Henson, Michael A

    2015-12-01

    Most natural microbial systems have evolved to function in environments with temporal and spatial variations. A major limitation to understanding such complex systems is the lack of mathematical modelling frameworks that connect the genomes of individual species and temporal and spatial variations in the environment to system behaviour. The goal of this review is to introduce the emerging field of spatiotemporal metabolic modelling based on genome-scale reconstructions of microbial metabolism. The extension of flux balance analysis (FBA) to account for both temporal and spatial variations in the environment is termed spatiotemporal FBA (SFBA). Following a brief overview of FBA and its established dynamic extension, the SFBA problem is introduced and recent progress is described. Three case studies are reviewed to illustrate the current state-of-the-art and possible future research directions are outlined. The author posits that SFBA is the next frontier for microbial metabolic modelling and a rapid increase in methods development and system applications is anticipated. © 2015 Authors; published by Portland Press Limited.

  6. Resolution of habitat-associated ecogenomic signatures in bacteriophage genomes and application to microbial source tracking.

    Science.gov (United States)

    Ogilvie, Lesley A; Nzakizwanayo, Jonathan; Guppy, Fergus M; Dedi, Cinzia; Diston, David; Taylor, Huw; Ebdon, James; Jones, Brian V

    2018-04-01

    Just as the expansion in genome sequencing has revealed and permitted the exploitation of phylogenetic signals embedded in bacterial genomes, the application of metagenomics has begun to provide similar insights at the ecosystem level for microbial communities. However, little is known regarding this aspect of bacteriophage associated with microbial ecosystems, and if phage encode discernible habitat-associated signals diagnostic of underlying microbiomes. Here we demonstrate that individual phage can encode clear habitat-related 'ecogenomic signatures', based on relative representation of phage-encoded gene homologues in metagenomic data sets. Furthermore, we show the ecogenomic signature encoded by the gut-associated ɸB124-14 can be used to segregate metagenomes according to environmental origin, and distinguish 'contaminated' environmental metagenomes (subject to simulated in silico human faecal pollution) from uncontaminated data sets. This indicates phage-encoded ecological signals likely possess sufficient discriminatory power for use in biotechnological applications, such as development of microbial source tracking tools for monitoring water quality.

  7. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Full Text Available Abstract Background The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software are available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results We have created a user-friendly java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  8. Human Papillomavirus Genome Integration and Head and Neck Cancer.

    Science.gov (United States)

    Pinatti, L M; Walline, H M; Carey, T E

    2018-06-01

    We conducted a critical review of human papillomavirus (HPV) integration into the host genome in oral/oropharyngeal cancer, reviewed the literature for HPV-induced cancers, and obtained current data for HPV-related oral and oropharyngeal cancers. In addition, we performed studies to identify HPV integration sites and the relationship of integration to viral-host fusion transcripts and whether integration is required for HPV-associated oncogenesis. Viral integration of HPV into the host genome is not required for the viral life cycle and might not be necessary for cellular transformation, yet HPV integration is frequently reported in cervical and head and neck cancer specimens. Studies of large numbers of early cervical lesions revealed frequent viral integration into gene-poor regions of the host genome with comparatively rare integration into cellular genes, suggesting that integration is a stochastic event and that site of integration may be largely a function of chance. However, more recent studies of head and neck squamous cell carcinomas (HNSCCs) suggest that integration may represent an additional oncogenic mechanism through direct effects on cancer-related gene expression and generation of hybrid viral-host fusion transcripts. In HNSCC cell lines as well as primary tumors, integration into cancer-related genes leading to gene disruption has been reported. The studies have shown that integration-induced altered gene expression may be associated with tumor recurrence. Evidence from several studies indicates that viral integration into genic regions is accompanied by local amplification, increased expression in some cases, interruption of gene expression, and likely additional oncogenic effects. Similarly, reported examples of viral integration near microRNAs suggest that altered expression of these regulatory molecules may also contribute to oncogenesis. Future work is indicated to identify the mechanisms of these events on cancer cell behavior.

  9. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  10. Tools for Accurate and Efficient Analysis of Complex Evolutionary Mechanisms in Microbial Genomes. Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Nakhleh, Luay

    2014-03-12

    I proposed to develop computationally efficient tools for accurate detection and reconstruction of microbes' complex evolutionary mechanisms, thus enabling rapid and accurate annotation, analysis and understanding of their genomes. To achieve this goal, I proposed to address three aspects. (1) Mathematical modeling. A major challenge facing the accurate detection of HGT is that of distinguishing between these two events on the one hand and other events that have similar "effects." I proposed to develop a novel mathematical approach for distinguishing among these events. Further, I proposed to develop a set of novel optimization criteria for the evolutionary analysis of microbial genomes in the presence of these complex evolutionary events. (2) Algorithm design. In this aspect of the project, I proposed to develop an array of e cient and accurate algorithms for analyzing microbial genomes based on the formulated optimization criteria. Further, I proposed to test the viability of the criteria and the accuracy of the algorithms in an experimental setting using both synthetic as well as biological data. (3) Software development. I proposed the nal outcome to be a suite of software tools which implements the mathematical models as well as the algorithms developed.

  11. A Novel Tool for Microbial Genome Editing Using the Restriction-Modification System.

    Science.gov (United States)

    Bai, Hua; Deng, Aihua; Liu, Shuwen; Cui, Di; Qiu, Qidi; Wang, Laiyou; Yang, Zhao; Wu, Jie; Shang, Xiuling; Zhang, Yun; Wen, Tingyi

    2018-01-19

    Scarless genetic manipulation of genomes is an essential tool for biological research. The restriction-modification (R-M) system is a defense system in bacteria that protects against invading genomes on the basis of its ability to distinguish foreign DNA from self DNA. Here, we designed an R-M system-mediated genome editing (RMGE) technique for scarless genetic manipulation in different microorganisms. For bacteria with Type IV REase, an RMGE technique using the inducible DNA methyltransferase gene, bceSIIM (RMGE-bceSIIM), as the counter-selection cassette was developed to edit the genome of Escherichia coli. For bacteria without Type IV REase, an RMGE technique based on a restriction endonuclease (RMGE-mcrA) was established in Bacillus subtilis. These techniques were successfully used for gene deletion and replacement with nearly 100% counter-selection efficiencies, which were higher and more stable compared to conventional methods. Furthermore, precise point mutation without limiting sites was achieved in E. coli using RMGE-bceSIIM to introduce a single base mutation of A128C into the rpsL gene. In addition, the RMGE-mcrA technique was applied to delete the CAN1 gene in Saccharomyces cerevisiae DAY414 with 100% counter-selection efficiency. The effectiveness of the RMGE technique in E. coli, B. subtilis, and S. cerevisiae suggests the potential universal usefulness of this technique for microbial genome manipulation.

  12. Genomes in Turmoil: Frugality Drives Microbial Community Structure in Extremely Acidic Environments

    Science.gov (United States)

    Holmes, D. S.

    2016-12-01

    Extremely acidic environments (To gain insight into these issues, we have conducted deep bioinformatic analyses, including metabolic reconstruction of key assimilatory pathways, phylogenomics and network scrutiny of >160 genomes of acidophiles, including representatives from Archaea, Bacteria and Eukarya and at least ten metagenomes of acidic environments [Cardenas JP, et al. pp 179-197 in Acidophiles, eds R. Quatrini and D. B. Johnson, Caister Academic Press, UK (2016)]. Results yielded valuable insights into cellular processes, including carbon and nitrogen management and energy production, linking biogeochemical processes to organismal physiology. They also provided insight into the evolutionary forces that shape the genomic structure of members of acidophile communities. Niche partitioning can explain diversity patterns in rapidly changing acidic environments such as bioleaching heaps. However, in spatially and temporally homogeneous acidic environments genome flux appears to provide deeper insight into the composition and evolution of acidic consortia. Acidophiles have undergone genome streamlining by gene loss promoting mutual coexistence of species that exploit complementarity use of scarce resources consistent with the Black Queen hypothesis [Morris JJ et al. mBio 3: e00036-12 (2012)]. Acidophiles also have a large pool of accessory genes (the microbial super-genome) that can be accessed by horizontal gene transfer. This further promotes dependency relationships as drivers of community structure and the evolution of keystone species. Acknowledgements: Fondecyt 1130683; Basal CCTE PFB16

  13. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    NEALSON, KENNETH H.

    2013-10-15

    products of dissimilatory iron reduction. Geochim. Cosmochim. Acta. 74:574-583. 10. Karpinets, T.V., A.Y Obraztsova, Y. Wang, D.D. Schmoyer, G.H. Kora, B.H. Park, M.H. Serres, M.F. Ropmine, M.L. Land, T.B. Kothe, J.K. Fredrickson, K.H. Nealson, and E.C. Uberbacher 2010. Conserved synteny at the protein family level reveals genes underlying Shewanella species? cold tolerance and predicts their novel phenotypes. Funct. Integr. Genomics 10: 97 ? 110. (DOI 10.1007/s10143-009-0142-y) 11. Bretschger, O., A.C.M. Cheung, F. Mansfeld, and K.H. Nealson. 2010. Comparative microbial fuel cell evaluations of Shewanella spp. Electroanalysis 22: 883-894. 12. McLean, J.S., G. Wanger, Y.A. Gorby, M. Wainstein, J. McQuaid, Shun?ichi Ishii, O. Bretschger, H. Beyanal, K.H. Nealson. 2010. Quantification of electron transfer rates to a solid phase electron acceptor through the stages of biofilm formation from single cells to multicellular communities. Env. Sci. Technol. 44:2721-2717. 13. El-Naggar, M., G. Wanger, K.M. Leung, T.D. Yuzvinsky, G. Southam, J. Yang, W.M. Lau, K.H. Nealson, and Y.A. Gorby. 2010. Electrical Transport Along Bacterial Nanowires from Shewanella oneidensis MR-1 Proc. Nat. Acad. Sci. USA 107:18127-18131. 14. Biffinger, J.C., L.A. Fitzgerald, R. Ray, B.J. Little, S.E. Lizewski, E.R. Petersen, B.R. Ringeisen, W.C. Sanders, P.E. Sheehan, J.J. Pietron, J.W. Baldwin, L.J. Nadeau, G.R. Johnson, M. Ribbens, S.E. Finkel, K.H. Nealson. 2010. The utility of Shewanella japonica for microbial fuel cells. Bioresource Technol. 102:290-297. 15. Rodionov, D. , C. Yang, X. Li, I. Rodionova, Y. Wang, A.Y. Obraztsova, O. P. Zagnitko, R. Overbeek, M. F. Romine, S. Reed, J.K. Fredrickson, K.H. Nealson, A.L. Osterman. 2010. Genomic encyclopedia of sugar utilization pathways in the Shewanella genus. BMC Genomics 2010, 11:494 16. Kan, J., L. Hsu, A.C.M. Cheung, M. Pirbazari, and K.H. Nealson. 2011. Current production by bacterial communities in microbial fuel cells enriched from wastewater sludge

  14. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    Science.gov (United States)

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the

  15. Experimental evolution and the dynamics of adaptation and genome evolution in microbial populations.

    Science.gov (United States)

    Lenski, Richard E

    2017-10-01

    Evolution is an on-going process, and it can be studied experimentally in organisms with rapid generations. My team has maintained 12 populations of Escherichia coli in a simple laboratory environment for >25 years and 60 000 generations. We have quantified the dynamics of adaptation by natural selection, seen some of the populations diverge into stably coexisting ecotypes, described changes in the bacteria's mutation rate, observed the new ability to exploit a previously untapped carbon source, characterized the dynamics of genome evolution and used parallel evolution to identify the genetic targets of selection. I discuss what the future might hold for this particular experiment, briefly highlight some other microbial evolution experiments and suggest how the fields of experimental evolution and microbial ecology might intersect going forward.

  16. Integrated Genomic Characterization of Papillary Thyroid Carcinoma

    Science.gov (United States)

    Agrawal, Nishant; Akbani, Rehan; Aksoy, B. Arman; Ally, Adrian; Arachchi, Harindra; Asa, Sylvia L.; Auman, J. Todd; Balasundaram, Miruna; Balu, Saianand; Baylin, Stephen B.; Behera, Madhusmita; Bernard, Brady; Beroukhim, Rameen; Bishop, Justin A.; Black, Aaron D.; Bodenheimer, Tom; Boice, Lori; Bootwalla, Moiz S.; Bowen, Jay; Bowlby, Reanne; Bristow, Christopher A.; Brookens, Robin; Brooks, Denise; Bryant, Robert; Buda, Elizabeth; Butterfield, Yaron S.N.; Carling, Tobias; Carlsen, Rebecca; Carter, Scott L.; Carty, Sally E.; Chan, Timothy A.; Chen, Amy Y.; Cherniack, Andrew D.; Cheung, Dorothy; Chin, Lynda; Cho, Juok; Chu, Andy; Chuah, Eric; Cibulskis, Kristian; Ciriello, Giovanni; Clarke, Amanda; Clayman, Gary L.; Cope, Leslie; Copland, John; Covington, Kyle; Danilova, Ludmila; Davidsen, Tanja; Demchok, John A.; DiCara, Daniel; Dhalla, Noreen; Dhir, Rajiv; Dookran, Sheliann S.; Dresdner, Gideon; Eldridge, Jonathan; Eley, Greg; El-Naggar, Adel K.; Eng, Stephanie; Fagin, James A.; Fennell, Timothy; Ferris, Robert L.; Fisher, Sheila; Frazer, Scott; Frick, Jessica; Gabriel, Stacey B.; Ganly, Ian; Gao, Jianjiong; Garraway, Levi A.; Gastier-Foster, Julie M.; Getz, Gad; Gehlenborg, Nils; Ghossein, Ronald; Gibbs, Richard A.; Giordano, Thomas J.; Gomez-Hernandez, Karen; Grimsby, Jonna; Gross, Benjamin; Guin, Ranabir; Hadjipanayis, Angela; Harper, Hollie A.; Hayes, D. Neil; Heiman, David I.; Herman, James G.; Hoadley, Katherine A.; Hofree, Matan; Holt, Robert A.; Hoyle, Alan P.; Huang, Franklin W.; Huang, Mei; Hutter, Carolyn M.; Ideker, Trey; Iype, Lisa; Jacobsen, Anders; Jefferys, Stuart R.; Jones, Corbin D.; Jones, Steven J.M.; Kasaian, Katayoon; Kebebew, Electron; Khuri, Fadlo R.; Kim, Jaegil; Kramer, Roger; Kreisberg, Richard; Kucherlapati, Raju; Kwiatkowski, David J.; Ladanyi, Marc; Lai, Phillip H.; Laird, Peter W.; Lander, Eric; Lawrence, Michael S.; Lee, Darlene; Lee, Eunjung; Lee, Semin; Lee, William; Leraas, Kristen M.; Lichtenberg, Tara M.; Lichtenstein, Lee; Lin, Pei; Ling, Shiyun; Liu, Jinze; Liu, Wenbin; Liu, Yingchun; LiVolsi, Virginia A.; Lu, Yiling; Ma, Yussanne; Mahadeshwar, Harshad S.; Marra, Marco A.; Mayo, Michael; McFadden, David G.; Meng, Shaowu; Meyerson, Matthew; Mieczkowski, Piotr A.; Miller, Michael; Mills, Gordon; Moore, Richard A.; Mose, Lisle E.; Mungall, Andrew J.; Murray, Bradley A.; Nikiforov, Yuri E.; Noble, Michael S.; Ojesina, Akinyemi I.; Owonikoko, Taofeek K.; Ozenberger, Bradley A.; Pantazi, Angeliki; Parfenov, Michael; Park, Peter J.; Parker, Joel S.; Paull, Evan O.; Pedamallu, Chandra Sekhar; Perou, Charles M.; Prins, Jan F.; Protopopov, Alexei; Ramalingam, Suresh S.; Ramirez, Nilsa C.; Ramirez, Ricardo; Raphael, Benjamin J.; Rathmell, W. Kimryn; Ren, Xiaojia; Reynolds, Sheila M.; Rheinbay, Esther; Ringel, Matthew D.; Rivera, Michael; Roach, Jeffrey; Robertson, A. Gordon; Rosenberg, Mara W.; Rosenthall, Matthew; Sadeghi, Sara; Saksena, Gordon; Sander, Chris; Santoso, Netty; Schein, Jacqueline E.; Schultz, Nikolaus; Schumacher, Steven E.; Seethala, Raja R.; Seidman, Jonathan; Senbabaoglu, Yasin; Seth, Sahil; Sharpe, Samantha; Mills Shaw, Kenna R.; Shen, John P.; Shen, Ronglai; Sherman, Steven; Sheth, Margi; Shi, Yan; Shmulevich, Ilya; Sica, Gabriel L.; Simons, Janae V.; Sipahimalani, Payal; Smallridge, Robert C.; Sofia, Heidi J.; Soloway, Matthew G.; Song, Xingzhi; Sougnez, Carrie; Stewart, Chip; Stojanov, Petar; Stuart, Joshua M.; Tabak, Barbara; Tam, Angela; Tan, Donghui; Tang, Jiabin; Tarnuzzer, Roy; Taylor, Barry S.; Thiessen, Nina; Thorne, Leigh; Thorsson, Vésteinn; Tuttle, R. Michael; Umbricht, Christopher B.; Van Den Berg, David J.; Vandin, Fabio; Veluvolu, Umadevi; Verhaak, Roel G.W.; Vinco, Michelle; Voet, Doug; Walter, Vonn; Wang, Zhining; Waring, Scot; Weinberger, Paul M.; Weinstein, John N.; Weisenberger, Daniel J.; Wheeler, David; Wilkerson, Matthew D.; Wilson, Jocelyn; Williams, Michelle; Winer, Daniel A.; Wise, Lisa; Wu, Junyuan; Xi, Liu; Xu, Andrew W.; Yang, Liming; Yang, Lixing; Zack, Travis I.; Zeiger, Martha A.; Zeng, Dong; Zenklusen, Jean Claude; Zhao, Ni; Zhang, Hailei; Zhang, Jianhua; Zhang, Jiashan (Julia); Zhang, Wei; Zmuda, Erik; Zou., Lihua

    2014-01-01

    Summary Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Here, we describe the genomic landscape of 496 PTCs. We observed a low frequency of somatic alterations (relative to other carcinomas) and extended the set of known PTC driver alterations to include EIF1AX, PPM1D and CHEK2 and diverse gene fusions. These discoveries reduced the fraction of PTC cases with unknown oncogenic driver from 25% to 3.5%. Combined analyses of genomic variants, gene expression, and methylation demonstrated that different driver groups lead to different pathologies with distinct signaling and differentiation characteristics. Similarly, we identified distinct molecular subgroups of BRAF-mutant tumors and multidimensional analyses highlighted a potential involvement of oncomiRs in less-differentiated subgroups. Our results propose a reclassification of thyroid cancers into molecular subtypes that better reflect their underlying signaling and differentiation properties, which has the potential to improve their pathological classification and better inform the management of the disease. PMID:25417114

  17. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those aimed at implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  18. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

    2017-01-04

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Microbial network for waste activated sludge cascade utilization in an integrated system of microbial electrolysis and anaerobic fermentation

    DEFF Research Database (Denmark)

    Liu, Wenzong; He, Zhangwei; Yang, Chunxue

    2016-01-01

    in an integrated system of microbial electrolysis cell (MEC) and anaerobic digestion (AD) for waste activated sludge (WAS). Microbial communities in integrated system would build a thorough energetic and metabolic interaction network regarding fermentation communities and electrode respiring communities...... to Firmicutes (Acetoanaerobium, Acetobacterium, and Fusibacter) showed synergistic relationship with exoelectrogensin the degradation of complex organic matter or recycling of MEC products (H2). High protein and polysaccharide but low fatty acid content led to the dominance of Proteiniclasticum...... biofilm. The overall performance of WAS cascade utilization was substantially related to the microbial community structures, which in turn depended on the initial pretreatment to enhance WAS fermentation. It is worth noting that species in AD and MEC communities are able to build complex networks...

  20. Genome sequence of a microbial lipid producing fungus Cryptococcus albidus NT2002.

    Science.gov (United States)

    Yong, Xiaoyu; Yan, Zhiying; Xu, Lin; Zhou, Jun; Wu, Xiayuan; Wu, Yuandong; Li, Yang; Chen, Zugeng; Zhou, Hua; Wei, Ping; Jia, Honghua

    2016-04-10

    Cryptococcus albidus NT2002, isolated from the soil in Xinjiang, China, appeared to have the ability to accumulate microbial lipid by utilizing various carbon sources. The predominant properties make it as a potential bio-platform for biodiesel production. Here, we report the complete genome sequence of C. albidus NT2002, which might provide a basis for further elucidation of the genetic background of this promising strain for developing metabolic engineering strategies to produce biodiesel in a green and sustainable manner. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Integrative Genome Comparison of Primary and Metastatic Melanomas

    Science.gov (United States)

    Feng, Bin; Nazarian, Rosalynn M.; Bosenberg, Marcus; Wu, Min; Scott, Kenneth L.; Kwong, Lawrence N.; Xiao, Yonghong; Cordon-Cardo, Carlos; Granter, Scott R.; Ramaswamy, Sridhar; Golub, Todd; Duncan, Lyn M.; Wagner, Stephan N.; Brennan, Cameron; Chin, Lynda

    2010-01-01

    A cardinal feature of malignant melanoma is its metastatic propensity. An incomplete view of the genetic events driving metastatic progression has been a major barrier to rational development of effective therapeutics and prognostic diagnostics for melanoma patients. In this study, we conducted global genomic characterization of primary and metastatic melanomas to examine the genomic landscape associated with metastatic progression. In addition to uncovering three genomic subclasses of metastastic melanomas, we delineated 39 focal and recurrent regions of amplification and deletions, many of which encompassed resident genes that have not been implicated in cancer or metastasis. To identify progression-associated metastasis gene candidates, we applied a statistical approach, Integrative Genome Comparison (IGC), to define 32 genomic regions of interest that were significantly altered in metastatic relative to primary melanomas, encompassing 30 resident genes with statistically significant expression deregulation. Functional assays on a subset of these candidates, including MET, ASPM, AKAP9, IMP3, PRKCA, RPA3, and SCAP2, validated their pro-invasion activities in human melanoma cells. Validity of the IGC approach was further reinforced by tissue microarray analysis of Survivin showing significant increased protein expression in thick versus thin primary cutaneous melanomas, and a progression correlation with lymph node metastases. Together, these functional validation results and correlative analysis of human tissues support the thesis that integrated genomic and pathological analyses of staged melanomas provide a productive entry point for discovery of melanoma metastases genes. PMID:20520718

  2. Reconstructing each cell's genome within complex microbial communities-dream or reality?

    Science.gov (United States)

    Clingenpeel, Scott; Clum, Alicia; Schwientek, Patrick; Rinke, Christian; Woyke, Tanja

    2014-01-01

    As the vast majority of microorganisms have yet to be cultivated in a laboratory setting, access to their genetic makeup has largely been limited to cultivation-independent methods. These methods, namely metagenomics and more recently single-cell genomics, have become cornerstones for microbial ecology and environmental microbiology. One ultimate goal is the recovery of genome sequences from each cell within an environment to move toward a better understanding of community metabolic potential and to provide substrate for experimental work. As single-cell sequencing has the ability to decipher all sequence information contained in an individual cell, this method holds great promise in tackling such challenge. Methodological limitations and inherent biases however do exist, which will be discussed here based on environmental and benchmark data, to assess how far we are from reaching this goal.

  3. Evaluation of nine popular de novo assemblers in microbial genome assembly.

    Science.gov (United States)

    Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher

    2017-12-01

    Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.

  4. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization.

    Science.gov (United States)

    Wang, Qingguo; Jia, Peilin; Zhao, Zhongming

    2015-01-01

    Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.

  5. Genome-resolved metagenomics reveals that sulfur metabolism dominates the microbial ecology of rising hydrothermal plumes

    Science.gov (United States)

    Anantharaman, K.; Breier, J. A., Jr.; Jain, S.; Reed, D. C.; Dick, G.

    2015-12-01

    Deep-sea hydrothermal plumes occur when hot fluids from hydrothermal vents replete with chemically reduced elements and compounds like sulfide, methane, hydrogen, ammonia, iron and manganese mix with cold, oxic seawater. Chemosynthetic microbes use these reduced chemicals to power primary production and are pervasive throughout the deep sea, even at sites far removed from hydrothermal vents. Although neutrally-buoyant hydrothermal plumes have been well-studied, rising hydrothermal plumes have received little attention even though they represent an important interface in the deep-sea where microbial metabolism and particle formation processes control the transformation of important elements and impact global biogeochemical cycles. In this study, we used genome-resolved metagenomic analyses and thermodynamic-bioenergetic modeling to study the microbial ecology of rising hydrothermal plumes at five different hydrothermal vents spanning a range of geochemical gradients at the Eastern Lau Spreading Center (ELSC) in the Western Pacific Ocean. Our analyses show that differences in the geochemistry of hydrothermal vents do not manifest in microbial diversity and community composition, both of which display only minor variance across ELSC hydrothermal plumes. Microbial metabolism is dominated by oxidation of reduced sulfur species and supports a diversity of bacteria, archaea and viruses that provide intriguing insights into metabolic plasticity and virus-mediated horizontal gene transfer in the microbial community. The manifestation of sulfur oxidation genes in hydrogen and methane oxidizing organisms hints at metabolic opportunism in deep-sea microbes that would enable them to respond to varying redox conditions in hydrothermal plumes. Finally, we infer that the abundance, diversity and metabolic versatility of microbes associated with sulfur oxidation impart functional redundancy that could allow it to persist in the dynamic settings of hydrothermal plumes.

  6. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, Carl B [Diversa Corporation; Wyborski, Denise L. [Diversa Corporation; Garcia, Joseph A. [Diversa Corporation; Podar, Mircea [ORNL; Chen, Wenqiong [Diversa Corporation; Chang, Sherman H. [Diversa Corporation; Chang, Hwai W. [Diversa Corporation; Watson, David B [ORNL; Brodie, Eoin L. [Lawrence Berkeley National Laboratory (LBNL); Hazen, Terry [Lawrence Berkeley National Laboratory (LBNL); Keller, Martin [ORNL

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using {phi}29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  7. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie,E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using ?29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and ''clusters of orthologous groups'' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  8. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Full Text Available Abstract Background The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds which are essential for unambiguous annotation of the genome. Now, both the datasets have been merged and assembled through a joint collaboration between the two groups. Description Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size among the sequenced insect genomes and provided a high degree of nucleotide coverage (88% of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  9. Integrating genomic selection into dairy cattle breeding programmes: a review.

    Science.gov (United States)

    Bouquet, A; Juga, J

    2013-05-01

    Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. However

  10. Comparative genome analysis of a thermotolerant Escherichia coli obtained by Genome Replication Engineering Assisted Continuous Evolution (GREACE) and its parent strain provides new understanding of microbial heat tolerance.

    Science.gov (United States)

    Luan, Guodong; Bao, Guanhui; Lin, Zhao; Li, Yang; Chen, Zugen; Li, Yin; Cai, Zhen

    2015-12-25

    Heat tolerance of microbes is of great importance for efficient biorefinery and bioconversion. However, engineering and understanding of microbial heat tolerance are difficult and insufficient because it is a complex physiological trait which probably correlates with all gene functions, genetic regulations, and cellular metabolisms and activities. In this work, a novel strain engineering approach named Genome Replication Engineering Assisted Continuous Evolution (GREACE) was employed to improve the heat tolerance of Escherichia coli. When the E. coli strain carrying a mutator was cultivated under gradually increasing temperature, genome-wide mutations were continuously generated during genome replication and the mutated strains with improved thermotolerance were autonomously selected. A thermotolerant strain HR50 capable of growing at 50°C on LB agar plate was obtained within two months, demonstrating the efficiency of GREACE in improving such a complex physiological trait. To understand the improved heat tolerance, genomes of HR50 and its wildtype strain DH5α were sequenced. Evenly distributed 361 mutations covering all mutation types were found in HR50. Closed material transportations, loose genome conformation, and possibly altered cell wall structure and transcription pattern were the main differences of HR50 compared with DH5α, which were speculated to be responsible for the improved heat tolerance. This work not only expanding our understanding of microbial heat tolerance, but also emphasizing that the in vivo continuous genome mutagenesis method, GREACE, is efficient in improving microbial complex physiological trait. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Glycogenomics as a mass spectrometry-guided genome-mining method for microbial glycosylated molecules.

    Science.gov (United States)

    Kersten, Roland D; Ziemert, Nadine; Gonzalez, David J; Duggan, Brendan M; Nizet, Victor; Dorrestein, Pieter C; Moore, Bradley S

    2013-11-19

    Glycosyl groups are an essential mediator of molecular interactions in cells and on cellular surfaces. There are very few methods that directly relate sugar-containing molecules to their biosynthetic machineries. Here, we introduce glycogenomics as an experiment-guided genome-mining approach for fast characterization of glycosylated natural products (GNPs) and their biosynthetic pathways from genome-sequenced microbes by targeting glycosyl groups in microbial metabolomes. Microbial GNPs consist of aglycone and glycosyl structure groups in which the sugar unit(s) are often critical for the GNP's bioactivity, e.g., by promoting binding to a target biomolecule. GNPs are a structurally diverse class of molecules with important pharmaceutical and agrochemical applications. Herein, O- and N-glycosyl groups are characterized in their sugar monomers by tandem mass spectrometry (MS) and matched to corresponding glycosylation genes in secondary metabolic pathways by a MS-glycogenetic code. The associated aglycone biosynthetic genes of the GNP genotype then classify the natural product to further guide structure elucidation. We highlight the glycogenomic strategy by the characterization of several bioactive glycosylated molecules and their gene clusters, including the anticancer agent cinerubin B from Streptomyces sp. SPB74 and an antibiotic, arenimycin B, from Salinispora arenicola CNB-527.

  12. Bioreactor microbial ecosystems for thiocyanate and cyanide degradation unravelled with genome-resolved metagenomics.

    Science.gov (United States)

    Kantor, Rose S; van Zyl, A Wynand; van Hille, Robert P; Thomas, Brian C; Harrison, Susan T L; Banfield, Jillian F

    2015-12-01

    Gold ore processing uses cyanide (CN(-) ), which often results in large volumes of thiocyanate- (SCN(-) ) contaminated wastewater requiring treatment. Microbial communities can degrade SCN(-) and CN(-) , but little is known about their membership and metabolic potential. Microbial-based remediation strategies will benefit from an ecological understanding of organisms involved in the breakdown of SCN(-) and CN(-) into sulfur, carbon and nitrogen compounds. We performed metagenomic analysis of samples from two laboratory-scale bioreactors used to study SCN(-) and CN(-) degradation. Community analysis revealed the dominance of Thiobacillus spp., whose genomes harbour a previously unreported operon for SCN(-) degradation. Genome-based metabolic predictions suggest that a large portion of each bioreactor community is autotrophic, relying not on molasses in reactor feed but using energy gained from oxidation of sulfur compounds produced during SCN(-) degradation. Heterotrophs, including a bacterium from a previously uncharacterized phylum, compose a smaller portion of the reactor community. Predation by phage and eukaryotes is predicted to affect community dynamics. Genes for ammonium oxidation and denitrification were detected, indicating the potential for nitrogen removal, as required for complete remediation of wastewater. These findings suggest optimization strategies for reactor design, such as improved aerobic/anaerobic partitioning and elimination of organic carbon from reactor feed. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

  13. MicrobesOnline: an integrated portal for comparative and functional genomics

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Joachimiak, Marcin; Price, Morgan; Bates, John; Baumohl, Jason; Chivian, Dylan; Friedland, Greg; Huang, Kathleen; Keller, Keith; Novichkov, Pavel; Dubchak, Inna; Alm, Eric; Arkin, Adam

    2011-07-14

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  14. MicrobesOnline: an integrated portal for comparative and functional genomics

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Joachimiak, Marcin P.; Price, Morgan N.; Bates, John T.; Baumohl, Jason K.; Chivian, Dylan; Friedland, Greg D.; Huang, Katherine H.; Keller, Keith; Novichkov, Pavel S.; Dubchak, Inna L.; Alm, Eric J.; Arkin, Adam P.

    2009-09-17

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  15. Genome-enabled Modeling of Microbial Biogeochemistry using a Trait-based Approach. Does Increasing Metabolic Complexity Increase Predictive Capabilities?

    Science.gov (United States)

    King, E.; Karaoz, U.; Molins, S.; Bouskill, N.; Anantharaman, K.; Beller, H. R.; Banfield, J. F.; Steefel, C. I.; Brodie, E.

    2015-12-01

    The biogeochemical functioning of ecosystems is shaped in part by genomic information stored in the subsurface microbiome. Cultivation-independent approaches allow us to extract this information through reconstruction of thousands of genomes from a microbial community. Analysis of these genomes, in turn, gives an indication of the organisms present and their functional roles. However, metagenomic analyses can currently deliver thousands of different genomes that range in abundance/importance, requiring the identification and assimilation of key physiologies and metabolisms to be represented as traits for successful simulation of subsurface processes. Here we focus on incorporating -omics information into BioCrunch, a genome-informed trait-based model that represents the diversity of microbial functional processes within a reactive transport framework. This approach models the rate of nutrient uptake and the thermodynamics of coupled electron donors and acceptors for a range of microbial metabolisms including heterotrophs and chemolithotrophs. Metabolism of exogenous substrates fuels catabolic and anabolic processes, with the proportion of energy used for cellular maintenance, respiration, biomass development, and enzyme production based upon dynamic intracellular and environmental conditions. This internal resource partitioning represents a trade-off against biomass formation and results in microbial community emergence across a fitness landscape. Biocrunch was used here in simulations that included organisms and metabolic pathways derived from a dataset of ~1200 non-redundant genomes reflecting a microbial community in a floodplain aquifer. Metagenomic data was directly used to parameterize trait values related to growth and to identify trait linkages associated with respiration, fermentation, and key enzymatic functions such as plant polymer degradation. Simulations spanned a range of metabolic complexities and highlight benefits originating from simulations

  16. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

    Science.gov (United States)

    Thorvaldsdóttir, Helga; Robinson, James T; Mesirov, Jill P

    2013-03-01

    Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today's sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license.

  17. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes.

    Science.gov (United States)

    Angly, Florent E; Willner, Dana; Prieto-Davó, Alejandra; Edwards, Robert A; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A; Barott, Katie; Cottrell, Matthew T; Desnues, Christelle; Dinsdale, Elizabeth A; Furlan, Mike; Haynes, Matthew; Henn, Matthew R; Hu, Yongfei; Kirchman, David L; McDole, Tracey; McPherson, John D; Meyer, Folker; Miller, R Michael; Mundt, Egbert; Naviaux, Robert K; Rodriguez-Mueller, Beltran; Stevens, Rick; Wegley, Linda; Zhang, Lixin; Zhu, Baoli; Rohwer, Forest

    2009-12-01

    Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.

  18. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes.

    Directory of Open Access Journals (Sweden)

    Florent E Angly

    2009-12-01

    Full Text Available Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS, a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and

  19. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    Science.gov (United States)

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for their metabolites characterization. An increase in the use of this approach has been observed for the last couple of years along with the emergence of low cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.

  20. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    Science.gov (United States)

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

  1. Complete genome sequence of Klebsiella pneumoniae J1, a protein-based microbial flocculant-producing bacterium.

    Science.gov (United States)

    Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan

    2016-02-20

    Klebsiella pneumoniae J1 is a Gram-negative strain, which belongs to a protein-based microbial flocculant-producing bacterium. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.

    Science.gov (United States)

    Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan

    2012-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.

  3. Sequence-based Methods in Human Microbial Ecology: A The 2nd HumanGenome Comes of Age

    Energy Technology Data Exchange (ETDEWEB)

    Weng, Li; Rubin, Edward M.; Bristow, James

    2005-06-01

    Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for more than a decade (Whitman et al. 1998). The development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail (Handelsman 2004; Harris et al. 2004; Hugenholtz et al. 1998; Moreira and Lopez-Garcia 2002; Rappe and Giovannoni 2003). Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Because these techniques now allow not only cataloging of microbial diversity, but also insight into microbial functions, it is time for clinical microbiologists to apply these tools to the microbial communities that abound on and within us, in what has been aptly called ''the second Human Genome Project'' (Relman and Falkow 2001). In this review we will discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis and treatment of known infectious diseases, and finally to advance understanding of our relationship with microbial communities that normally reside in and on the human body.

  4. Integration of genomic information with biological networks using Cytoscape.

    Science.gov (United States)

    Bauer-Mehren, Anna

    2013-01-01

    Cytoscape is an open-source software for visualizing, analyzing, and modeling biological networks. This chapter explains how to use Cytoscape to analyze the functional effect of sequence variations in the context of biological networks such as protein-protein interaction networks and signaling pathways. The chapter is divided into five parts: (1) obtaining information about the functional effect of sequence variation in a Cytoscape readable format, (2) loading and displaying different types of biological networks in Cytoscape, (3) integrating the genomic information (SNPs and mutations) with the biological networks, and (4) analyzing the effect of the genomic perturbation onto the network structure using Cytoscape built-in functions. Finally, we briefly outline how the integrated data can help in building mathematical network models for analyzing the effect of the sequence variation onto the dynamics of the biological system. Each part is illustrated by step-by-step instructions on an example use case and visualized by many screenshots and figures.

  5. The evolution of microbial species - a view through the genomic lens

    Energy Technology Data Exchange (ETDEWEB)

    Varghese, Neha; Mukherjee, Supratim; ivanova, Natalia; Mavromatis, Konstantinos; Konstantinidis, Konstantinos; Kyrpides, Nikos; Pati, Amrita

    2014-03-17

    For a long time prokaryotic species definition has been under debate and a constant source of turmoil in microbiology. This has recently prompted the ASM to call for a scalable and reproducible technique, which uses meaningful commonalities to cluster microorganisms into groups corresponding to prokaryotic species. Whole-genome Average Nucleotide Identity (gANI) was previously suggested as a measure of genetic distance that generally agrees with prokaryotic species assignments based on the accepted best practices (DNA-DNA hybridization and 16S rDNA similarity). In this work, we prove that gANI is indeed the meaningful commonality based on which microorganisms can be grouped into the aforementioned clusters. By analyzing 1.76 million pairs of genomes we find that identification of the closest relatives of an organism via gANI is precise, scalable, reproducible, and reflects the evolutionary dynamics of microbes. We model the previously unexplored statistical properties of gANI using 6,000 microbial genomes and apply species-specific gANI cutoffs to reveal anomalies in the current taxonomic species definitions for almost 50percent of the species with multiple genome sequences. We also provide evidence of speciation events and genetic continuums in 17.8percent of those species. We consider disagreements between gANI-based groupings and named species and demonstrate that the former have all the desired features to serve as the much-needed natural groups for moving forward with taxonomy. Further, the groupings identified are presented in detail at http://ani.jgi-psf.org to facilitate comprehensive downstream analysis for researchers across different disciplines

  6. GDR (Genome Database for Rosaceae: integrated web resources for Rosaceae genomics and genetics research

    Directory of Open Access Journals (Sweden)

    Ficklin Stephen

    2004-09-01

    Full Text Available Abstract Background Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description The Genome Database for Rosaceae (GDR is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  7. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research.

    Science.gov (United States)

    Jung, Sook; Jesudurai, Christopher; Staton, Margaret; Du, Zhidian; Ficklin, Stephen; Cho, Ilhyung; Abbott, Albert; Tomkins, Jeffrey; Main, Dorrie

    2004-09-09

    Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  8. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  9. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system

    Science.gov (United States)

    Anantharaman, Karthik; Brown, Christopher T.; Hug, Laura A.; Sharon, Itai; Castelle, Cindy J.; Probst, Alexander J.; Thomas, Brian C.; Singh, Andrea; Wilkins, Michael J.; Karaoz, Ulas; Brodie, Eoin L.; Williams, Kenneth H.; Hubbard, Susan S.; Banfield, Jillian F.

    2016-01-01

    The subterranean world hosts up to one-fifth of all biomass, including microbial communities that drive transformations central to Earth's biogeochemical cycles. However, little is known about how complex microbial communities in such environments are structured, and how inter-organism interactions shape ecosystem function. Here we apply terabase-scale cultivation-independent metagenomics to aquifer sediments and groundwater, and reconstruct 2,540 draft-quality, near-complete and complete strain-resolved genomes that represent the majority of known bacterial phyla as well as 47 newly discovered phylum-level lineages. Metabolic analyses spanning this vast phylogenetic diversity and representing up to 36% of organisms detected in the system are used to document the distribution of pathways in coexisting organisms. Consistent with prior findings indicating metabolic handoffs in simple consortia, we find that few organisms within the community can conduct multiple sequential redox transformations. As environmental conditions change, different assemblages of organisms are selected for, altering linkages among the major biogeochemical cycles. PMID:27774985

  10. Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products.

    Science.gov (United States)

    Almeida, Mathieu; Hébert, Agnès; Abraham, Anne-Laure; Rasmussen, Simon; Monnet, Christophe; Pons, Nicolas; Delbès, Céline; Loux, Valentin; Batto, Jean-Michel; Leonard, Pierre; Kennedy, Sean; Ehrlich, Stanislas Dusko; Pop, Mihai; Montel, Marie-Christine; Irlinger, Françoise; Renault, Pierre

    2014-12-13

    Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded to reconstruct the draft genome of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, still missing from public database. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. Our study provides data, which combined with publicly available genome references, represents the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative genera such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.

  11. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Andrei L. Osterman, Ph.D.

    2012-12-17

    Integration of bioinformatics and experimental techniques was applied to mapping and characterization of the key components (pathways, enzymes, transporters, regulators) of the core metabolic machinery in Shewanella oneidensis and related species with main focus was on metabolic and regulatory pathways involved in utilization of various carbon and energy sources. Among the main accomplishments reflected in ten joint publications with other participants of Shewanella Federation are: (i) A systems-level reconstruction of carbohydrate utilization pathways in the genus of Shewanella (19 species). This analysis yielded reconstruction of 18 sugar utilization pathways including 10 novel pathway variants and prediction of > 60 novel protein families of enzymes, transporters and regulators involved in these pathways. Selected functional predictions were verified by focused biochemical and genetic experiments. Observed growth phenotypes were consistent with bioinformatic predictions providing strong validation of the technology and (ii) Global genomic reconstruction of transcriptional regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors, 8 riboswitches and 6 translational attenuators. Of those, 45 regulons were inferred directly from the genome context analysis, whereas others were propagated from previously characterized regulons in other species. Selected regulatory predictions were experimentally tested. Integration of this analysis with microarray data revealed overall consistency and provided additional layer of interactions between regulons. All the results were captured in the new database RegPrecise, which is a joint development with the LBNL team. A more detailed analysis of the individual subsystems, pathways and regulons in Shewanella spp included bioinfiormatics-based prediction and experimental characterization of: (i) N-Acetylglucosamine catabolic pathway; (ii)Lactate utilization machinery; (iii) Novel Nrt

  12. Development of a fluorescence-activated cell sorting method coupled with whole genome amplification to analyze minority and trace Dehalococcoides genomes in microbial communities.

    Science.gov (United States)

    Lee, Patrick K H; Men, Yujie; Wang, Shanquan; He, Jianzhong; Alvarez-Cohen, Lisa

    2015-02-03

    Dehalococcoides mccartyi are functionally important bacteria that catalyze the reductive dechlorination of chlorinated ethenes. However, these anaerobic bacteria are fastidious to isolate, making downstream genomic characterization challenging. In order to facilitate genomic analysis, a fluorescence-activated cell sorting (FACS) method was developed in this study to separate D. mccartyi cells from a microbial community, and the DNA of the isolated cells was processed by whole genome amplification (WGA) and hybridized onto a D. mccartyi microarray for comparative genomics against four sequenced strains. First, FACS was successfully applied to a D. mccartyi isolate as positive control, and then microarray results verified that WGA from 10(6) cells or ∼1 ng of genomic DNA yielded high-quality coverage detecting nearly all genes across the genome. As expected, some inter- and intrasample variability in WGA was observed, but these biases were minimized by performing multiple parallel amplifications. Subsequent application of the FACS and WGA protocols to two enrichment cultures containing ∼10% and ∼1% D. mccartyi cells successfully enabled genomic analysis. As proof of concept, this study demonstrates that coupling FACS with WGA and microarrays is a promising tool to expedite genomic characterization of target strains in environmental communities where the relative concentrations are low.

  13. Single-cell genomics reveal metabolic strategies for microbial growth and survival in an oligotrophic aquifer

    Energy Technology Data Exchange (ETDEWEB)

    Wilkins, Michael J.; Kennedy, David W.; Castelle, Cindy; Field, Erin; Stepanauskas, Ramunas; Fredrickson, Jim K.; Konopka, Allan

    2014-02-09

    Bacteria from the genus Pedobacter are a major component of microbial assemblages at Hanford Site and have been shown to significantly change in abundance in response to the subsurface intrusion of Columbia River water. Here we employed single cell genomics techniques to shed light on the physiological niche of these microorganisms. Analysis of four Pedobacter single amplified genomes (SAGs) from Hanford Site sediments revealed a chemoheterotrophic lifestyle, with the potential to exist under both aerobic and microaerophilic conditions via expression of both aa3­-type and cbb3-type cytochrome c oxidases. These SAGs encoded a wide-range of both intra-and extra­-cellular carbohydrate-active enzymes, potentially enabling the degradation of recalcitrant substrates such as xylan and chitin, and the utilization of more labile sugars such as mannose and fucose. Coupled to these enzymes, a diversity of transporters and sugar-binding molecules were involved in the uptake of carbon from the extracellular local environment. The SAGs were enriched in TonB-dependent receptors (TBDRs), which play a key role in uptake of substrates resulting from degradation of recalcitrant carbon. CRISPR-Cas mechanisms for resisting viral infections were identified in all SAGs. These data demonstrate the potential mechanisms utilized for persistence by heterotrophic microorganisms in a carbon-limited aquifer, and hint at potential linkages between observed Pedobacter abundance shifts within the 300 Area subsurface and biogeochemical shifts associated with Columbia River water intrusion.

  14. Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids

    Science.gov (United States)

    Jungbluth, Sean P.; Amend, Jan P.; Rappé, Michael S.

    2017-03-01

    The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation.

  15. Visualization of RNA structure models within the Integrative Genomics Viewer.

    Science.gov (United States)

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  16. Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.

    Science.gov (United States)

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2015-01-01

    Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) are required for self-correction and for subsequent de novo assembly. Recently-developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pdeobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which implements corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome. Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting

  17. Structured Matrix Completion with Applications to Genomic Data Integration.

    Science.gov (United States)

    Cai, Tianxi; Cai, T Tony; Zhang, Anru

    2016-01-01

    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  18. An integrated semiconductor device enabling non-optical genome sequencing.

    Science.gov (United States)

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  19. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  20. Microbial Culturomics Broadens Human Vaginal Flora Diversity: Genome Sequence and Description of Prevotella lascolaii sp. nov. Isolated from a Patient with Bacterial Vaginosis.

    Science.gov (United States)

    Diop, Khoudia; Diop, Awa; Levasseur, Anthony; Mediannikov, Oleg; Robert, Catherine; Armstrong, Nicholas; Couderc, Carine; Bretelle, Florence; Raoult, Didier; Fournier, Pierre-Edouard; Fenollar, Florence

    2018-03-01

    Microbial culturomics is a new subfield of postgenomic medicine and omics biotechnology application that has broadened our awareness on bacterial diversity of the human microbiome, including the human vaginal flora bacterial diversity. Using culturomics, a new obligate anaerobic Gram-stain-negative rod-shaped bacterium designated strain khD1 T was isolated in the vagina of a patient with bacterial vaginosis and characterized using taxonogenomics. The most abundant cellular fatty acids were C 15:0 anteiso (36%), C 16:0 (19%), and C 15:0 iso (10%). Based on an analysis of the full-length 16S rRNA gene sequences, phylogenetic analysis showed that the strain khD1 T exhibited 90% sequence similarity with Prevotella loescheii, the phylogenetically closest validated Prevotella species. With 3,763,057 bp length, the genome of strain khD1 T contained (mol%) 48.7 G + C and 3248 predicted genes, including 3194 protein-coding and 54 RNA genes. Given the phenotypical and biochemical characteristic results as well as genome sequencing, strain khD1 T is considered to represent a novel species within the genus Prevotella, for which the name Prevotella lascolaii sp. nov. is proposed. The type strain is khD1 T ( = CSUR P0109, = DSM 101754). These results show that microbial culturomics greatly improves the characterization of the human microbiome repertoire by isolating potential putative new species. Further studies will certainly clarify the microbial mechanisms of pathogenesis of these new microbes and their role in health and disease. Microbial culturomics is an important new addition to the diagnostic medicine toolbox and warrants attention in future medical, global health, and integrative biology postgraduate teaching curricula.

  1. Computational genomics of hyperthermophiles

    NARCIS (Netherlands)

    Werken, van de H.J.G.

    2008-01-01

    With the ever increasing number of completely sequenced prokaryotic genomes and the subsequent use of functional genomics tools, e.g. DNA microarray and proteomics, computational data analysis and the integration of microbial and molecular data is inevitable. This thesis describes the computational

  2. STINGRAY: system for integrated genomic resources and analysis.

    Science.gov (United States)

    Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R

    2014-03-07

    The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY showed to be an easy to use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.

  3. Integrate genome-based assessment of safety for probiotic strains: Bacillus coagulans GBI-30, 6086 as a case study.

    Science.gov (United States)

    Salvetti, Elisa; Orrù, Luigi; Capozzi, Vittorio; Martina, Alessia; Lamontanara, Antonella; Keller, David; Cash, Howard; Felis, Giovanna E; Cattivelli, Luigi; Torriani, Sandra; Spano, Giuseppe

    2016-05-01

    Probiotics are microorganisms that confer beneficial effects on the host; nevertheless, before being allowed for human consumption, their safety must be verified with accurate protocols. In the genomic era, such procedures should take into account the genomic-based approaches. This study aims at assessing the safety traits of Bacillus coagulans GBI-30, 6086 integrating the most updated genomics-based procedures and conventional phenotypic assays. Special attention was paid to putative virulence factors (VF), antibiotic resistance (AR) genes and genes encoding enzymes responsible for harmful metabolites (i.e. biogenic amines, BAs). This probiotic strain was phenotypically resistant to streptomycin and kanamycin, although the genome analysis suggested that the AR-related genes were not easily transferrable to other bacteria, and no other genes with potential safety risks, such as those related to VF or BA production, were retrieved. Furthermore, no unstable elements that could potentially lead to genomic rearrangements were detected. Moreover, a workflow is proposed to allow the proper taxonomic identification of a microbial strain and the accurate evaluation of risk-related gene traits, combining whole genome sequencing analysis with updated bioinformatics tools and standard phenotypic assays. The workflow presented can be generalized as a guideline for the safety investigation of novel probiotic strains to help stakeholders (from scientists to manufacturers and consumers) to meet regulatory requirements and avoid misleading information.

  4. Data integration to prioritize drugs using genomics and curated data.

    Science.gov (United States)

    Louhimo, Riku; Laakso, Marko; Belitskin, Denis; Klefström, Juha; Lehtonen, Rainer; Hautaniemi, Sampsa

    2016-01-01

    Genomic alterations affecting drug target proteins occur in several tumor types and are prime candidates for patient-specific tailored treatments. Increasingly, patients likely to benefit from targeted cancer therapy are selected based on molecular alterations. The selection of a precision therapy benefiting most patients is challenging but can be enhanced with integration of multiple types of molecular data. Data integration approaches for drug prioritization have successfully integrated diverse molecular data but do not take full advantage of existing data and literature. We have built a knowledge-base which connects data from public databases with molecular results from over 2200 tumors, signaling pathways and drug-target databases. Moreover, we have developed a data mining algorithm to effectively utilize this heterogeneous knowledge-base. Our algorithm is designed to facilitate retargeting of existing drugs by stratifying samples and prioritizing drug targets. We analyzed 797 primary tumors from The Cancer Genome Atlas breast and ovarian cancer cohorts using our framework. FGFR, CDK and HER2 inhibitors were prioritized in breast and ovarian data sets. Estrogen receptor positive breast tumors showed potential sensitivity to targeted inhibitors of FGFR due to activation of FGFR3. Our results suggest that computational sample stratification selects potentially sensitive samples for targeted therapies and can aid in precision medicine drug repositioning. Source code is available from http://csblcanges.fimm.fi/GOPredict/.

  5. Integrated genomic and gene expression profiling identifies two major genomic circuits in urothelial carcinoma.

    Directory of Open Access Journals (Sweden)

    David Lindgren

    Full Text Available Similar to other malignancies, urothelial carcinoma (UC is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21, and BCL2L1 (20q11. We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E3F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.

  6. The Proteins API: accessing key integrated protein and genome information.

    Science.gov (United States)

    Nightingale, Andrew; Antunes, Ricardo; Alpi, Emanuele; Bursteinas, Borisas; Gonzales, Leonardo; Liu, Wudong; Luo, Jie; Qi, Guoying; Turner, Edd; Martin, Maria

    2017-07-03

    The Proteins API provides searching and programmatic access to protein and associated genomics data such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large scale data sources (LSS). Using the coordinates service, researchers are able to retrieve the genomic sequence coordinates for proteins in UniProtKB. This, the LSS genomics and proteomics data for UniProt proteins is programmatically only available through this service. A Swagger UI has been implemented to provide documentation, an interface for users, with little or no programming experience, to 'talk' to the services to quickly and easily formulate queries with the services and obtain dynamically generated source code for popular programming languages, such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast, easy to use RESTful services that provides a broad protein information resource for users to ask questions based upon their field of expertise and allowing them to gain an integrated overview of protein annotations available to aid their knowledge gain on proteins in biological processes. The Proteins API is available at (http://www.ebi.ac.uk/proteins/api/doc). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences.

    Science.gov (United States)

    De Oliveira, T; Miller, R; Tarin, M; Cassol, S

    2003-01-01

    Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains Genbank, BLAST-formatted and phylogenetic databases. GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from (http://www.bioafrica.net).

  8. Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes

    Science.gov (United States)

    2012-01-01

    Background The metabolic capacity for nitrogen fixation is known to be present in several prokaryotic species scattered across taxonomic groups. Experimental detection of nitrogen fixation in microbes requires species-specific conditions, making it difficult to obtain a comprehensive census of this trait. The recent and rapid increase in the availability of microbial genome sequences affords novel opportunities to re-examine the occurrence and distribution of nitrogen fixation genes. The current practice for computational prediction of nitrogen fixation is to use the presence of the nifH and/or nifD genes. Results Based on a careful comparison of the repertoire of nitrogen fixation genes in known diazotroph species we propose a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six genes coding for structural and biosynthetic components, namely NifHDK and NifENB. Using this criterion, we conducted a comprehensive search in fully sequenced genomes and identified 149 diazotrophic species, including 82 known diazotrophs and 67 species not known to fix nitrogen. The taxonomic distribution of nitrogen fixation in Archaea was limited to the Euryarchaeota phylum; within the Bacteria domain we predict that nitrogen fixation occurs in 13 different phyla. Of these, seven phyla had not hitherto been known to contain species capable of nitrogen fixation. Our analyses also identified protein sequences that are similar to nitrogenase in organisms that do not meet the minimum-gene-set criteria. The existence of nitrogenase-like proteins lacking conserved co-factor ligands in both diazotrophs and non-diazotrophs suggests their potential for performing other, as yet unidentified, metabolic functions. Conclusions Our predictions expand the known phylogenetic diversity of nitrogen fixation, and suggest that this trait may be much more common in nature than it is currently thought. The diverse phylogenetic distribution of nitrogenase

  9. Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes

    Directory of Open Access Journals (Sweden)

    Dos Santos Patricia C

    2012-05-01

    Full Text Available Abstract Background The metabolic capacity for nitrogen fixation is known to be present in several prokaryotic species scattered across taxonomic groups. Experimental detection of nitrogen fixation in microbes requires species-specific conditions, making it difficult to obtain a comprehensive census of this trait. The recent and rapid increase in the availability of microbial genome sequences affords novel opportunities to re-examine the occurrence and distribution of nitrogen fixation genes. The current practice for computational prediction of nitrogen fixation is to use the presence of the nifH and/or nifD genes. Results Based on a careful comparison of the repertoire of nitrogen fixation genes in known diazotroph species we propose a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six genes coding for structural and biosynthetic components, namely NifHDK and NifENB. Using this criterion, we conducted a comprehensive search in fully sequenced genomes and identified 149 diazotrophic species, including 82 known diazotrophs and 67 species not known to fix nitrogen. The taxonomic distribution of nitrogen fixation in Archaea was limited to the Euryarchaeota phylum; within the Bacteria domain we predict that nitrogen fixation occurs in 13 different phyla. Of these, seven phyla had not hitherto been known to contain species capable of nitrogen fixation. Our analyses also identified protein sequences that are similar to nitrogenase in organisms that do not meet the minimum-gene-set criteria. The existence of nitrogenase-like proteins lacking conserved co-factor ligands in both diazotrophs and non-diazotrophs suggests their potential for performing other, as yet unidentified, metabolic functions. Conclusions Our predictions expand the known phylogenetic diversity of nitrogen fixation, and suggest that this trait may be much more common in nature than it is currently thought. The diverse phylogenetic

  10. High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved.

    Science.gov (United States)

    Turaev, Dmitrij; Rattei, Thomas

    2016-06-01

    The systems biology of microbial communities, organismal communities inhabiting all ecological niches on earth, has in recent years been strongly facilitated by the rapid development of experimental, sequencing and data analysis methods. Novel experimental approaches and binning methods in metagenomics render the semi-automatic reconstructions of near-complete genomes of uncultivable bacteria possible, while advances in high-resolution amplicon analysis allow for efficient and less biased taxonomic community characterization. This will also facilitate predictive modeling approaches, hitherto limited by the low resolution of metagenomic data. In this review, we pinpoint the most promising current developments in metagenomics. They facilitate microbial systems biology towards a systemic understanding of mechanisms in microbial communities with scopes of application in many areas of our daily life. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Germ warfare in a microbial mat community: CRISPRs provide insights into the co-evolution of host and viral genomes.

    Directory of Open Access Journals (Sweden)

    John F Heidelberg

    Full Text Available CRISPR arrays and associated cas genes are widespread in bacteria and archaea and confer acquired resistance to viruses. To examine viral immunity in the context of naturally evolving microbial populations we analyzed genomic data from two thermophilic Synechococcus isolates (Syn OS-A and Syn OS-B' as well as a prokaryotic metagenome and viral metagenome derived from microbial mats in hotsprings at Yellowstone National Park. Two distinct CRISPR types, distinguished by the repeat sequence, are found in both the Syn OS-A and Syn OS-B' genomes. The genome of Syn OS-A contains a third CRISPR type with a distinct repeat sequence, which is not found in Syn OS-B', but appears to be shared with other microorganisms that inhabit the mat. The CRISPR repeats identified in the microbial metagenome are highly conserved, while the spacer sequences (hereafter referred to as "viritopes" to emphasize their critical role in viral immunity were mostly unique and had no high identity matches when searched against GenBank. Searching the viritopes against the viral metagenome, however, yielded several matches with high similarity some of which were within a gene identified as a likely viral lysozyme/lysin protein. Analysis of viral metagenome sequences corresponding to this lysozyme/lysin protein revealed several mutations all of which translate into silent or conservative mutations which are unlikely to affect protein function, but may help the virus evade the host CRISPR resistance mechanism. These results demonstrate the varied challenges presented by a natural virus population, and support the notion that the CRISPR/viritope system must be able to adapt quickly to provide host immunity. The ability of metagenomics to track population-level variation in viritope sequences allows for a culture-independent method for evaluating the fast co-evolution of host and viral genomes and its consequence on the structuring of complex microbial communities.

  12. Genome-centric metatranscriptomes and ecological roles of the active microbial populations during cellulosic biomass anaerobic digestion.

    Science.gov (United States)

    Jia, Yangyang; Ng, Siu-Kin; Lu, Hongyuan; Cai, Mingwei; Lee, Patrick K H

    2018-01-01

    Although anaerobic digestion for biogas production is used worldwide in treatment processes to recover energy from carbon-rich waste such as cellulosic biomass, the activities and interactions among the microbial populations that perform anaerobic digestion deserve further investigations, especially at the population genome level. To understand the cellulosic biomass-degrading potentials in two full-scale digesters, this study examined five methanogenic enrichment cultures derived from the digesters that anaerobically digested cellulose or xylan for more than 2 years under 35 or 55 °C conditions. Metagenomics and metatranscriptomics were used to capture the active microbial populations in each enrichment culture and reconstruct their meta-metabolic network and ecological roles. 107 population genomes were reconstructed from the five enrichment cultures using a differential coverage binning approach, of which only a subset was highly transcribed in the metatranscriptomes. Phylogenetic and functional convergence of communities by enrichment condition and phase of fermentation was observed for the highly transcribed populations in the metatranscriptomes. In the 35 °C cultures grown on cellulose, Clostridium cellulolyticum -related and Ruminococcus -related bacteria were identified as major hydrolyzers and primary fermenters in the early growth phase, while Clostridium leptum -related bacteria were major secondary fermenters and potential fatty acid scavengers in the late growth phase. While the meta-metabolism and trophic roles of the cultures were similar, the bacterial populations performing each function were distinct between the enrichment conditions. Overall, a population genome-centric view of the meta-metabolism and functional roles of key active players in anaerobic digestion of cellulosic biomass was obtained. This study represents a major step forward towards understanding the microbial functions and interactions at population genome level during the

  13. The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes

    OpenAIRE

    Angly, Florent E.; Willner, Dana; Prieto-Dav?, Alejandra; Edwards, Robert A.; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A.; Barott, Katie; Cottrell, Matthew T.; Desnues, Christelle; Dinsdale, Elizabeth A.; Furlan, Mike; Haynes, Matthew; Henn, Matthew R.; Hu, Yongfei

    2009-01-01

    Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimate...

  14. Rapid detection of microbial DNA by a novel isothermal genome exponential amplification reaction (GEAR) assay.

    Science.gov (United States)

    Prithiviraj, Jothikumar; Hill, Vincent; Jothikumar, Narayanan

    2012-04-20

    In this study we report the development of a simple target-specific isothermal nucleic acid amplification technique, termed genome exponential amplification reaction (GEAR). Escherichia coli was selected as the microbial target to demonstrate the GEAR technique as a proof of concept. The GEAR technique uses a set of four primers; in the present study these primers targeted 5 regions on the 16S rRNA gene of E. coli. The outer forward and reverse Tab primer sequences are complementary to each other at their 5' end, whereas their 3' end sequences are complementary to their respective target nucleic acid sequences. The GEAR assay was performed at a constant temperature 60 °C and monitored continuously in a real-time PCR instrument in the presence of an intercalating dye (SYTO 9). The GEAR assay enabled amplification of as few as one colony forming units of E. coli per reaction within 30 min. We also evaluated the GEAR assay for rapid identification of bacterial colonies cultured on agar media directly in the reaction without DNA extraction. Cells from E. coli colonies were picked and added directly to GEAR assay mastermix without prior DNA extraction. DNA in the cells could be amplified, yielding positive results within 15 min. Published by Elsevier Inc.

  15. Integrated hydrogen production process from cellulose by combining dark fermentation, microbial fuel cells, and a microbial electrolysis cell

    KAUST Repository

    Wang, Aijie

    2011-03-01

    Hydrogen gas production from cellulose was investigated using an integrated hydrogen production process consisting of a dark fermentation reactor and microbial fuel cells (MFCs) as power sources for a microbial electrolysis cell (MEC). Two MFCs (each 25mL) connected in series to an MEC (72mL) produced a maximum of 0.43V using fermentation effluent as a feed, achieving a hydrogen production rate from the MEC of 0.48m 3 H 2/m 3/d (based on the MEC volume), and a yield of 33.2mmol H 2/g COD removed in the MEC. The overall hydrogen production for the integrated system (fermentation, MFC and MEC) was increased by 41% compared with fermentation alone to 14.3mmol H 2/g cellulose, with a total hydrogen production rate of 0.24m 3 H 2/m 3/d and an overall energy recovery efficiency of 23% (based on cellulose removed) without the need for any external electrical energy input. © 2010 Elsevier Ltd.

  16. Integrated hydrogen production process from cellulose by combining dark fermentation, microbial fuel cells, and a microbial electrolysis cell.

    Science.gov (United States)

    Wang, Aijie; Sun, Dan; Cao, Guangli; Wang, Haoyu; Ren, Nanqi; Wu, Wei-Min; Logan, Bruce E

    2011-03-01

    Hydrogen gas production from cellulose was investigated using an integrated hydrogen production process consisting of a dark fermentation reactor and microbial fuel cells (MFCs) as power sources for a microbial electrolysis cell (MEC). Two MFCs (each 25 mL) connected in series to an MEC (72 mL) produced a maximum of 0.43 V using fermentation effluent as a feed, achieving a hydrogen production rate from the MEC of 0.48 m(3) H(2)/m(3)/d (based on the MEC volume), and a yield of 33.2 mmol H(2)/g COD removed in the MEC. The overall hydrogen production for the integrated system (fermentation, MFC and MEC) was increased by 41% compared with fermentation alone to 14.3 mmol H(2)/g cellulose, with a total hydrogen production rate of 0.24 m(3) H(2)/m(3)/d and an overall energy recovery efficiency of 23% (based on cellulose removed) without the need for any external electrical energy input. Copyright © 2010 Elsevier Ltd. All rights reserved.

  17. Linking Soil Microbial Ecology to Ecosystem Functioning in Integrated Crop-Livestock Systems

    Science.gov (United States)

    Enhanced soil stability, nutrient cycling and C sequestration potential are important ecosystem functions driven by soil microbial processes and are directly influenced by agricultural management. Integrated crop-livestock agroecosystems (ICL) can enhance these functions via high-residue returning c...

  18. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    Science.gov (United States)

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

  19. Genome-wide profiling of HPV integration in cervical cancer identifies clustered genomic hot spots and a potential microhomology-mediated integration mechanism

    DEFF Research Database (Denmark)

    Hu, Zheng; Zhu, Da; Wang, Wei

    2015-01-01

    Human papillomavirus (HPV) integration is a key genetic event in cervical carcinogenesis1. By conducting whole-genome sequencing and high-throughput viral integration detection, we identified 3,667 HPV integration breakpoints in 26 cervical intraepithelial neoplasias, 104 cervical carcinomas and ...

  20. Integrated Genomic Analysis of the Ubiquitin Pathway across Cancer Types

    Directory of Open Access Journals (Sweden)

    Zhongqi Ge

    2018-04-01

    Full Text Available Summary: Protein ubiquitination is a dynamic and reversible process of adding single ubiquitin molecules or various ubiquitin chains to target proteins. Here, using multidimensional omic data of 9,125 tumor samples across 33 cancer types from The Cancer Genome Atlas, we perform comprehensive molecular characterization of 929 ubiquitin-related genes and 95 deubiquitinase genes. Among them, we systematically identify top somatic driver candidates, including mutated FBXW7 with cancer-type-specific patterns and amplified MDM2 showing a mutually exclusive pattern with BRAF mutations. Ubiquitin pathway genes tend to be upregulated in cancer mediated by diverse mechanisms. By integrating pan-cancer multiomic data, we identify a group of tumor samples that exhibit worse prognosis. These samples are consistently associated with the upregulation of cell-cycle and DNA repair pathways, characterized by mutated TP53, MYC/TERT amplification, and APC/PTEN deletion. Our analysis highlights the importance of the ubiquitin pathway in cancer development and lays a foundation for developing relevant therapeutic strategies. : Ge et al. analyze a cohort of 9,125 TCGA samples across 33 cancer types to provide a comprehensive characterization of the ubiquitin pathway. They detect somatic driver candidates in the ubiquitin pathway and identify a cluster of patients with poor survival, highlighting the importance of this pathway in cancer development. Keywords: ubiquitin pathway, pan-cancer analysis, The Cancer Genome Atlas, tumor subtype, cancer prognosis, therapeutic targets, biomarker, FBXW7

  1. A New Approach to Predict Microbial Community Assembly and Function Using a Stochastic, Genome-Enabled Modeling Framework

    Science.gov (United States)

    King, E.; Brodie, E.; Anantharaman, K.; Karaoz, U.; Bouskill, N.; Banfield, J. F.; Steefel, C. I.; Molins, S.

    2016-12-01

    Characterizing and predicting the microbial and chemical compositions of subsurface aquatic systems necessitates an understanding of the metabolism and physiology of organisms that are often uncultured or studied under conditions not relevant for one's environment of interest. Cultivation-independent approaches are therefore important and have greatly enhanced our ability to characterize functional microbial diversity. The capability to reconstruct genomes representing thousands of populations from microbial communities using metagenomic techniques provides a foundation for development of predictive models for community structure and function. Here, we discuss a genome-informed stochastic trait-based model incorporated into a reactive transport framework to represent the activities of coupled guilds of hypothetical microorganisms. Metabolic pathways for each microbe within a functional guild are parameterized from metagenomic data with a unique combination of traits governing organism fitness under dynamic environmental conditions. We simulate the thermodynamics of coupled electron donor and acceptor reactions to predict the energy available for cellular maintenance, respiration, biomass development, and enzyme production. While `omics analyses can now characterize the metabolic potential of microbial communities, it is functionally redundant as well as computationally prohibitive to explicitly include the thousands of recovered organisms into biogeochemical models. However, one can derive potential metabolic pathways from genomes along with trait-linkages to build probability distributions of traits. These distributions are used to assemble groups of microbes that couple one or more of these pathways. From the initial ensemble of microbes, only a subset will persist based on the interaction of their physiological and metabolic traits with environmental conditions, competing organisms, etc. Here, we analyze the predicted niches of these hypothetical microbes and

  2. Metagenome-Assembled Genome Sequences of Acetobacterium sp. Strain MES1 and Desulfovibrio sp. Strain MES5 from a Cathode-Associated Acetogenic Microbial Community.

    Science.gov (United States)

    Ross, Daniel E; Marshall, Christopher W; May, Harold D; Norman, R Sean

    2017-09-07

    Draft genome sequences of Acetobacterium sp. strain MES1 and Desulfovibrio sp. strain MES5 were obtained from the metagenome of a cathode-associated community enriched within a microbial electrosynthesis system (MES). The draft genome sequences provide insight into the functional potential of these microorganisms within an MES and a foundation for future comparative analyses. Copyright © 2017 Ross et al.

  3. Genome-Enabled Modeling of Biogeochemical Processes Predicts Metabolic Dependencies that Connect the Relative Fitness of Microbial Functional Guilds

    Science.gov (United States)

    Brodie, E.; King, E.; Molins, S.; Karaoz, U.; Steefel, C. I.; Banfield, J. F.; Beller, H. R.; Anantharaman, K.; Ligocki, T. J.; Trebotich, D.

    2015-12-01

    Pore-scale processes mediated by microorganisms underlie a range of critical ecosystem services, regulating carbon stability, nutrient flux, and the purification of water. Advances in cultivation-independent approaches now provide us with the ability to reconstruct thousands of genomes from microbial populations from which functional roles may be assigned. With this capability to reveal microbial metabolic potential, the next step is to put these microbes back where they belong to interact with their natural environment, i.e. the pore scale. At this scale, microorganisms communicate, cooperate and compete across their fitness landscapes with communities emerging that feedback on the physical and chemical properties of their environment, ultimately altering the fitness landscape and selecting for new microbial communities with new properties and so on. We have developed a trait-based model of microbial activity that simulates coupled functional guilds that are parameterized with unique combinations of traits that govern fitness under dynamic conditions. Using a reactive transport framework, we simulate the thermodynamics of coupled electron donor-acceptor reactions to predict energy available for cellular maintenance, respiration, biomass development, and enzyme production. From metagenomics, we directly estimate some trait values related to growth and identify the linkage of key traits associated with respiration and fermentation, macromolecule depolymerizing enzymes, and other key functions such as nitrogen fixation. Our simulations were carried out to explore abiotic controls on community emergence such as seasonally fluctuating water table regimes across floodplain organic matter hotspots. Simulations and metagenomic/metatranscriptomic observations highlighted the many dependencies connecting the relative fitness of functional guilds and the importance of chemolithoautotrophic lifestyles. Using an X-Ray microCT-derived soil microaggregate physical model combined

  4. Microbial bioinformatics 2020.

    Science.gov (United States)

    Pallen, Mark J

    2016-09-01

    Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! © 2016 The Author. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  5. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2017-09-01

    Full Text Available High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  6. Energy-positive wastewater treatment and desalination in an integrated microbial desalination cell (MDC)-microbial electrolysis cell (MEC)

    Science.gov (United States)

    Li, Yan; Styczynski, Jordyn; Huang, Yuankai; Xu, Zhiheng; McCutcheon, Jeffrey; Li, Baikun

    2017-07-01

    Simultaneous removal of nitrogen in municipal wastewater, metal in industrial wastewater and saline in seawater was achieved in an integrated microbial desalination cell-microbial electrolysis cell (MDC-MEC) system. Batch tests showed that more than 95.1% of nitrogen was oxidized by nitrification in the cathode of MDC and reduced by heterotrophic denitrification in the anode of MDC within 48 h, leading to the total nitrogen removal rate of 4.07 mg L-1 h-1. Combining of nitrogen removal and desalination in MDC effectively solved the problem of pH fluctuation in anode and cathode, and led to 63.7% of desalination. Power generation of MDC (293.7 mW m-2) was 2.9 times higher than the one without salt solution. The electric power of MDC was harvested by a capacitor circuit to supply metal reduction in a MEC, and 99.5% of lead (II) was removed within 48 h. A kinetic MDC model was developed to elucidate the correlation of voltage output and desalination efficiency. Ratio of wastewater and sea water was calculated for MDC optimal operation. Energy balance of nutrient removal, metal removal and desalination in the MDC-MEC system was positive (0.0267 kW h m-3), demonstrating the promise of utilizing low power output of MDCs.

  7. A hybrid clustering approach to recognition of protein families in 114 microbial genomes

    Directory of Open Access Journals (Sweden)

    Gogarten J Peter

    2004-04-01

    Full Text Available Abstract Background Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function. Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g. as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families. Results We describe a hybrid approach to sequence-based clustering of proteins that combines the advantages of standard and Markov clustering. We have implemented this hybrid approach over a relational database environment, and describe its application to clustering a large subset of PDB, and to 328577 proteins from 114 fully sequenced microbial genomes. To demonstrate utility with difficult problems, we show that hybrid clustering allows us to constitute the paralogous family of ATP synthase F1 rotary motor subunits into a single, biologically interpretable hierarchical grouping that was not accessible using either single-linkage or Markov clustering alone. We describe validation of this method by hybrid clustering of PDB and mapping SCOP families and domains onto the resulting clusters. Conclusion Hybrid (Markov followed by single-linkage clustering combines the advantages of the Markov Cluster algorithm (avoidance of non-specific clusters resulting from matches to promiscuous domains and single-linkage clustering (preservation of topological information as a function of threshold. Within the individual Markov clusters, single-linkage clustering is a more-precise instrument, discerning sub-clusters of biological relevance. Our hybrid approach thus provides a computationally efficient

  8. Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

    Science.gov (United States)

    Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

    2013-12-17

    The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community.We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation.During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. Genome comparisons within the family Actinomycetaceae reveal important differences

  9. An Integrative Genomic Island Affects the Adaptations of Piezophilic Hyperthermophilic Archaeon Pyrococcus yayanosii to High Temperature and High Hydrostatic Pressure

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2016-11-01

    Full Text Available Deep-sea hydrothermal vent environments are characterized by high hydrostatic pressure and sharp temperature and chemical gradients. Horizontal gene transfer is thought to play an important role in the microbial adaptation to such an extreme environment. In this study, a 21.4-kb DNA fragment was identified as a genomic island, designated PYG1, in the genomic sequence of the piezophilic hyperthermophile Pyrococcus yayanosii. According to the sequence alignment and functional annotation, the genes in PYG1 could tentatively be divided into five modules, with functions related to mobility, DNA repair, metabolic processes and the toxin-antitoxin system. Integrase can mediate the site-specific integration and excision of PYG1 in the chromosome of P. yayanosii A1. Gene replacement of PYG1 with a SimR cassette was successful. The growth of the mutant strain ∆PYG1 was compared with its parent strain P. yayanosii A2 under various stress conditions, including different pH, salinity, temperature and hydrostatic pressure. The ∆PYG1 mutant strain showed reduced growth when grown at 100 °C, while the biomass of ∆PYG1 increased significantly when cultured at 80 MPa. Differential expression of the genes in module Ⅲ of PYG1 was observed under different temperature and pressure conditions. This study demonstrates the first example of an archaeal integrative genomic island that could affect the adaptation of the hyperthermophilic piezophile P. yayanosii to high temperature and high hydrostatic pressure.

  10. New perspectives on microbial community distortion after whole-genome amplification

    Science.gov (United States)

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass, however potential selective biases during the amplification processes are poorly understood. Here, we describe the e...

  11. Role of Genomic Typing in Taxonomy, Evolutionary Genetics, and Microbial Epidemiology

    OpenAIRE

    van Belkum, Alex; Struelens, Marc; de Visser, Arjan; Verbrugh, Henri; Tibayrenc, Michel

    2001-01-01

    Currently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing data for discrimination between genotypes. Apart from being an essential component of these fundamental sciences, microbial typing clearly affects several areas of applied microbiogical research. ...

  12. Role of genomic typing in taxonomy, evolutionary genetics, and microbial epidemiology.

    OpenAIRE

    Belkum, Alex; Struelens, M.; Visser, Arjan; Verbrugh, Henri; Tibayrench, M.

    2001-01-01

    textabstractCurrently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing data for discrimination between genotypes. Apart from being an essential component of these fundamental sciences, microbial typing clearly affects several areas of applied microbiologi...

  13. Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

    Science.gov (United States)

    Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...

  14. Increased microbial functional diversity under long-term organic and integrated fertilization in a paddy soil.

    Science.gov (United States)

    Ding, Long-Jun; Su, Jian-Qiang; Sun, Guo-Xin; Wu, Jin-Shui; Wei, Wen-Xue

    2018-02-01

    Microbes play key roles in diverse biogeochemical processes including nutrient cycling. However, responses of soil microbial community and functional genes to long-term integrated fertilization (chemical combined with organic fertilization) remain unclear. Here, we used pyrosequencing and a microarray-based GeoChip to explore the shifts of microbial community and functional genes in a paddy soil which received over 21-year fertilization with various regimes, including control (no fertilizer), rice straw (R), rice straw plus chemical fertilizer nitrogen (NR), N and phosphorus (NPR), NP and potassium (NPKR), and reduced rice straw plus reduced NPK (L-NPKR). Significant shifts of the overall soil bacterial composition only occurred in the NPKR and L-NPKR treatments, with enrichment of certain groups including Bradyrhizobiaceae and Rhodospirillaceae families that benefit higher productivity. All fertilization treatments significantly altered the soil microbial functional structure with increased diversity and abundances of genes for carbon and nitrogen cycling, in which NPKR and L-NPKR exhibited the strongest effect, while R exhibited the least. Functional gene structure and abundance were significantly correlated with corresponding soil enzymatic activities and rice yield, respectively, suggesting that the structural shift of the microbial functional community under fertilization might promote soil nutrient turnover and thereby affect yield. Overall, this study indicates that the combined application of rice straw and balanced chemical fertilizers was more pronounced in shifting the bacterial composition and improving the functional diversity toward higher productivity, providing a microbial point of view on applying a cost-effective integrated fertilization regime with rice straw plus reduced chemical fertilizers for sustainable nutrient management.

  15. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.

  16. Methylobacterium genome sequences: a reference blueprint to investigate microbial metabolism of C1 compounds from natural and industrial sources.

    Science.gov (United States)

    Vuilleumier, Stéphane; Chistoserdova, Ludmila; Lee, Ming-Chun; Bringel, Françoise; Lajus, Aurélie; Zhou, Yang; Gourion, Benjamin; Barbe, Valérie; Chang, Jean; Cruveiller, Stéphane; Dossat, Carole; Gillett, Will; Gruffaz, Christelle; Haugen, Eric; Hourcade, Edith; Levy, Ruth; Mangenot, Sophie; Muller, Emilie; Nadalig, Thierry; Pagni, Marco; Penny, Christian; Peyraud, Rémi; Robinson, David G; Roche, David; Rouy, Zoé; Saenampechek, Channakhone; Salvignol, Grégory; Vallenet, David; Wu, Zaining; Marx, Christopher J; Vorholt, Julia A; Olson, Maynard V; Kaul, Rajinder; Weissenbach, Jean; Médigue, Claudine; Lidstrom, Mary E

    2009-01-01

    Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name "island integration determinant" (iid). These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic

  17. The United States Culture Collection Network (USCCN): Enhancing Microbial Genomics Research through Living Microbe Culture Collections

    Science.gov (United States)

    Boundy-Mills, Kyria; Hess, Matthias; Bennett, A. Rick; Ryan, Matthew; Kang, Seogchan; Nobles, David; Eisen, Jonathan A.; Inderbitzin, Patrik; Sitepu, Irnayuli R.; Torok, Tamas; Brown, Daniel R.; Cho, Juliana; Wertz, John E.; Mukherjee, Supratim; Cady, Sherry L.

    2015-01-01

    The mission of the United States Culture Collection Network (USCCN; http://usccn.org) is “to facilitate the safe and responsible utilization of microbial resources for research, education, industry, medicine, and agriculture for the betterment of human kind.” Microbial culture collections are a key component of life science research, biotechnology, and emerging global biobased economies. Representatives and users of several microbial culture collections from the United States and Europe gathered at the University of California, Davis, to discuss how collections of microorganisms can better serve users and stakeholders and to showcase existing resources available in public culture collections. PMID:26092453

  18. Stakeholder engagement: a key component of integrating genomic information into electronic health records.

    Science.gov (United States)

    Hartzler, Andrea; McCarty, Catherine A; Rasmussen, Luke V; Williams, Marc S; Brilliant, Murray; Bowton, Erica A; Clayton, Ellen Wright; Faucett, William A; Ferryman, Kadija; Field, Julie R; Fullerton, Stephanie M; Horowitz, Carol R; Koenig, Barbara A; McCormick, Jennifer B; Ralston, James D; Sanderson, Saskia C; Smith, Maureen E; Trinidad, Susan Brown

    2013-10-01

    Integrating genomic information into clinical care and the electronic health record can facilitate personalized medicine through genetically guided clinical decision support. Stakeholder involvement is critical to the success of these implementation efforts. Prior work on implementation of clinical information systems provides broad guidance to inform effective engagement strategies. We add to this evidence-based recommendations that are specific to issues at the intersection of genomics and the electronic health record. We describe stakeholder engagement strategies employed by the Electronic Medical Records and Genomics Network, a national consortium of US research institutions funded by the National Human Genome Research Institute to develop, disseminate, and apply approaches that combine genomic and electronic health record data. Through select examples drawn from sites of the Electronic Medical Records and Genomics Network, we illustrate a continuum of engagement strategies to inform genomic integration into commercial and homegrown electronic health records across a range of health-care settings. We frame engagement as activities to consult, involve, and partner with key stakeholder groups throughout specific phases of health information technology implementation. Our aim is to provide insights into engagement strategies to guide genomic integration based on our unique network experiences and lessons learned within the broader context of implementation research in biomedical informatics. On the basis of our collective experience, we describe key stakeholder practices, challenges, and considerations for successful genomic integration to support personalized medicine.

  19. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    Science.gov (United States)

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.

  20. Role of genomic typing in taxonomy, evolutionary genetics, and microbial epidemiology.

    NARCIS (Netherlands)

    Belkum, van A.; Struelens, M.; Visser, de J.A.G.M.; Verburgh, H.; Tibayrenc., M.

    2001-01-01

    Currently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing

  1. Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates.

    Science.gov (United States)

    Onogi, Akio; Watanabe, Maya; Mochizuki, Toshihiro; Hayashi, Takeshi; Nakagawa, Hiroshi; Hasegawa, Toshihiro; Iwata, Hiroyoshi

    2016-04-01

    It is suggested that accuracy in predicting plant phenotypes can be improved by integrating genomic prediction with crop modelling in a single hierarchical model. Accurate prediction of phenotypes is important for plant breeding and management. Although genomic prediction/selection aims to predict phenotypes on the basis of whole-genome marker information, it is often difficult to predict phenotypes of complex traits in diverse environments, because plant phenotypes are often influenced by genotype-environment interaction. A possible remedy is to integrate genomic prediction with crop/ecophysiological modelling, which enables us to predict plant phenotypes using environmental and management information. To this end, in the present study, we developed a novel method for integrating genomic prediction with phenological modelling of Asian rice (Oryza sativa, L.), allowing the heading date of untested genotypes in untested environments to be predicted. The method simultaneously infers the phenological model parameters and whole-genome marker effects on the parameters in a Bayesian framework. By cultivating backcross inbred lines of Koshihikari × Kasalath in nine environments, we evaluated the potential of the proposed method in comparison with conventional genomic prediction, phenological modelling, and two-step methods that applied genomic prediction to phenological model parameters inferred from Nelder-Mead or Markov chain Monte Carlo algorithms. In predicting heading dates of untested lines in untested environments, the proposed and two-step methods tended to provide more accurate predictions than the conventional genomic prediction methods, particularly in environments where phenotypes from environments similar to the target environment were unavailable for training genomic prediction. The proposed method showed greater accuracy in prediction than the two-step methods in all cross-validation schemes tested, suggesting the potential of the integrated approach in

  2. Characterization of Equine Infectious Anemia Virus Integration in the Horse Genome

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2015-06-01

    Full Text Available Human immunodeficiency virus (HIV-1 has a unique integration profile in the human genome relative to murine and avian retroviruses. Equine infectious anemia virus (EIAV is another well-studied lentivirus that can also be used as a promising retro-transfection vector, but its integration into its native host has not been characterized. In this study, we mapped 477 integration sites of the EIAV strain EIAVFDDV13 in fetal equine dermal (FED cells during in vitro infection. Published integration sites of EIAV and HIV-1 in the human genome were also analyzed as references. Our results demonstrated that EIAVFDDV13 tended to integrate into genes and AT-rich regions, and it avoided integrating into transcription start sites (TSS, which is consistent with EIAV and HIV-1 integration in the human genome. Notably, the integration of EIAVFDDV13 favored long interspersed elements (LINEs and DNA transposons in the horse genome, whereas the integration of HIV-1 favored short interspersed elements (SINEs in the human genome. The chromosomal environment near LINEs or DNA transposons potentially influences viral transcription and may be related to the unique EIAV latency states in equids. The data on EIAV integration in its natural host will facilitate studies on lentiviral infection and lentivirus-based therapeutic vectors.

  3. Community standards for genomic resources, genetic conservation, and data integration

    Science.gov (United States)

    Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson

    2017-01-01

    Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...

  4. GIGGLE: a search engine for large-scale integrated genome analysis

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  5. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.

  6. The three-dimensional genome organization of Drosophila melanogaster through data integration.

    Science.gov (United States)

    Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank

    2017-07-31

    Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.

  7. Protecting genomic integrity in somatic cells and embryonic stem cells

    International Nuclear Information System (INIS)

    Hong, Y.; Cervantes, R.B.; Tichy, E.; Tischfield, J.A.; Stambrook, P.J.

    2007-01-01

    Mutation frequencies at some loci in mammalian somatic cells in vivo approach 10 -4 . The majority of these events occur as a consequence of loss of heterozygosity (LOH) due to mitotic recombination. Such high levels of DNA damage in somatic cells, which can accumulate with age, will cause injury and, after a latency period, may lead to somatic disease and ultimately death. This high level of DNA damage is untenable for germ cells, and by extrapolation for embryonic stem (ES) cells, that must recreate the organism. ES cells cannot tolerate such a high frequency of damage since mutations will immediately impact the altered cell, and subsequently the entire organism. Most importantly, the mutations may be passed on to future generations. ES cells, therefore, must have robust mechanisms to protect the integrity of their genomes. We have examined two such mechanisms. Firstly, we have shown that mutation frequencies and frequencies of mitotic recombination in ES cells are about 100-fold lower than in adult somatic cells or in isogenic mouse embryonic fibroblasts (MEFs). A second complementary protective mechanism eliminates those ES cells that have acquired a mutational burden, thereby maintaining a pristine population. Consistent with this hypothesis, ES cells lack a G1 checkpoint, and the two known signaling pathways that mediate the checkpoint are compromised. The checkpoint kinase, Chk2, which participates in both pathways is sequestered at centrosomes in ES cells and does not phosphorylate its substrates (i.e. p53 and Cdc25A) that must be modified to produce a G1 arrest. Ectopic expression of Chk2 does not rescue the p53-mediated pathway, but does restore the pathway mediated by Cdc25A. Wild type ES cells exposed to ionizing radiation do not accumulate in G1 but do so in S-phase and in G2. ES cells that ectopically express Chk2 undergo cell cycle arrest in G1 as well as G2, and appear to be protected from apoptosis

  8. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas.

    Science.gov (United States)

    Brat, Daniel J; Verhaak, Roel G W; Aldape, Kenneth D; Yung, W K Alfred; Salama, Sofie R; Cooper, Lee A D; Rheinbay, Esther; Miller, C Ryan; Vitucci, Mark; Morozova, Olena; Robertson, A Gordon; Noushmehr, Houtan; Laird, Peter W; Cherniack, Andrew D; Akbani, Rehan; Huse, Jason T; Ciriello, Giovanni; Poisson, Laila M; Barnholtz-Sloan, Jill S; Berger, Mitchel S; Brennan, Cameron; Colen, Rivka R; Colman, Howard; Flanders, Adam E; Giannini, Caterina; Grifford, Mia; Iavarone, Antonio; Jain, Rajan; Joseph, Isaac; Kim, Jaegil; Kasaian, Katayoon; Mikkelsen, Tom; Murray, Bradley A; O'Neill, Brian Patrick; Pachter, Lior; Parsons, Donald W; Sougnez, Carrie; Sulman, Erik P; Vandenberg, Scott R; Van Meir, Erwin G; von Deimling, Andreas; Zhang, Hailei; Crain, Daniel; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Troy; Sherman, Mark; Yena, Peggy; Black, Aaron; Bowen, Jay; Dicostanzo, Katie; Gastier-Foster, Julie; Leraas, Kristen M; Lichtenberg, Tara M; Pierson, Christopher R; Ramirez, Nilsa C; Taylor, Cynthia; Weaver, Stephanie; Wise, Lisa; Zmuda, Erik; Davidsen, Tanja; Demchok, John A; Eley, Greg; Ferguson, Martin L; Hutter, Carolyn M; Mills Shaw, Kenna R; Ozenberger, Bradley A; Sheth, Margi; Sofia, Heidi J; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean Claude; Ayala, Brenda; Baboud, Julien; Chudamani, Sudha; Jensen, Mark A; Liu, Jia; Pihl, Todd; Raman, Rohini; Wan, Yunhu; Wu, Ye; Ally, Adrian; Auman, J Todd; Balasundaram, Miruna; Balu, Saianand; Baylin, Stephen B; Beroukhim, Rameen; Bootwalla, Moiz S; Bowlby, Reanne; Bristow, Christopher A; Brooks, Denise; Butterfield, Yaron; Carlsen, Rebecca; Carter, Scott; Chin, Lynda; Chu, Andy; Chuah, Eric; Cibulskis, Kristian; Clarke, Amanda; Coetzee, Simon G; Dhalla, Noreen; Fennell, Tim; Fisher, Sheila; Gabriel, Stacey; Getz, Gad; Gibbs, Richard; Guin, Ranabir; Hadjipanayis, Angela; Hayes, D Neil; Hinoue, Toshinori; Hoadley, Katherine; Holt, Robert A; Hoyle, Alan P; Jefferys, Stuart R; Jones, Steven; Jones, Corbin D; Kucherlapati, Raju; Lai, Phillip H; Lander, Eric; Lee, Semin; Lichtenstein, Lee; Ma, Yussanne; Maglinte, Dennis T; Mahadeshwar, Harshad S; Marra, Marco A; Mayo, Michael; Meng, Shaowu; Meyerson, Matthew L; Mieczkowski, Piotr A; Moore, Richard A; Mose, Lisle E; Mungall, Andrew J; Pantazi, Angeliki; Parfenov, Michael; Park, Peter J; Parker, Joel S; Perou, Charles M; Protopopov, Alexei; Ren, Xiaojia; Roach, Jeffrey; Sabedot, Thaís S; Schein, Jacqueline; Schumacher, Steven E; Seidman, Jonathan G; Seth, Sahil; Shen, Hui; Simons, Janae V; Sipahimalani, Payal; Soloway, Matthew G; Song, Xingzhi; Sun, Huandong; Tabak, Barbara; Tam, Angela; Tan, Donghui; Tang, Jiabin; Thiessen, Nina; Triche, Timothy; Van Den Berg, David J; Veluvolu, Umadevi; Waring, Scot; Weisenberger, Daniel J; Wilkerson, Matthew D; Wong, Tina; Wu, Junyuan; Xi, Liu; Xu, Andrew W; Yang, Lixing; Zack, Travis I; Zhang, Jianhua; Aksoy, B Arman; Arachchi, Harindra; Benz, Chris; Bernard, Brady; Carlin, Daniel; Cho, Juok; DiCara, Daniel; Frazer, Scott; Fuller, Gregory N; Gao, JianJiong; Gehlenborg, Nils; Haussler, David; Heiman, David I; Iype, Lisa; Jacobsen, Anders; Ju, Zhenlin; Katzman, Sol; Kim, Hoon; Knijnenburg, Theo; Kreisberg, Richard Bailey; Lawrence, Michael S; Lee, William; Leinonen, Kalle; Lin, Pei; Ling, Shiyun; Liu, Wenbin; Liu, Yingchun; Liu, Yuexin; Lu, Yiling; Mills, Gordon; Ng, Sam; Noble, Michael S; Paull, Evan; Rao, Arvind; Reynolds, Sheila; Saksena, Gordon; Sanborn, Zack; Sander, Chris; Schultz, Nikolaus; Senbabaoglu, Yasin; Shen, Ronglai; Shmulevich, Ilya; Sinha, Rileen; Stuart, Josh; Sumer, S Onur; Sun, Yichao; Tasman, Natalie; Taylor, Barry S; Voet, Doug; Weinhold, Nils; Weinstein, John N; Yang, Da; Yoshihara, Kosuke; Zheng, Siyuan; Zhang, Wei; Zou, Lihua; Abel, Ty; Sadeghi, Sara; Cohen, Mark L; Eschbacher, Jenny; Hattab, Eyas M; Raghunathan, Aditya; Schniederjan, Matthew J; Aziz, Dina; Barnett, Gene; Barrett, Wendi; Bigner, Darell D; Boice, Lori; Brewer, Cathy; Calatozzolo, Chiara; Campos, Benito; Carlotti, Carlos Gilberto; Chan, Timothy A; Cuppini, Lucia; Curley, Erin; Cuzzubbo, Stefania; Devine, Karen; DiMeco, Francesco; Duell, Rebecca; Elder, J Bradley; Fehrenbach, Ashley; Finocchiaro, Gaetano; Friedman, William; Fulop, Jordonna; Gardner, Johanna; Hermes, Beth; Herold-Mende, Christel; Jungk, Christine; Kendler, Ady; Lehman, Norman L; Lipp, Eric; Liu, Ouida; Mandt, Randy; McGraw, Mary; Mclendon, Roger; McPherson, Christopher; Neder, Luciano; Nguyen, Phuong; Noss, Ardene; Nunziata, Raffaele; Ostrom, Quinn T; Palmer, Cheryl; Perin, Alessandro; Pollo, Bianca; Potapov, Alexander; Potapova, Olga; Rathmell, W Kimryn; Rotin, Daniil; Scarpace, Lisa; Schilero, Cathy; Senecal, Kelly; Shimmel, Kristen; Shurkhay, Vsevolod; Sifri, Suzanne; Singh, Rosy; Sloan, Andrew E; Smolenski, Kathy; Staugaitis, Susan M; Steele, Ruth; Thorne, Leigh; Tirapelli, Daniela P C; Unterberg, Andreas; Vallurupalli, Mahitha; Wang, Yun; Warnick, Ronald; Williams, Felicia; Wolinsky, Yingli; Bell, Sue; Rosenberg, Mara; Stewart, Chip; Huang, Franklin; Grimsby, Jonna L; Radenbaugh, Amie J; Zhang, Jianan

    2015-06-25

    Diffuse low-grade and intermediate-grade gliomas (which together make up the lower-grade gliomas, World Health Organization grades II and III) have highly variable clinical behavior that is not adequately predicted on the basis of histologic class. Some are indolent; others quickly progress to glioblastoma. The uncertainty is compounded by interobserver variability in histologic diagnosis. Mutations in IDH, TP53, and ATRX and codeletion of chromosome arms 1p and 19q (1p/19q codeletion) have been implicated as clinically relevant markers of lower-grade gliomas. We performed genomewide analyses of 293 lower-grade gliomas from adults, incorporating exome sequence, DNA copy number, DNA methylation, messenger RNA expression, microRNA expression, and targeted protein expression. These data were integrated and tested for correlation with clinical outcomes. Unsupervised clustering of mutations and data from RNA, DNA-copy-number, and DNA-methylation platforms uncovered concordant classification of three robust, nonoverlapping, prognostically significant subtypes of lower-grade glioma that were captured more accurately by IDH, 1p/19q, and TP53 status than by histologic class. Patients who had lower-grade gliomas with an IDH mutation and 1p/19q codeletion had the most favorable clinical outcomes. Their gliomas harbored mutations in CIC, FUBP1, NOTCH1, and the TERT promoter. Nearly all lower-grade gliomas with IDH mutations and no 1p/19q codeletion had mutations in TP53 (94%) and ATRX inactivation (86%). The large majority of lower-grade gliomas without an IDH mutation had genomic aberrations and clinical behavior strikingly similar to those found in primary glioblastoma. The integration of genomewide data from multiple platforms delineated three molecular classes of lower-grade gliomas that were more concordant with IDH, 1p/19q, and TP53 status than with histologic class. Lower-grade gliomas with an IDH mutation either had 1p/19q codeletion or carried a TP53 mutation. Most

  9. Hidden diversity revealed by genome-resolved metagenomics of iron-oxidizing microbial mats from Lō'ihi Seamount, Hawai'i.

    Science.gov (United States)

    Fullerton, Heather; Hager, Kevin W; McAllister, Sean M; Moyer, Craig L

    2017-08-01

    The Zetaproteobacteria are ubiquitous in marine environments, yet this class of Proteobacteria is only represented by a few closely-related cultured isolates. In high-iron environments, such as diffuse hydrothermal vents, the Zetaproteobacteria are important members of the community driving its structure. Biogeography of Zetaproteobacteria has shown two ubiquitous operational taxonomic units (OTUs), yet much is unknown about their genomic diversity. Genome-resolved metagenomics allows for the specific binning of microbial genomes based on genomic signatures present in composite metagenome assemblies. This resulted in the recovery of 93 genome bins, of which 34 were classified as Zetaproteobacteria. Form II ribulose 1,5-bisphosphate carboxylase genes were recovered from nearly all the Zetaproteobacteria genome bins. In addition, the Zetaproteobacteria genome bins contain genes for uptake and utilization of bioavailable nitrogen, detoxification of arsenic, and a terminal electron acceptor adapted for low oxygen concentration. Our results also support the hypothesis of a Cyc2-like protein as the site for iron oxidation, now detected across a majority of the Zetaproteobacteria genome bins. Whole genome comparisons showed a high genomic diversity across the Zetaproteobacteria OTUs and genome bins that were previously unidentified by SSU rRNA gene analysis. A single lineage of cosmopolitan Zetaproteobacteria (zOTU 2) was found to be monophyletic, based on cluster analysis of average nucleotide identity and average amino acid identity comparisons. From these data, we can begin to pinpoint genomic adaptations of the more ecologically ubiquitous Zetaproteobacteria, and further understand their environmental constraints and metabolic potential.

  10. Analysis of pan-genome content and its application in microbial identification

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana

    microorganisms and eventually speed up the diagnosis of foodborne illnesses. This genomic data can give biologists many possibilities to improve knowledge of organismal evolution and complex genetic systems. The general interest of this PhD thesis is how to obtain relevant information from growing amounts...... groups or genomic structures; and to use the information of a specific proteome to predict which species it might belong to. Two different algorithms, BLAST and profile Hidden Markov Models (HMMs), are used to determine similarity between sequences and to address the questions in this thesis. The first...... the application of PanFunPro to a set of more than 2000 genomes; this paper aims to define set of protein families, which are conserved among all the genomes. Papers V demonstrates comparative genomics analysis of proteomes, belonging to Vibrio genus. In the last project, described in Chapter 5, both BLAST...

  11. Quantitative and qualitative proteome characteristics extracted from in-depth integrated genomics and proteomics analysis

    NARCIS (Netherlands)

    Low, T.Y.; van Heesch, S.; van den Toorn, H.; Giansanti, P.; Cristobal, A.; Toonen, P.; Schafer, S.; Hubner, N.; van Breukelen, B.; Mohammed, S.; Cuppen, E.; Heck, A.J.R.; Guryev, V.

    2013-01-01

    Quantitative and qualitative protein characteristics are regulated at genomic, transcriptomic, and posttranscriptional levels. Here, we integrated in-depth transcriptome and proteome analyses of liver tissues from two rat strains to unravel the interactions within and between these layers. We

  12. Brassica ASTRA: an integrated database for Brassica genomic research.

    Science.gov (United States)

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  13. Mass spectral molecular networking of living microbial colonies

    NARCIS (Netherlands)

    Watrous, J.; Roach, P.; Alexandrov, T.; Heath, B.S.; Yang, J.Y.; Kersten, R.D.; Voort, van der M.; Pogliano, K.; Gross, H.; Raaijmakers, J.; Moore, B.S.; Laskin, J.; Bandeira, N.; Dorrestein, P.C.

    2012-01-01

    Integrating the governing chemistry with the genomics and phenotypes of microbial colonies has been a “holy grail” in microbiology. This work describes a highly sensitive, broadly applicable, and cost-effective approach that allows metabolic profiling of live microbial colonies directly from a Petri

  14. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary.

    Science.gov (United States)

    Brynildsrud, Ola; Bohlin, Jon; Scheffer, Lonneke; Eldholm, Vegard

    2016-11-25

    Genome-wide association studies (GWAS) have become indispensable in human medicine and genomics, but very few have been carried out on bacteria. Here we introduce Scoary, an ultra-fast, easy-to-use, and widely applicable software tool that scores the components of the pan-genome for associations to observed phenotypic traits while accounting for population stratification, with minimal assumptions about evolutionary processes. We call our approach pan-GWAS to distinguish it from traditional, single nucleotide polymorphism (SNP)-based GWAS. Scoary is implemented in Python and is available under an open source GPLv3 license at https://github.com/AdmiralenOla/Scoary .

  15. Insights into recent and ancient trends in the co-evolution of Earth and life as revealed by microbial genomics

    Science.gov (United States)

    Anderson, R. E.; Huber, J. A.; Parsons, C.; Stüeken, E.

    2017-12-01

    Since the origin of life over 4 billion years ago, life has fundamentally altered the habitability of Earth. Similarly, the environment molds the evolutionary trajectory of life itself through natural selection. Microbial genomes retain a "memory" of the co-evolution of life and Earth and can be analyzed to better understand trends and events in both the recent and distant past. To examine evolutionary trends in the more recent past, we have used metagenomics analyses to investigate which environmental factors play the strongest role in driving the evolution of microbes in deep-sea hydrothermal vents, which are thought to have been important habitats in the earliest stages of life's evolution. We have shown that microbial populations in a deep, basalt-hosted system appear to be under stronger purifying selection than populations inhabiting a cooler serpentinizing system less than 20 km away, suggesting that environmental context and geochemistry have an important impact on evolutionary rates and trends. We also found evidence that viruses play an important role in driving evolution in these habitats. Changing environmental conditions may also effect long-term evolutionary trends in Earth's distant past, as revealed by comparative genomics. By reconciling phylogenetic trees for microbial species with trees of metabolic genes, we can determine approximately when crucial metabolic genes began to spread across the tree of life through horizontal gene transfer. Using these methods, we conducted an analysis of the relative timing of the spread of genes related to the nitrogen cycle. Our results indicate that the rate of horizontal gene transfer for important genes related to denitrification increased after the Great Oxidation Event, concurrent with geochemical evidence for increasing availability of nitrate, suggesting that the oxygenation of the atmosphere and surface ocean may have been an important determining factor for the spread of denitrification genes across the

  16. Effect of humic acids on electricity generation integrated with xylose degradation in microbial fuel cells

    DEFF Research Database (Denmark)

    Huang, Liping; Angelidaki, Irini

    2008-01-01

    Pentose and humic acids (HA) are the main components of hydrolysates, the liquid fraction produced during thermohydrolysis of lignocellulosic material. Electricity generation integrated with xylose (typical pentose) degradation as well as the effect of HA on electricity production in microbial fuel...... to controls where HAs were not added, addition of commercial HA resulted in increase of power density and coulombic efficiency, which ranged from 7.5% to 67.4% and 24% to 92.6%, respectively. Digested manure wastewater (DMW) was tested as potential mediator for power generation due to its content of natural...

  17. MICROBIAL CHARACTERISTICS OF SOILS UNDER AN INTEGRATED CROP-LIVESTOCK SYSTEM

    Directory of Open Access Journals (Sweden)

    Andréa Scaramal da Silva

    2015-02-01

    Full Text Available Integrated crop-livestock systems (ICLs are a viable strategy for the recovery and maintenance of soil characteristics. In the present study, an ICL experiment was conducted by the Instituto Agronômico do Paraná in the municipality of Xambre, Parana (PR, Brazil, to evaluate the effects of various grazing intensities. The objective of the present study was to quantify the levels of microbial biomass carbon (MBC and soil enzymatic activity in an ICL of soybean (summer and Brachiaria ruziziensis (winter, with B. ruziziensis subjected to various grazing intensities. Treatments consisted of varying pasture heights and grazing intensities (GI: 10, 20, 30, and 40 cm (GI-10, GI-20, GI-30, and GI-40, respectively and a no grazing (NG control. The microbial characteristics analysed were MBC, microbial respiration (MR, metabolic quotient (qCO2, the activities of acid phosphatase, β-glucosidase, arylsuphatase, and cellulase, and fluorescein diacetate (FDA hydrolysis. Following the second grazing cycle, the GI-20 treatment (20-cm - moderate grazing intensity contained the highest MBC concentrations and lowest qCO2 concentrations. Following the second soybean cycle, the treatment with the highest grazing intensity (GI-10 contained the lowest MBC concentration. Soil MBC concentrations in the pasture were favoured by the introduction of animals to the system. High grazing intensity (10-cm pasture height during the pasture cycle may cause a decrease in soil MBC and have a negative effect on the microbial biomass during the succeeding crop. Of all the enzymes analyzed, only arylsuphatase and cellulase activities were altered by ICL management, with differences between the moderate grazing intensity (GI-20 and no grazing (NG treatments.

  18. Genome Sequence of Rhodoferax antarcticus ANT.BRT; A Psychrophilic Purple Nonsulfur Bacterium from an Antarctic Microbial Mat

    Directory of Open Access Journals (Sweden)

    Jennifer M. Baker

    2017-02-01

    Full Text Available Rhodoferax antarcticus is an Antarctic purple nonsulfur bacterium and the only characterized anoxygenic phototroph that grows best below 20 °C. We present here a high-quality draft genome of Rfx. antarcticus strain ANT.BRT, isolated from an Antarctic microbial mat. The circular chromosome (3.8 Mbp of Rfx. antarcticus has a 59.1% guanine + cytosine (GC content and contains 4036 open reading frames. In addition, the bacterium contains a sizable plasmid (198.6 kbp, 48.4% GC with 226 open reading frames that comprises about 5% of the total genetic content. Surprisingly, genes encoding light-harvesting complexes 1 and 3 (LH1 and LH3, but not light-harvesting complex 2 (LH2, were identified in the photosynthesis gene cluster of the Rfx. antarcticus genome, a feature that is unique among purple phototrophs. Consistent with physiological studies that showed a strong capacity for nitrogen fixation in Rfx. antarcticus, a nitrogen fixation gene cluster encoding a molybdenum-type nitrogenase was present, but no alternative nitrogenases were identified despite the cold-active phenotype of this phototroph. Genes encoding two forms of ribulose 1,5-bisphosphate carboxylase/oxygenase were present in the Rfx. antarcticus genome, a feature that likely provides autotrophic flexibility under varying environmental conditions. Lastly, genes for assembly of both type IV pili and flagella are present, with the latter showing an unusual degree of clustering. This report represents the first genomic analysis of a psychrophilic anoxygenic phototroph and provides a glimpse of the genetic basis for maintaining a phototrophic lifestyle in a permanently cold, yet highly variable, environment.

  19. Genomic insights into microbial iron oxidation and iron uptake strategies in extremely acidic environments.

    Science.gov (United States)

    Bonnefoy, Violaine; Holmes, David S

    2012-07-01

    This minireview presents recent advances in our understanding of iron oxidation and homeostasis in acidophilic Bacteria and Archaea. These processes influence the flux of metals and nutrients in pristine and man-made acidic environments such as acid mine drainage and industrial bioleaching operations. Acidophiles are also being studied to understand life in extreme conditions and their role in the generation of biomarkers used in the search for evidence of existing or past extra-terrestrial life. Iron oxidation in acidophiles is best understood in the model organism Acidithiobacillus ferrooxidans. However, recent functional genomic analysis of acidophiles is leading to a deeper appreciation of the diversity of acidophilic iron-oxidizing pathways. Although it is too early to paint a detailed picture of the role played by lateral gene transfer in the evolution of iron oxidation, emerging evidence tends to support the view that iron oxidation arose independently more than once in evolution. Acidic environments are generally rich in soluble iron and extreme acidophiles (e.g. the Leptospirillum genus) have considerably fewer iron uptake systems compared with neutrophiles. However, some acidophiles have been shown to grow as high as pH 6 and, in the case of the Acidithiobacillus genus, to have multiple iron uptake systems. This could be an adaption allowing them to respond to different iron concentrations via the use of a multiplicity of different siderophores. Both Leptospirillum spp. and Acidithiobacillus spp. are predicted to synthesize the acid stable citrate siderophore for Fe(III) uptake. In addition, both groups have predicted receptors for siderophores produced by other microorganisms, suggesting that competition for iron occurs influencing the ecophysiology of acidic environments. Little is known about the genetic regulation of iron oxidation and iron uptake in acidophiles, especially how the use of iron as an energy source is balanced with its need to take up

  20. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy.

    OpenAIRE

    Meier-Kolthoff, Jan P; Hahnke, Richard L; Petersen, Jörn; Scheuner, Carmen; Michael, Victoria; Fiebig, Anne; Rohde, Christine; Rohde, Manfred; Fartmann, Berthold; Goodwin, Lynne A; Chertkov, Olga; Reddy, Tbk; Pati, Amrita; Ivanova, Natalia N; Markowitz, Victor

    2014-01-01

    Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain was until now forgotten from microbial genomics. As a part of the G enomic E ncyclopedia of B acteria and A rchaea project, we here describe the features of E. coli DSM 30083T together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp containing genome sequence includes 4,762 protein-coding genes ...

  1. Methylobacterium genome sequences: a reference blueprint to investigate microbial metabolism of C1 compounds from natural and industrial sources.

    Directory of Open Access Journals (Sweden)

    Stéphane Vuilleumier

    Full Text Available Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared.The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17 that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name "island integration determinant" (iid.These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire

  2. A massive incorporation of microbial genes into the genome of Tetranychus urticae, a polyphagous arthropod herbivore.

    Science.gov (United States)

    Wybouw, N; Van Leeuwen, T; Dermauw, W

    2018-06-01

    A number of horizontal gene transfers (HGTs) have been identified in the spider mite Tetranychus urticae, a chelicerate herbivore. However, the genome of this mite species has at present not been thoroughly mined for the presence of HGT genes. Here, we performed a systematic screen for HGT genes in the T. urticae genome using the h-index metric. Our results not only validated previously identified HGT genes but also uncovered 25 novel HGT genes. In addition to HGT genes with a predicted biochemical function in carbohydrate, lipid and folate metabolism, we also identified the horizontal transfer of a ketopantoate hydroxymethyltransferase and a pantoate β-alanine ligase gene. In plants and bacteria, both genes are essential for vitamin B5 biosynthesis and their presence in the mite genome strongly suggests that spider mites, similar to Bemisia tabaci and nematodes, can synthesize their own vitamin B5. We further show that HGT genes were physically embedded within the mite genome and were expressed in different life stages. By screening chelicerate genomes and transcriptomes, we were able to estimate the evolutionary histories of these HGTs during chelicerate evolution. Our study suggests that HGT has made a significant and underestimated impact on the metabolic repertoire of plant-feeding spider mites. © 2018 The Royal Entomological Society.

  3. Genomic Microbial Epidemiology Is Needed to Comprehend the Global Problem of Antibiotic Resistance and to Improve Pathogen Diagnosis.

    Science.gov (United States)

    Wyrsch, Ethan R; Roy Chowdhury, Piklu; Chapman, Toni A; Charles, Ian G; Hammond, Jeffrey M; Djordjevic, Steven P

    2016-01-01

    Contamination of waste effluent from hospitals and intensive food animal production with antimicrobial residues is an immense global problem. Antimicrobial residues exert selection pressures that influence the acquisition of antimicrobial resistance and virulence genes in diverse microbial populations. Despite these concerns there is only a limited understanding of how antimicrobial residues contribute to the global problem of antimicrobial resistance. Furthermore, rapid detection of emerging bacterial pathogens and strains with resistance to more than one antibiotic class remains a challenge. A comprehensive, sequence-based genomic epidemiological surveillance model that captures essential microbial metadata is needed, both to improve surveillance for antimicrobial resistance and to monitor pathogen evolution. Escherichia coli is an important pathogen causing both intestinal [intestinal pathogenic E. coli (IPEC)] and extraintestinal [extraintestinal pathogenic E. coli (ExPEC)] disease in humans and food animals. ExPEC are the most frequently isolated Gram negative pathogen affecting human health, linked to food production practices and are often resistant to multiple antibiotics. Cattle are a known reservoir of IPEC but they are not recognized as a source of ExPEC that impact human or animal health. In contrast, poultry are a recognized source of multiple antibiotic resistant ExPEC, while swine have received comparatively less attention in this regard. Here, we review what is known about ExPEC in swine and how pig production contributes to the problem of antibiotic resistance.

  4. Modeling the integration of bacterial rRNA fragments into the human cancer genome.

    Science.gov (United States)

    Sieber, Karsten B; Gajer, Pawel; Dunning Hotopp, Julie C

    2016-03-21

    Cancer is a disease driven by the accumulation of genomic alterations, including the integration of exogenous DNA into the human somatic genome. We previously identified in silico evidence of DNA fragments from a Pseudomonas-like bacteria integrating into the 5'-UTR of four proto-oncogenes in stomach cancer sequencing data. The functional and biological consequences of these bacterial DNA integrations remain unknown. Modeling of these integrations suggests that the previously identified sequences cover most of the sequence flanking the junction between the bacterial and human DNA. Further examination of these reads reveals that these integrations are rich in guanine nucleotides and the integrated bacterial DNA may have complex transcript secondary structures. The models presented here lay the foundation for future experiments to test if bacterial DNA integrations alter the transcription of the human genes.

  5. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes

    DEFF Research Database (Denmark)

    Treu, Laura; Kougias, Panagiotis; Campanaro, Stefano

    2016-01-01

    strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome......This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning...... is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common...

  6. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    Science.gov (United States)

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Integrated genomics of Mucorales reveals novel therapeutic targets

    Science.gov (United States)

    Mucormycosis is a life-threatening infection caused by Mucorales fungi. We sequenced 30 fungal genomes and performed transcriptomics with three representative Rhizopus and Mucor strains with human airway epithelial cells during fungal invasion to reveal key host and fungal determinants contributing ...

  8. An Integrated Genetic and Cytogenetic Map of the Cucumber Genome

    Science.gov (United States)

    The Cucurbitaceae includes important crops as cucumber, melon, watermelon, and squash and pumpkin. However, few genetic and genomic resources are available for plant improvement. Some cucurbit species such as cucumber have a narrow genetic base, which impedes construction of saturated molecular li...

  9. Integrated genome-based studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Tiedje, James M. [Michigan State Univ., East Lansing, MI (United States); Konstantinidis, Kostas [Michigan State Univ., East Lansing, MI (United States); Worden, Mark [Michigan State Univ., East Lansing, MI (United States)

    2014-01-08

    The aim of the work reported is to study Shewanella population genomics, and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to Investigate the evolutionary processes, patterns and mechanisms of Shewanella.

  10. Role of Genomic Typing in Taxonomy, Evolutionary Genetics, and Microbial Epidemiology

    Science.gov (United States)

    van Belkum, Alex; Struelens, Marc; de Visser, Arjan; Verbrugh, Henri; Tibayrenc, Michel

    2001-01-01

    Currently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing data for discrimination between genotypes. Apart from being an essential component of these fundamental sciences, microbial typing clearly affects several areas of applied microbiogical research. The epidemiological investigation of outbreaks of infectious diseases and the measurement of genetic diversity in relation to relevant biological properties such as pathogenicity, drug resistance, and biodegradation capacities are obvious examples. The diversity among nucleic acid molecules provides the basic information for all fields described above. However, researchers in various disciplines tend to use different vocabularies, a wide variety of different experimental methods to monitor genetic variation, and sometimes widely differing modes of data processing and interpretation. The aim of the present review is to summarize the technological and fundamental concepts used in microbial taxonomy, evolutionary genetics, and epidemiology. Information on the nomenclature used in the different fields of research is provided, descriptions of the diverse genetic typing procedures are presented, and examples of both conceptual and technological research developments for Escherichia coli are included. Recommendations for unification of the different fields through standardization of laboratory techniques are made. PMID:11432813

  11. Data-driven integration of genome-scale regulatory and metabolic network models

    Science.gov (United States)

    Imam, Saheed; Schäuble, Sascha; Brooks, Aaron N.; Baliga, Nitin S.; Price, Nathan D.

    2015-01-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription, and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert—a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system. PMID:25999934

  12. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed eImam

    2015-05-01

    Full Text Available Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert – a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  13. An Integrated Metabolomic and Genomic Mining Workflow to Uncover the Biosynthetic Potential of Bacteria

    DEFF Research Database (Denmark)

    Månsson, Maria; Vynne, Nikolaj Grønnegaard; Klitgaard, Andreas

    2016-01-01

    Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in ba...

  14. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma

    DEFF Research Database (Denmark)

    Sung, Wing-Kin; Zheng, Hancheng; Li, Shuyu

    2012-01-01

    To survey hepatitis B virus (HBV) integration in liver cancer genomes, we conducted massively parallel sequencing of 81 HBV-positive and 7 HBV-negative hepatocellular carcinomas (HCCs) and adjacent normal tissues. We found that HBV integration is observed more frequently in the tumors (86.4%) than...

  15. Integrated genome-based studies of Shewanella ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Segre Daniel; Beg Qasim

    2012-02-14

    This project was a component of the Shewanella Federation and, as such, contributed to the overall goal of applying the genomic tools to better understand eco-physiology and speciation of respiratory-versatile members of Shewanella genus. Our role at Boston University was to perform bioreactor and high throughput gene expression microarrays, and combine dynamic flux balance modeling with experimentally obtained transcriptional and gene expression datasets from different growth conditions. In the first part of project, we designed the S. oneidensis microarray probes for Affymetrix Inc. (based in California), then we identified the pathways of carbon utilization in the metal-reducing marine bacterium Shewanella oneidensis MR-1, using our newly designed high-density oligonucleotide Affymetrix microarray on Shewanella cells grown with various carbon sources. Next, using a combination of experimental and computational approaches, we built algorithm and methods to integrate the transcriptional and metabolic regulatory networks of S. oneidensis. Specifically, we combined mRNA microarray and metabolite measurements with statistical inference and dynamic flux balance analysis (dFBA) to study the transcriptional response of S. oneidensis MR-1 as it passes through exponential, stationary, and transition phases. By measuring time-dependent mRNA expression levels during batch growth of S. oneidensis MR-1 under two radically different nutrient compositions (minimal lactate and nutritionally rich LB medium), we obtain detailed snapshots of the regulatory strategies used by this bacterium to cope with gradually changing nutrient availability. In addition to traditional clustering, which provides a first indication of major regulatory trends and transcription factors activities, we developed and implemented a new computational approach for Dynamic Detection of Transcriptional Triggers (D2T2). This new method allows us to infer a putative topology of transcriptional dependencies

  16. The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone.

    Science.gov (United States)

    Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C

    2016-06-15

    Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from

  17. GMATA: An Integrated Software Package for Genome-Scale SSR Mining, Marker Development and Viewing.

    Science.gov (United States)

    Wang, Xuewen; Wang, Le

    2016-01-01

    Simple sequence repeats (SSRs), also referred to as microsatellites, are highly variable tandem DNAs that are widely used as genetic markers. The increasing availability of whole-genome and transcript sequences provides information resources for SSR marker development. However, efficient software is required to efficiently identify and display SSR information along with other gene features at a genome scale. We developed novel software package Genome-wide Microsatellite Analyzing Tool Package (GMATA) integrating SSR mining, statistical analysis and plotting, marker design, polymorphism screening and marker transferability, and enabled simultaneously display SSR markers with other genome features. GMATA applies novel strategies for SSR analysis and primer design in large genomes, which allows GMATA to perform faster calculation and provides more accurate results than existing tools. Our package is also capable of processing DNA sequences of any size on a standard computer. GMATA is user friendly, only requires mouse clicks or types inputs on the command line, and is executable in multiple computing platforms. We demonstrated the application of GMATA in plants genomes and reveal a novel distribution pattern of SSRs in 15 grass genomes. The most abundant motifs are dimer GA/TC, the A/T monomer and the GCG/CGC trimer, rather than the rich G/C content in DNA sequence. We also revealed that SSR count is a linear to the chromosome length in fully assembled grass genomes. GMATA represents a powerful application tool that facilitates genomic sequence analyses. GAMTA is freely available at http://sourceforge.net/projects/gmata/?source=navbar.

  18. CMG-Biotools, a Free Workbench for Basic Comparative Microbial Genomics

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla; Lagesen, Karin; Acar, Öncel

    2013-01-01

    This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats...

  19. Improvement of industrially important microbial strains by genome shuffling: Current status and future prospects.

    Science.gov (United States)

    Magocha, Tinashe Archbold; Zabed, H; Yang, Miaomiao; Yun, Junhua; Zhang, Huanhuan; Qi, Xianghui

    2018-06-01

    The growing demand for biotechnological products against limited metabolic capacity of industrially used microorganisms has led to an increased interest on strain-improvement over the last several decades, which aimed to enhance metabolite yield, substrate uptake and tolerance of the strains. Among a few techniques of strain-improvement, genome shuffling is the most recent and promising approach used for rapid strain-improvement that can yield a new strain by combining whole genomes of multi-parental microorganisms using the principles of protoplast fusion. Genome shuffling has brought a major breakthrough in the strain-improvement concept as it is found to be effective and reliable for expressing complex phenotypes. This review will discuss the technical aspects and applications of genome shuffling for various industrial strains to present its current status and recent progress. In the concluding remarks, a summary will be presented focusing on the major challenges and future outlooks of this technology. Copyright © 2018 Elsevier Ltd. All rights reserved.

  20. Microbial Genomics: The Expanding Universe of Bacterial Defense Systems.

    Science.gov (United States)

    Forsberg, Kevin J; Malik, Harmit S

    2018-04-23

    Bacteria protect themselves against infection using multiple defensive systems that move by horizontal gene transfer and accumulate in genomic 'defense islands'. A recent study exploited these features to uncover ten novel defense systems, substantially expanding the catalog of bacterial defense systems and predicting the discovery of many more. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Group sparse canonical correlation analysis for genomic data integration.

    Science.gov (United States)

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; Calhoun, Vince D; Deng, Hong-Wen; Wang, Yu-Ping

    2013-08-12

    The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understandings of the interplay of these genomic factors as well as their influences on the complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA) method for this problem. Conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome such difficulty, mostly using penalizations with l-1 norm (CCA-l1) or the combination of l-1and l-2 norm (CCA-elastic net). However, they overlook the structural or group effect within genomic data in the analysis, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features. The CCA-sparse group method incorporates group effects of features into the correlation analysis while performs individual feature

  2. Childhood Acute Lymphoblastic Leukemia: Integrating Genomics into Therapy

    Science.gov (United States)

    Tasian, Sarah K; Loh, Mignon L; Hunger, Stephen P

    2015-01-01

    Acute lymphoblastic leukemia (ALL), the most common malignancy of childhood, is a genetically complex entity that remains a major cause of childhood cancer-related mortality. Major advances in genomic and epigenomic profiling during the past decade have appreciably enhanced knowledge of the biology of de novo and relapsed ALL and have facilitated more precise risk stratification of patients. These achievements have also provided critical insights regarding potentially targetable lesions for development of new therapeutic approaches in the era of precision medicine. This review delineates the current genetic landscape of childhood ALL with emphasis upon patient outcomes with contemporary treatment regimens, as well as therapeutic implications of newly identified genomic alterations in specific subsets of ALL. PMID:26194091

  3. Site-Specific Integration of Exogenous Genes Using Genome Editing Technologies in Zebrafish

    Directory of Open Access Journals (Sweden)

    Atsuo Kawahara

    2016-05-01

    Full Text Available The zebrafish (Danio rerio is an ideal vertebrate model to investigate the developmental molecular mechanism of organogenesis and regeneration. Recent innovation in genome editing technologies, such as zinc finger nucleases (ZFNs, transcription activator-like effector nucleases (TALENs and the clustered regularly interspaced short palindromic repeats (CRISPR/CRISPR associated protein 9 (Cas9 system, have allowed researchers to generate diverse genomic modifications in whole animals and in cultured cells. The CRISPR/Cas9 and TALEN techniques frequently induce DNA double-strand breaks (DSBs at the targeted gene, resulting in frameshift-mediated gene disruption. As a useful application of genome editing technology, several groups have recently reported efficient site-specific integration of exogenous genes into targeted genomic loci. In this review, we provide an overview of TALEN- and CRISPR/Cas9-mediated site-specific integration of exogenous genes in zebrafish.

  4. International regulatory landscape and integration of corrective genome editing into in vitro fertilization.

    Science.gov (United States)

    Araki, Motoko; Ishii, Tetsuya

    2014-11-24

    Genome editing technology, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas, has enabled far more efficient genetic engineering even in non-human primates. This biotechnology is more likely to develop into medicine for preventing a genetic disease if corrective genome editing is integrated into assisted reproductive technology, represented by in vitro fertilization. Although rapid advances in genome editing are expected to make germline gene correction feasible in a clinical setting, there are many issues that still need to be addressed before this could occur. We herein examine current status of genome editing in mammalian embryonic stem cells and zygotes and discuss potential issues in the international regulatory landscape regarding human germline gene modification. Moreover, we address some ethical and social issues that would be raised when each country considers whether genome editing-mediated germline gene correction for preventive medicine should be permitted.

  5. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  6. Integration sites of Epstein-Barr virus genome on chromosomes of human lymphoblastoid cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Wuu, K.D.; Chen, Y.J.; Wang-Wuu, S. [Institute of Genetics, Taipei (Taiwan, Province of China)

    1994-09-01

    Epstein-Barr virus (EBV) is the pathogen of infectious mononucleosis. The viral genome is present in more than 95% of the African cases of Burkitt lymphoma and it is usually maintained in episomal form in the tumor cells. Viral integration has been described only for Nanalwa which is a Burkitt lymphoma cell line lacking episomes. In order to examine the role of EBV in the immortalization of human Blymphocytes, we investigated whether the EBV integration into the human genome is essential. If the integration does occur, we would like to know whether the integration is randomly distributed or whether the viral DNA integrates preferentially at certain sites. Fourteen in vitro immortalized human lymphoblastoid cell lines (LCLs) were examined by fluorescence in situ hybridization (FISH) with a biotinylated EBV BamHI w DNA fragment as probe. The episomal form of EBV DNA was found in all cells of these cell lines, while only about 65% of the cells have the integrated viral DNA. This might suggest that integration is not a pre-requisite for cell immortalization. Although all chromosomes, except Y, have been found with integrated viral genome, chromsomes 1 and 5 are the most frequent EBV DNA carrier (p<0.05). Nine chromosome bands, namely, 1p31, 1q31, 2q32, 3q13, 3q26, 5q14, 6q24, 7q31 and 12q21, are preferential targets for EBV integration (p<0.001). Eighty percent of the total 938 EBV hybridization signals were found to be at G-band-positive area. This suggests that the mechanism of EBV integration might be different from that of the retroviruses, which specifically integrate to G-band-negative areas. Thus, we conclude that the integration of EBV to host genome is non-random and it may have something to do with the structure of chromosome and DNA sequences.

  7. Figure 4 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Gene-list view of genomic data. The gene-list view allows users to compare data across a set of loci. The data in this figure includes copy number, mutation, and clinical data from 202 glioblastoma samples from TCGA. Adapted from Figure 7; Thorvaldsdottir H et al. 2012

  8. Figure 2 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Grouping and sorting genomic data in IGV. The IGV user interface displaying 202 glioblastoma samples from TCGA. Samples are grouped by tumor subtype (second annotation column) and data type (first annotation column) and sorted by copy number of the EGFR locus (middle column). Adapted from Figure 1; Robinson et al. 2011

  9. Figure 5 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Split-Screen View. The split-screen view is useful for exploring relationships of genomic features that are independent of chromosomal location. Color is used here to indicate mate pairs that map to different chromosomes, chromosomes 1 and 6, suggesting a translocation event. Adapted from Figure 8; Thorvaldsdottir H et al. 2012

  10. An Integrative Bioinformatics Framework for Genome-scale Multiple Level Network Reconstruction of Rice

    Directory of Open Access Journals (Sweden)

    Liu Lili

    2013-06-01

    Full Text Available Understanding how metabolic reactions translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS statistically connect genotypes to phenotypes, without any recourse to known molecular interactions, whereas a molecular mechanistic description ties gene function to phenotype through gene regulatory networks (GRNs, protein-protein interactions (PPIs and molecular pathways. Integration of different regulatory information levels of an organism is expected to provide a good way for mapping genotypes to phenotypes. However, the lack of curated metabolic model of rice is blocking the exploration of genome-scale multi-level network reconstruction. Here, we have merged GRNs, PPIs and genome-scale metabolic networks (GSMNs approaches into a single framework for rice via omics’ regulatory information reconstruction and integration. Firstly, we reconstructed a genome-scale metabolic model, containing 4,462 function genes, 2,986 metabolites involved in 3,316 reactions, and compartmentalized into ten subcellular locations. Furthermore, 90,358 pairs of protein-protein interactions, 662,936 pairs of gene regulations and 1,763 microRNA-target interactions were integrated into the metabolic model. Eventually, a database was developped for systematically storing and retrieving the genome-scale multi-level network of rice. This provides a reference for understanding genotype-phenotype relationship of rice, and for analysis of its molecular regulatory network.

  11. Incorporating comparative genomics into the design-test-learn cycle of microbial strain engineering.

    Science.gov (United States)

    Sardi, Maria; Gasch, Audrey P

    2017-08-01

    Engineering microbes with new properties is an important goal in industrial engineering, to establish biological factories for production of biofuels, commodity chemicals and pharmaceutics. But engineering microbes to produce new compounds with high yield remains a major challenge toward economically viable production. Incorporating several modern approaches, including synthetic and systems biology, metabolic modeling and regulatory rewiring, has proven to significantly advance industrial strain engineering. This review highlights how comparative genomics can also facilitate strain engineering, by identifying novel genes and pathways, regulatory mechanisms and genetic background effects for engineering. We discuss how incorporating comparative genomics into the design-test-learn cycle of strain engineering can provide novel information that complements other engineering strategies. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Complete genome sequence of Bacillus subtilis BSD-2, a microbial germicide isolated from cultivated cotton.

    Science.gov (United States)

    Liu, Hongwei; Yin, Shuli; An, Likang; Zhang, Genwei; Cheng, Huicai; Xi, Yanhua; Cui, Guanhui; Zhang, Feiyan; Zhang, Liping

    2016-07-20

    Bacillus subtilis BSD-2, isolated from cotton (Gossypium spp.), had strong antagonistic activity to Verticillium dahlia Kleb and Botrytis cinerea. We sequenced and annotated the BSD-2 complete genome to help us the better use of this strain, which has surfactin, bacilysin, bacillibactin, subtilosin A, Tas A and a potential class IV lanthipeptide biosynthetic pathways. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. A DNMT3A2-HDAC2 Complex Is Essential for Genomic Imprinting and Genome Integrity in Mouse Oocytes

    Directory of Open Access Journals (Sweden)

    Pengpeng Ma

    2015-11-01

    Full Text Available Maternal genomic imprints are established during oogenesis. Histone deacetylases (HDACs 1 and 2 are required for oocyte development in mouse, but their role in genomic imprinting is unknown. We find that Hdac1:Hdac2−/− double-mutant growing oocytes exhibit global DNA hypomethylation and fail to establish imprinting marks for Igf2r, Peg3, and Srnpn. Global hypomethylation correlates with increased retrotransposon expression and double-strand DNA breaks. Nuclear-associated DNMT3A2 is reduced in double-mutant oocytes, and injecting these oocytes with Hdac2 partially restores DNMT3A2 nuclear staining. DNMT3A2 co-immunoprecipitates with HDAC2 in mouse embryonic stem cells. Partial loss of nuclear DNMT3A2 and HDAC2 occurs in Sin3a−/− oocytes, which exhibit decreased DNA methylation of imprinting control regions for Igf2r and Srnpn, but not Peg3. These results suggest seminal roles of HDAC1/2 in establishing maternal genomic imprints and maintaining genomic integrity in oocytes mediated in part through a SIN3A complex that interacts with DNMT3A2.

  14. Microbial production of polyhydroxybutyrate with tailor-made properties: an integrated modelling approach and experimental validation.

    Science.gov (United States)

    Penloglou, Giannis; Chatzidoukas, Christos; Kiparissides, Costas

    2012-01-01

    The microbial production of polyhydroxybutyrate (PHB) is a complex process in which the final quantity and quality of the PHB depend on a large number of process operating variables. Consequently, the design and optimal dynamic operation of a microbial process for the efficient production of PHB with tailor-made molecular properties is an extremely interesting problem. The present study investigates how key process operating variables (i.e., nutritional and aeration conditions) affect the biomass production rate and the PHB accumulation in the cells and its associated molecular weight distribution. A combined metabolic/polymerization/macroscopic modelling approach, relating the process performance and product quality with the process variables, was developed and validated using an extensive series of experiments and measurements. The model predicts the dynamic evolution of the biomass growth, the polymer accumulation, the consumption of carbon and nitrogen sources and the average molecular weights of the PHB in a bioreactor, under batch and fed-batch operating conditions. The proposed integrated model was used for the model-based optimization of the production of PHB with tailor-made molecular properties in Azohydromonas lata bacteria. The process optimization led to a high intracellular PHB accumulation (up to 95% g of PHB per g of DCW) and the production of different grades (i.e., different molecular weight distributions) of PHB. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    Science.gov (United States)

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Genome Sequence and Composition of a Tolyporphin-Producing Cyanobacterium-Microbial Community.

    Science.gov (United States)

    Hughes, Rebecca-Ayme; Zhang, Yunlong; Zhang, Ran; Williams, Philip G; Lindsey, Jonathan S; Miller, Eric S

    2017-10-01

    The cyanobacterial culture HT-58-2 was originally described as a strain of Tolypothrix nodosa with the ability to produce tolyporphins, which comprise a family of distinct tetrapyrrole macrocycles with reported efflux pump inhibition properties. Upon reviving the culture from what was thought to be a nonextant collection, studies of culture conditions, strain characterization, phylogeny, and genomics have been undertaken. Here, HT-58-2 was shown by 16S rRNA analysis to closely align with Brasilonema strains and not with Tolypothrix isolates. Light, fluorescence, and scanning electron microscopy revealed cyanobacterium filaments that are decorated with attached bacteria and associated with free bacteria. Metagenomic surveys of HT-58-2 cultures revealed a diversity of bacteria dominated by Erythrobacteraceae , 97% of which are Porphyrobacter species. A dimethyl sulfoxide washing procedure was found to yield enriched cyanobacterial DNA (presumably by removing community bacteria) and sequence data sufficient for genome assembly. The finished, closed HT-58-2Cyano genome consists of 7.85 Mbp (42.6% G+C) and contains 6,581 genes. All genes for biosynthesis of tetrapyrroles (e.g., heme, chlorophyll a , and phycocyanobilin) and almost all for cobalamin were identified dispersed throughout the chromosome. Among the 6,177 protein-encoding genes, coding sequences (CDSs) for all but two of the eight enzymes for conversion of glutamic acid to protoporphyrinogen IX also were found within one major gene cluster. The cluster also includes 10 putative genes (and one hypothetical gene) encoding proteins with domains for a glycosyltransferase, two cytochrome P450 enzymes, and a flavin adenine dinucleotide (FAD)-binding protein. The composition of the gene cluster suggests a possible role in tolyporphin biosynthesis. IMPORTANCE A worldwide search more than 25 years ago for cyanobacterial natural products with anticancer activity identified a culture (HT-58-2) from Micronesia that

  17. Roles of Werner syndrome protein in protection of genome integrity

    DEFF Research Database (Denmark)

    Rossi, Marie L; Ghosh, Avik K; Bohr, Vilhelm A

    2010-01-01

    Werner syndrome protein (WRN) is one of a family of five human RecQ helicases implicated in the maintenance of genome stability. The conserved RecQ family also includes RecQ1, Bloom syndrome protein (BLM), RecQ4, and RecQ5 in humans, as well as Sgs1 in Saccharomyces cerevisiae, Rqh1...... in Schizosaccharomyces pombe, and homologs in Caenorhabditis elegans, Xenopus laevis, and Drosophila melanogaster. Defects in three of the RecQ helicases, RecQ4, BLM, and WRN, cause human pathologies linked with cancer predisposition and premature aging. Mutations in the WRN gene are the causative factor of Werner...

  18. The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

    Science.gov (United States)

    Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

    2015-12-01

    Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  19. A microbial fuel cell–membrane bioreactor integrated system for cost-effective wastewater treatment

    International Nuclear Information System (INIS)

    Wang, Yong-Peng; Liu, Xian-Wei; Li, Wen-Wei; Li, Feng; Wang, Yun-Kun; Sheng, Guo-Ping; Zeng, Raymond J.; Yu, Han-Qing

    2012-01-01

    Highlights: ► An MFC–MBR integrated system for wastewater treatment and electricity generation. ► Stable electricity generation during 1000-h continuous operation. ► Low-cost electrode, separator and filter materials were adopted. -- Abstract: Microbial fuel cell (MFC) and membrane bioreactor (MBR) are both promising technologies for wastewater treatment, but both with limitations. In this study, a novel MFC–MBR integrated system, which combines the advantages of the individual systems, was proposed for simultaneous wastewater treatment and energy recovery. The system favored a better utilization of the oxygen in the aeration tank of MBR by the MFC biocathode, and enabled a high effluent quality. Continuous and stable electricity generation, with the average current of 1.9 ± 0.4 mA, was achieved over a long period of about 40 days. The maximum power density reached 6.0 W m −3 . Moreover, low-cost materials were used for the reactor construction. This integrated system shows great promise for practical wastewater treatment application.

  20. Gene Ontology Terms and Automated Annotation for Energy-Related Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mukhopadhyay, Biswarup [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Tyler, Brett M. [Oregon State Univ., Corvallis, OR (United States); Setubal, Joao [Univ. of Sao Paulo (Brazil); Murali, T. M. [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)

    2017-11-03

    Gene Ontology (GO) is one of the more widely used functional ontologies for describing gene functions at various levels. The project developed 660 GO terms for describing energy-related microbial processes and filled the known gaps in this area of the GO system, and then used these terms to describe functions of 179 genes to showcase the utilities of the new resources. It hosted a series of workshops and made presentations at key meetings to inform and train scientific community members on these terms and to receive inputs from them for the GO term generation efforts. The project has developed a website for storing and displaying the resources (http://www.mengo.biochem.vt.edu/). The outcome of the project was further disseminated through peer-reviewed publications and poster and seminar presentations.

  1. Integrating Microbial Electrochemical Technology with Forward Osmosis and Membrane Bioreactors: Low-Energy Wastewater Treatment, Energy Recovery and Water Reuse

    KAUST Repository

    Werner, Craig M.

    2014-06-01

    Wastewater treatment is energy intensive, with modern wastewater treatment processes consuming 0.6 kWh/m3 of water treated, half of which is required for aeration. Considering that wastewater contains approximately 2 kWh/m3 of energy and represents a reliable alternative water resource, capturing part of this energy and reclaiming the water would offset or even eliminate energy requirements for wastewater treatment and provide a means to augment traditional water supplies. Microbial electrochemical technology is a novel technology platform that uses bacteria capable of producing an electric current outside of the cell to recover energy from wastewater. These bacteria do not require oxygen to respire but instead use an insoluble electrode as their terminal electron acceptor. Two types of microbial electrochemical technologies were investigated in this dissertation: 1) a microbial fuel cell that produces electricity; and 2) a microbial electrolysis cell that produces hydrogen with the addition of external power. On their own, microbial electrochemical technologies do not achieve sufficiently high treatment levels. Innovative approaches that integrate microbial electrochemical technologies with emerging and established membrane-based treatment processes may improve the overall extent of wastewater treatment and reclaim treated water. Forward osmosis is an emerging low-energy membrane-based technology for seawater desalination. In forward osmosis water is transported across a semipermeable membrane driven by an osmotic gradient. The microbial osmotic fuel cell described in this dissertation integrates a microbial fuel cell with forward osmosis to achieve wastewater treatment, energy recovery and partial desalination. This system required no aeration and generated more power than conventional microbial fuel cells using ion exchange membranes by minimizing electrochemical losses. Membrane bioreactors incorporate semipermeable membranes within a biological wastewater

  2. Future Public Policy and Ethical Issues Facing the Agricultural and Microbial Genomics Sectors of the Biotechnology Industry: A Roundtable Discussion

    Energy Technology Data Exchange (ETDEWEB)

    Diane E. Hoffmann

    2003-09-12

    On September 12, 2003, the University of Maryland School of Law's Intellectual Property and Law & Health Care Programs jointly sponsored and convened a roundtable discussion on the future public policy and ethical issues that will likely face the agricultural and microbial genomics sectors of the biotechnology industry. As this industry has developed over the last two decades, societal concerns have moved from what were often local issues, e.g., the safety of laboratories where scientists conducted recombinant DNA research on transgenic microbes, animals and crops, to more global issues. These newer issues include intellectual property, international trade, risks of genetically engineered foods and microbes, bioterrorism, and marketing and labeling of new products sold worldwide. The fast paced nature of the biotechnology industry and its new developments often mean that legislators, regulators and society, in general, must play ''catch up'' in their efforts to understand the issues, the risks, and even the benefits, that may result from the industry's new ways of conducting research, new products, and novel methods of product marketing and distribution. The goal of the roundtable was to develop a short list of the most significant public policy and ethical issues that will emerge as a result of advances in these sectors of the biotechnology industry over the next five to six years. More concretely, by ''most significant'' the conveners meant the types of issues that would come to the attention of members of Congress or state legislators during this time frame and for which they would be better prepared if they had well researched and timely background information. A concomitant goal was to provide a set of focused issues for academic debate and scholarship so that policy makers, industry leaders and regulators would have the intellectual resources they need to better understand the issues and concerns at stake. The

  3. Diversity and Composition of Sulfate-Reducing Microbial Communities Based on Genomic DNA and RNA Transcription in Production Water of High Temperature and Corrosive Oil Reservoir

    Directory of Open Access Journals (Sweden)

    Xiao-Xiao Li

    2017-06-01

    Full Text Available Deep subsurface petroleum reservoir ecosystems harbor a high diversity of microorganisms, and microbial influenced corrosion is a major problem for the petroleum industry. Here, we used high-throughput sequencing to explore the microbial communities based on genomic 16S rDNA and metabolically active 16S rRNA analyses of production water samples with different extents of corrosion from a high-temperature oil reservoir. Results showed that Desulfotignum and Roseovarius were the most abundant genera in both genomic and active bacterial communities of all the samples. Both genomic and active archaeal communities were mainly composed of Archaeoglobus and Methanolobus. Within both bacteria and archaea, the active and genomic communities were compositionally distinct from one another across the different oil wells (bacteria p = 0.002; archaea p = 0.01. In addition, the sulfate-reducing microorganisms (SRMs were specifically assessed by Sanger sequencing of functional genes aprA and dsrA encoding the enzymes adenosine-5′-phosphosulfate reductase and dissimilatory sulfite reductase, respectively. Functional gene analysis indicated that potentially active Archaeoglobus, Desulfotignum, Desulfovibrio, and Thermodesulforhabdus were frequently detected, with Archaeoglobus as the most abundant and active sulfate-reducing group. Canonical correspondence analysis revealed that the SRM communities in petroleum reservoir system were closely related to pH of the production water and sulfate concentration. This study highlights the importance of distinguishing the metabolically active microorganisms from the genomic community and extends our knowledge on the active SRM communities in corrosive petroleum reservoirs.

  4. Diversity and Composition of Sulfate-Reducing Microbial Communities Based on Genomic DNA and RNA Transcription in Production Water of High Temperature and Corrosive Oil Reservoir

    Science.gov (United States)

    Li, Xiao-Xiao; Liu, Jin-Feng; Zhou, Lei; Mbadinga, Serge M.; Yang, Shi-Zhong; Gu, Ji-Dong; Mu, Bo-Zhong

    2017-01-01

    Deep subsurface petroleum reservoir ecosystems harbor a high diversity of microorganisms, and microbial influenced corrosion is a major problem for the petroleum industry. Here, we used high-throughput sequencing to explore the microbial communities based on genomic 16S rDNA and metabolically active 16S rRNA analyses of production water samples with different extents of corrosion from a high-temperature oil reservoir. Results showed that Desulfotignum and Roseovarius were the most abundant genera in both genomic and active bacterial communities of all the samples. Both genomic and active archaeal communities were mainly composed of Archaeoglobus and Methanolobus. Within both bacteria and archaea, the active and genomic communities were compositionally distinct from one another across the different oil wells (bacteria p = 0.002; archaea p = 0.01). In addition, the sulfate-reducing microorganisms (SRMs) were specifically assessed by Sanger sequencing of functional genes aprA and dsrA encoding the enzymes adenosine-5′-phosphosulfate reductase and dissimilatory sulfite reductase, respectively. Functional gene analysis indicated that potentially active Archaeoglobus, Desulfotignum, Desulfovibrio, and Thermodesulforhabdus were frequently detected, with Archaeoglobus as the most abundant and active sulfate-reducing group. Canonical correspondence analysis revealed that the SRM communities in petroleum reservoir system were closely related to pH of the production water and sulfate concentration. This study highlights the importance of distinguishing the metabolically active microorganisms from the genomic community and extends our knowledge on the active SRM communities in corrosive petroleum reservoirs. PMID:28638372

  5. An integrated CRISPR Bombyx mori genome editing system with improved efficiency and expanded target sites.

    Science.gov (United States)

    Ma, Sanyuan; Liu, Yue; Liu, Yuanyuan; Chang, Jiasong; Zhang, Tong; Wang, Xiaogang; Shi, Run; Lu, Wei; Xia, Xiaojuan; Zhao, Ping; Xia, Qingyou

    2017-04-01

    Genome editing enabled unprecedented new opportunities for targeted genomic engineering of a wide variety of organisms ranging from microbes, plants, animals and even human embryos. The serial establishing and rapid applications of genome editing tools significantly accelerated Bombyx mori (B. mori) research during the past years. However, the only CRISPR system in B. mori was the commonly used SpCas9, which only recognize target sites containing NGG PAM sequence. In the present study, we first improve the efficiency of our previous established SpCas9 system by 3.5 folds. The improved high efficiency was also observed at several loci in both BmNs cells and B. mori embryos. Then to expand the target sites, we showed that two newly discovered CRISPR system, SaCas9 and AsCpf1, could also induce highly efficient site-specific genome editing in BmNs cells, and constructed an integrated CRISPR system. Genome-wide analysis of targetable sites was further conducted and showed that the integrated system cover 69,144,399 sites in B. mori genome, and one site could be found in every 6.5 bp. The efficiency and resolution of this CRISPR platform will probably accelerate both fundamental researches and applicable studies in B. mori, and perhaps other insects. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Human papillomavirus genome integration in squamous carcinogenesis: what have next-generation sequencing studies taught us?

    Science.gov (United States)

    Groves, Ian J; Coleman, Nicholas

    2018-05-01

    Human papillomavirus (HPV) infection is associated with ∼5% of all human cancers, including a range of squamous cell carcinomas. Persistent infection by high-risk HPVs (HRHPVs) is associated with the integration of virus genomes (which are usually stably maintained as extrachromosomal episomes) into host chromosomes. Although HRHPV integration rates differ across human sites of infection, this process appears to be an important event in HPV-associated neoplastic progression, leading to deregulation of virus oncogene expression, host gene expression modulation, and further genomic instability. However, the mechanisms by which HRHPV integration occur and by which the subsequent gene expression changes take place are incompletely understood. The advent of next-generation sequencing (NGS) of both RNA and DNA has allowed powerful interrogation of the association of HRHPVs with human disease, including precise determination of the sites of integration and the genomic rearrangements at integration loci. In turn, these data have indicated that integration occurs through two main mechanisms: looping integration and direct insertion. Improved understanding of integration sites is allowing further investigation of the factors that provide a competitive advantage to some integrants during disease progression. Furthermore, advanced approaches to the generation of genome-wide samples have given novel insights into the three-dimensional interactions within the nucleus, which could act as another layer of epigenetic control of both virus and host transcription. It is hoped that further advances in NGS techniques and analysis will not only allow the examination of further unanswered questions regarding HPV infection, but also direct new approaches to treating HPV-associated human disease. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John

  7. Genome Sequence and Composition of a Tolyporphin-Producing Cyanobacterium-Microbial Community

    Energy Technology Data Exchange (ETDEWEB)

    Hughes, Rebecca-Ayme; Zhang, Yunlong; Zhang, Ran; Williams, Philip G.; Lindsey, Jonathan S.; Miller, Eric S.; Nojiri, Hideaki

    2017-07-28

    ABSTRACT

    The cyanobacterial culture HT-58-2 was originally described as a strain ofTolypothrix nodosawith the ability to produce tolyporphins, which comprise a family of distinct tetrapyrrole macrocycles with reported efflux pump inhibition properties. Upon reviving the culture from what was thought to be a nonextant collection, studies of culture conditions, strain characterization, phylogeny, and genomics have been undertaken. Here, HT-58-2 was shown by 16S rRNA analysis to closely align withBrasilonemastrains and not withTolypothrixisolates. Light, fluorescence, and scanning electron microscopy revealed cyanobacterium filaments that are decorated with attached bacteria and associated with free bacteria. Metagenomic surveys of HT-58-2 cultures revealed a diversity of bacteria dominated byErythrobacteraceae, 97% of which arePorphyrobacterspecies. A dimethyl sulfoxide washing procedure was found to yield enriched cyanobacterial DNA (presumably by removing community bacteria) and sequence data sufficient for genome assembly. The finished, closed HT-58-2Cyano genome consists of 7.85 Mbp (42.6% G+C) and contains 6,581 genes. All genes for biosynthesis of tetrapyrroles (e.g., heme, chlorophylla, and phycocyanobilin) and almost all for cobalamin were identified dispersed throughout the chromosome. Among the 6,177 protein-encoding genes, coding sequences (CDSs) for all but two of the eight enzymes for conversion of glutamic acid to protoporphyrinogen IX also were found within one major gene cluster. The cluster also includes 10 putative genes (and one hypothetical gene) encoding proteins with

  8. Probabilistic quantitative microbial risk assessment model of norovirus from wastewater irrigated vegetables in Ghana using genome copies and fecal indicator ratio conversion for estimating exposure dose

    DEFF Research Database (Denmark)

    Owusu-Ansah, Emmanuel de-Graft Johnson; Sampson, Angelina; Amponsah, Samuel K.

    2017-01-01

    physical and environmental factors that might influence the reliability of using indicator organisms in microbial risk assessment. The challenges facing analytical studies on virus enumeration (genome copies or particles) have contributed to the already existing lack of data in QMRA modelling. This study......The need to replace the commonly applied fecal indicator conversions ratio (an assumption of 1:10− 5 virus to fecal indicator organism) in Quantitative Microbial Risk Assessment (QMRA) with models based on quantitative data on the virus of interest has gained prominence due to the different...... attempts to fit a QMRA model to genome copies of norovirus data. The model estimates the risk of norovirus infection from the intake of vegetables irrigated with wastewater from different sources. The results were compared to the results of a corresponding model using the fecal indicator conversion ratio...

  9. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    Science.gov (United States)

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  10. Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

    Science.gov (United States)

    Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

    2011-09-01

    The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.

  11. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Full Text Available Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  12. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species.

    Science.gov (United States)

    Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  13. Prolonged Integration Site Selection of a Lentiviral Vector in the Genome of Human Keratinocytes.

    Science.gov (United States)

    Qian, Wei; Wang, Yong; Li, Rui-Fu; Zhou, Xin; Liu, Jing; Peng, Dai-Zhi

    2017-03-03

    BACKGROUND Lentiviral vectors have been successfully used for human skin cell gene transfer studies. Defining the selection of integration sites for retroviral vectors in the host genome is crucial in risk assessment analysis of gene therapy. However, genome-wide analyses of lentiviral integration sites in human keratinocytes, especially after prolonged growth, are poorly understood. MATERIAL AND METHODS In this study, 874 unique lentiviral vector integration sites in human HaCaT keratinocytes after long-term culture were identified and analyzed with the online tool GTSG-QuickMap and SPSS software. RESULTS The data indicated that lentiviral vectors showed integration site preferences for genes and gene-rich regions. CONCLUSIONS This study will likely assist in determining the relative risks of the lentiviral vector system and in the design of a safe lentiviral vector system in the gene therapy of skin diseases.

  14. Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits.

    Science.gov (United States)

    Freed, Emily F; Winkler, James D; Weiss, Sophie J; Garst, Andrew D; Mutalik, Vivek K; Arkin, Adam P; Knight, Rob; Gill, Ryan T

    2015-11-20

    The reliable engineering of biological systems requires quantitative mapping of predictable and context-independent expression over a broad range of protein expression levels. However, current techniques for modifying expression levels are cumbersome and are not amenable to high-throughput approaches. Here we present major improvements to current techniques through the design and construction of E. coli genome-wide libraries using synthetic DNA cassettes that can tune expression over a ∼10(4) range. The cassettes also contain molecular barcodes that are optimized for next-generation sequencing, enabling rapid and quantitative tracking of alleles that have the highest fitness advantage. We show these libraries can be used to determine which genes and expression levels confer greater fitness to E. coli under different growth conditions.

  15. Microbial Culturomics Application for Global Health: Noncontiguous Finished Genome Sequence and Description of Pseudomonas massiliensis Strain CB-1T sp. nov. in Brazil.

    Science.gov (United States)

    Bardet, Lucie; Cimmino, Teresa; Buffet, Clémence; Michelle, Caroline; Rathored, Jaishriram; Tandina, Fatalmoudou; Lagier, Jean-Christophe; Khelaifia, Saber; Abrahão, Jônatas; Raoult, Didier; Rolain, Jean-Marc

    2018-02-01

    Culturomics is a new postgenomics field that explores the microbial diversity of the human gut coupled with taxono-genomic strategy. Culturomics, and the microbiome science more generally, are anticipated to transform global health diagnostics and inform the ways in which gut microbial diversity contributes to human health and disease, and by extension, to personalized medicine. Using culturomics, we report in this study the description of strain CB1 T ( = CSUR P1334 = DSM 29075), a new species isolated from a stool specimen from a 37-year-old Brazilian woman. This description includes phenotypic characteristics and complete genome sequence and annotation. Strain CB1 T is a gram-negative aerobic and motile bacillus, exhibits neither catalase nor oxidase activities, and presents a 98.3% 16S rRNA sequence similarity with Pseudomonas putida. The 4,723,534 bp long genome contains 4239 protein-coding genes and 74 RNA genes, including 15 rRNA genes (5 16S rRNA, 4 23S rRNA, and 6 5S rRNA) and 59 tRNA genes. Strain CB1 T was named Pseudomonas massiliensis sp. nov. and classified into the family Pseudomonadaceae. This study demonstrates the usefulness of microbial culturomics in exploration of human microbiota in diverse geographies and offers new promise for incorporating new omics technologies for innovation in diagnostic medicine and global health.

  16. Going from Microbial Ecology to Genome Data and Back: Studies on a Haloalkaliphilic Bacterium Isolated from Soap Lake, Washington State

    Directory of Open Access Journals (Sweden)

    Melanie R. Mormile

    2014-11-01

    Full Text Available Soap Lake is a meromictic, alkaline (~pH 9.8 and saline (~14 to 140 g liter-1 lake located in the semiarid area of eastern Washington State. Of note is the length of time it has been meromictic (at least 2000 years and the extremely high sulfide level (~140 mM in its monimolimnion. As expected, the microbial ecology of this lake is greatly influenced by these conditions. A bacterium, Halanaerobium hydrogeniformans, was isolated from the mixolimnion region of this lake. H. hydrogeniformans is a haloalkaliphilic bacterium capable of forming hydrogen from 5- and 6-carbon sugars derived from hemicellulose and cellulose. Due to its ability to produce hydrogen under saline and alkaline conditions, in amounts that rival genetically modified organisms, its genome was sequenced. This sequence data provides an opportunity to explore the unique metabolic capabilities of this organism, including the mechanisms for tolerating the extreme conditions of both high salinity and alkalinity of its environment.

  17. ReacKnock: identifying reaction deletion strategies for microbial strain optimization based on genome-scale metabolic network.

    Directory of Open Access Journals (Sweden)

    Zixiang Xu

    Full Text Available Gene knockout has been used as a common strategy to improve microbial strains for producing chemicals. Several algorithms are available to predict the target reactions to be deleted. Most of them apply mixed integer bi-level linear programming (MIBLP based on metabolic networks, and use duality theory to transform bi-level optimization problem of large-scale MIBLP to single-level programming. However, the validity of the transformation was not proved. Solution of MIBLP depends on the structure of inner problem. If the inner problem is continuous, Karush-Kuhn-Tucker (KKT method can be used to reformulate the MIBLP to a single-level one. We adopt KKT technique in our algorithm ReacKnock to attack the intractable problem of the solution of MIBLP, demonstrated with the genome-scale metabolic network model of E. coli for producing various chemicals such as succinate, ethanol, threonine and etc. Compared to the previous methods, our algorithm is fast, stable and reliable to find the optimal solutions for all the chemical products tested, and able to provide all the alternative deletion strategies which lead to the same industrial objective.

  18. The Effect of Diet and Exercise on Intestinal Integrity and Microbial Diversity in Mice.

    Science.gov (United States)

    Campbell, Sara C; Wisniewski, Paul J; Noji, Michael; McGuinness, Lora R; Häggblom, Max M; Lightfoot, Stanley A; Joseph, Laurie B; Kerkhof, Lee J

    2016-01-01

    The gut microbiota is now known to play an important role contributing to inflammatory-based chronic diseases. This study examined intestinal integrity/inflammation and the gut microbial communities in sedentary and exercising mice presented with a normal or high-fat diet. Thirty-six, 6-week old C57BL/6NTac male mice were fed a normal or high-fat diet for 12-weeks and randomly assigned to exercise or sedentary groups. After 12 weeks animals were sacrificed and duodenum/ileum tissues were fixed for immunohistochemistry for occludin, E-cadherin, and cyclooxygenase-2 (COX-2). The bacterial communities were assayed in fecal samples using terminal restriction fragment length polymorphism (TRFLP) analysis and pyrosequencing of 16S rRNA gene amplicons. Lean sedentary (LS) mice presented normal histologic villi while obese sedentary (OS) mice had similar villi height with more than twice the width of the LS animals. Both lean (LX) and obese exercise (OX) mice duodenum and ileum were histologically normal. COX-2 expression was the greatest in the OS group, followed by LS, LX and OX. The TRFLP and pyrosequencing indicated that members of the Clostridiales order were predominant in all diet groups. Specific phylotypes were observed with exercise, including Faecalibacterium prausnitzi, Clostridium spp., and Allobaculum spp. These data suggest that exercise has a strong influence on gut integrity and host microbiome which points to the necessity for more mechanistic studies of the interactions between specific bacteria in the gut and its host.

  19. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models

    DEFF Research Database (Denmark)

    King, Zachary A.; Lu, Justin; Dräger, Andreas

    2016-01-01

    Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repo...

  20. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer

    NARCIS (Netherlands)

    Peifer, Martin; Fernandez-Cuesta, Lynnette; Sos, Martin L.; George, Julie; Seidel, Danila; Kasper, Lawryn H.; Plenker, Dennis; Leenders, Frauke; Sun, Ruping; Zander, Thomas; Menon, Roopika; Koker, Mirjam; Dahmen, Ilona; Mueller, Christian; Di Cerbo, Vincenzo; Schildhaus, Hans-Ulrich; Altmueller, Janine; Baessmann, Ingelore; Becker, Christian; de Wilde, Bram; Vandesompele, Jo; Boehm, Diana; Ansen, Sascha; Gabler, Franziska; Wilkening, Ines; Heynck, Stefanie; Heuckmann, Johannes M.; Lu, Xin; Carter, Scott L.; Cibulskis, Kristian; Banerji, Shantanu; Getz, Gad; Park, Kwon-Sik; Rauh, Daniel; Gruetter, Christian; Fischer, Matthias; Pasqualucci, Laura; Wright, Gavin; Wainer, Zoe; Russell, Prudence; Petersen, Iver; Chen, Yuan; Stoelben, Erich; Ludwig, Corinna; Schnabel, Philipp; Hoffmann, Hans; Muley, Thomas; Brockmann, Michael; Engel-Riedel, Walburga; Muscarella, Lucia A.; Fazio, Vito M.; Groen, Harry; Timens, Wim; Sietsma, Hannie; Thunnissen, Erik; Smit, Egbert; Heideman, Danielle A. M.; Snijders, Peter J. F.; Cappuzzo, Federico; Ligorio, Claudia; Damiani, Stefania; Field, John; Solberg, Steinar; Brustugun, Odd Terje; Lund-Iversen, Marius; Saenger, Joerg; Clement, Joachim H.; Soltermann, Alex; Moch, Holger; Weder, Walter; Solomon, Benjamin; Soria, Jean-Charles; Validire, Pierre; Besse, Benjamin; Brambilla, Elisabeth; Brambilla, Christian; Lantuejoul, Sylvie; Lorimier, Philippe; Schneider, Peter M.; Hallek, Michael; Pao, William; Meyerson, Matthew; Sage, Julien; Shendure, Jay; Schneider, Robert; Buettner, Reinhard; Wolf, Juergen; Nuernberg, Peter; Perner, Sven; Heukamp, Lukas C.; Brindle, Paul K.; Haas, Stefan; Thomas, Roman K.

    2012-01-01

    Small-cell lung cancer (SCLC) is an aggressive lung tumor subtype with poor prognosis(1-3). We sequenced 29 SCLC exomes, 2 genomes and 15 transcriptomes and found an extremely high mutation rate of 7.4 +/- 1 protein-changing mutations per million base pairs. Therefore, we conducted integrated

  1. Filling the knowledge gap: Integrating quantitative genetics and genomics in graduate education and outreach

    Science.gov (United States)

    The genomics revolution provides vital tools to address global food security. Yet to be incorporated into livestock breeding, molecular techniques need to be integrated into a quantitative genetics framework. Within the U.S., with shrinking faculty numbers with the requisite skills, the capacity to ...

  2. Quantitative and Qualitative Proteome Characteristics Extracted from In-Depth Integrated Genomics and Proteomics Analysis

    NARCIS (Netherlands)

    Low, Teck Yew; van Heesch, Sebastiaan; van den Toorn, Henk; Giansanti, Piero; Cristobal, Alba; Toonen, Pim; Schafer, Sebastian; Huebner, Norbert; van Breukelen, Bas; Mohammed, Shabaz; Cuppen, Edwin; Heck, Albert J. R.; Guryev, Victor

    2013-01-01

    Quantitative and qualitative protein characteristics are regulated at genomic, transcriptomic, and post-transcriptional levels. Here, we integrated in-depth transcriptome and proteome analyses of liver tissues from two rat strains to unravel the interactions within and between these layers. We

  3. Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH-Mutant Molecular Profiles

    DEFF Research Database (Denmark)

    Farshidfar, Farshad; Zheng, Siyuan; Gingras, Marie-Claude

    2017-01-01

    Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahep...

  4. Nucleotide excision repair : a multi-step mechanism required to maintain genome integrity

    NARCIS (Netherlands)

    Moser, Jill

    2010-01-01

    DNA is continuously exposed to exogenous and genotoxic insults including ionizing and ultraviolet radiation as well as chemical agents. DNA damage can compromise the integrity of the genome and have potentially deleterious effects. Ultraviolet light (UV) can induce the formation of helix distorting

  5. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen, with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs. SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome. To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  6. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Science.gov (United States)

    Belyi, Vladimir A; Levine, Arnold J; Skalka, Anna Marie

    2010-07-29

    Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological

  7. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Directory of Open Access Journals (Sweden)

    Vladimir A Belyi

    2010-07-01

    Full Text Available Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected, later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important

  8. Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products

    DEFF Research Database (Denmark)

    Almeida, Mathieu; Hebert, Agnes; Abraham, Anne-Laure

    2014-01-01

    Background: Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequenc...

  9. Evolution of endogenous non-retroviral genes integrated into plant genomes

    Directory of Open Access Journals (Sweden)

    Hyosub Chu

    2014-08-01

    Full Text Available Numerous comparative genome analyses have revealed the wide extent of horizontal gene transfer (HGT in living organisms, which contributes to their evolution and genetic diversity. Viruses play important roles in HGT. Endogenous viral elements (EVEs are defined as viral DNA sequences present within the genomes of non-viral organisms. In eukaryotic cells, the majority of EVEs are derived from RNA viruses using reverse transcription. In contrast, endogenous non-retroviral elements (ENREs are poorly studied. However, the increasing availability of genomic data and the rapid development of bioinformatics tools have enabled the identification of several ENREs in various eukaryotic organisms. To date, a small number of ENREs integrated into plant genomes have been identified. Of the known non-retroviruses, most identified ENREs are derived from double-strand (ds RNA viruses, followed by single-strand (ss DNA and ssRNA viruses. At least eight virus families have been identified. Of these, viruses in the family Partitiviridae are dominant, followed by viruses of the families Chrysoviridae and Geminiviridae. The identified ENREs have been primarily identified in eudicots, followed by monocots. In this review, we briefly discuss the current view on non-retroviral sequences integrated into plant genomes that are associated with plant-virus evolution and their possible roles in antiviral resistance.

  10. TIGER: Toolbox for integrating genome-scale metabolic models, expression data, and transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Jensen Paul A

    2011-09-01

    Full Text Available Abstract Background Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model. Results We present TIGER, a Toolbox for Integrating Genome-scale Metabolism, Expression, and Regulation. TIGER converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities. The package also includes implementations of existing algorithms to integrate high-throughput expression data with genome-scale models of metabolism and transcriptional regulation. We demonstrate how TIGER automates the coupling of a genome-scale metabolic model with GPR logic and models of transcriptional regulation, thereby serving as a platform for algorithm development and large-scale metabolic analysis. Additionally, we demonstrate how TIGER's algorithms can be used to identify inconsistencies and improve existing models of transcriptional regulation with examples from the reconstructed transcriptional regulatory network of Saccharomyces cerevisiae. Conclusion The TIGER package provides a consistent platform for algorithm development and extending existing genome-scale metabolic models with regulatory networks and high-throughput data.

  11. The nucleolus—guardian of cellular homeostasis and genome integrity.

    Science.gov (United States)

    Grummt, Ingrid

    2013-12-01

    All organisms sense and respond to conditions that stress their homeostasis by downregulating the synthesis of rRNA and ribosome biogenesis, thus designating the nucleolus as the central hub in coordinating the cellular stress response. One of the most intriguing roles of the nucleolus, long regarded as a mere ribosome-producing factory, is its participation in monitoring cellular stress signals and transmitting them to the RNA polymerase I (Pol I) transcription machinery. As rRNA synthesis is a most energy-consuming process, switching off transcription of rRNA genes is an effective way of saving the energy required to maintain cellular homeostasis during acute stress. The Pol I transcription machinery is the key convergence point that collects and integrates a vast array of information from cellular signaling cascades to regulate ribosome production which, in turn, guides cell growth and proliferation. This review focuses on the mechanisms that link cell physiology to rDNA silencing, a prerequisite for nucleolar integrity and cell survival.

  12. An integrated clinical and genomic information system for cancer precision medicine.

    Science.gov (United States)

    Jang, Yeongjun; Choi, Taekjin; Kim, Jongho; Park, Jisub; Seo, Jihae; Kim, Sangok; Kwon, Yeajee; Lee, Seungjae; Lee, Sanghyuk

    2018-04-20

    Increasing affordability of next-generation sequencing (NGS) has created an opportunity for realizing genomically-informed personalized cancer therapy as a path to precision oncology. However, the complex nature of genomic information presents a huge challenge for clinicians in interpreting the patient's genomic alterations and selecting the optimum approved or investigational therapy. An elaborate and practical information system is urgently needed to support clinical decision as well as to test clinical hypotheses quickly. Here, we present an integrated clinical and genomic information system (CGIS) based on NGS data analyses. Major components include modules for handling clinical data, NGS data processing, variant annotation and prioritization, drug-target-pathway analysis, and population cohort explorer. We built a comprehensive knowledgebase of genes, variants, drugs by collecting annotated information from public and in-house resources. Structured reports for molecular pathology are generated using standardized terminology in order to help clinicians interpret genomic variants and utilize them for targeted cancer therapy. We also implemented many features useful for testing hypotheses to develop prognostic markers from mutation and gene expression data. Our CGIS software is an attempt to provide useful information for both clinicians and scientists who want to explore genomic information for precision oncology.

  13. Integrated genomics of ovarian xenograft tumor progression and chemotherapy response

    International Nuclear Information System (INIS)

    Stuckey, Ashley; Brodsky, Alexander S; Fischer, Andrew; Miller, Daniel H; Hillenmeyer, Sara; Kim, Kyu K; Ritz, Anna; Singh, Rakesh K; Raphael, Benjamin J; Brard, Laurent

    2011-01-01

    Ovarian cancer is the most deadly gynecological cancer with a very poor prognosis. Xenograft mouse models have proven to be one very useful tool in testing candidate therapeutic agents and gene function in vivo. In this study we identify genes and gene networks important for the efficacy of a pre-clinical anti-tumor therapeutic, MT19c. In order to understand how ovarian xenograft tumors may be growing and responding to anti-tumor therapeutics, we used genome-wide mRNA expression and DNA copy number measurements to identify key genes and pathways that may be critical for SKOV-3 xenograft tumor progression. We compared SKOV-3 xenografts treated with the ergocalciferol derived, MT19c, to untreated tumors collected at multiple time points. Cell viability assays were used to test the function of the PPARγ agonist, Rosiglitazone, on SKOV-3 cell growth. These data indicate that a number of known survival and growth pathways including Notch signaling and general apoptosis factors are differentially expressed in treated vs. untreated xenografts. As tumors grow, cell cycle and DNA replication genes show increased expression, consistent with faster growth. The steroid nuclear receptor, PPARγ, was significantly up-regulated in MT19c treated xenografts. Surprisingly, stimulation of PPARγ with Rosiglitazone reduced the efficacy of MT19c and cisplatin suggesting that PPARγ is regulating a survival pathway in SKOV-3 cells. To identify which genes may be important for tumor growth and treatment response, we observed that MT19c down-regulates some high copy number genes and stimulates expression of some low copy number genes suggesting that these genes are particularly important for SKOV-3 xenograft growth and survival. We have characterized the time dependent responses of ovarian xenograft tumors to the vitamin D analog, MT19c. Our results suggest that PPARγ promotes survival for some ovarian tumor cells. We propose that a combination of regulated expression and copy number

  14. Integration of expression data in genome-scale metabolic network reconstructions

    Directory of Open Access Journals (Sweden)

    Anna S. Blazier

    2012-08-01

    Full Text Available With the advent of high-throughput technologies, the field of systems biology has amassed an abundance of omics data, quantifying thousands of cellular components across a variety of scales, ranging from mRNA transcript levels to metabolite quantities. Methods are needed to not only integrate this omics data but to also use this data to heighten the predictive capabilities of computational models. Several recent studies have successfully demonstrated how flux balance analysis (FBA, a constraint-based modeling approach, can be used to integrate transcriptomic data into genome-scale metabolic network reconstructions to generate predictive computational models. In this review, we summarize such FBA-based methods for integrating expression data into genome-scale metabolic network reconstructions, highlighting their advantages as well as their limitations.

  15. Identifying candidate driver genes by integrative ovarian cancer genomics data

    Science.gov (United States)

    Lu, Xinguo; Lu, Jibo

    2017-08-01

    Integrative analysis of molecular mechanics underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, Copy Number Variations CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumor from the heterogeneous data. The final list of modulators identified are well known in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1, MED1 and so on, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.

  16. Genome-wide analysis reveals the extent of EAV-HP integration in domestic chicken.

    Science.gov (United States)

    Wragg, David; Mason, Andrew S; Yu, Le; Kuo, Richard; Lawal, Raman A; Desta, Takele Taye; Mwacharo, Joram M; Cho, Chang-Yeon; Kemp, Steve; Burt, David W; Hanotte, Olivier

    2015-10-14

    EAV-HP is an ancient retrovirus pre-dating Gallus speciation, which continues to circulate in modern chicken populations, and led to the emergence of avian leukosis virus subgroup J causing significant economic losses to the poultry industry. We mapped EAV-HP integration sites in Ethiopian village chickens, a Silkie, Taiwan Country chicken, red junglefowl Gallus gallus and several inbred experimental lines using whole-genome sequence data. An average of 75.22 ± 9.52 integration sites per bird were identified, which collectively group into 279 intervals of which 5 % are common to 90 % of the genomes analysed and are suggestive of pre-domestication integration events. More than a third of intervals are specific to individual genomes, supporting active circulation of EAV-HP in modern chickens. Interval density is correlated with chromosome length (P < 2.31(-6)), and 27 % of intervals are located within 5 kb of a transcript. Functional annotation clustering of genes reveals enrichment for immune-related functions (P < 0.05). Our results illustrate a non-random distribution of EAV-HP in the genome, emphasising the importance it may have played in the adaptation of the species, and provide a platform from which to extend investigations on the co-evolutionary significance of endogenous retroviral genera with their hosts.

  17. Annotating novel genes by integrating synthetic lethals and genomic information

    Directory of Open Access Journals (Sweden)

    Faty Mahamadou

    2008-01-01

    Full Text Available Abstract Background Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.

  18. High speed municipal sewage treatment in microbial fuel cell integrated with anaerobic membrane filtration system.

    Science.gov (United States)

    Lee, Y; Oa, S W

    2014-01-01

    A cylindrical two chambered microbial fuel cell (MFC) integrated with an anaerobic membrane filter was designed and constructed to evaluate bioelectricity generation and removal efficiency of organic substrate (glucose or domestic wastewater) depending on organic loading rates (OLRs). The MFC was continuously operated with OLRs 3.75, 5.0, 6.25, and 9.38 kg chemical oxygen demand (COD)/(m(3)·d) using glucose as a substrate, and the cathode chamber was maintained at 5-7 mg/L of dissolved oxygen. The optimal OLR was found to be 6.25 kgCOD/(m(3)·d) (hydraulic retention time (HRT) 1.9 h), and the corresponding voltage and power density averaged during the operation were 0.15 V and 13.6 mW/m(3). With OLR 6.25 kgCOD/(m(3)·d) using domestic wastewater as a substrate, the voltage and power reached to 0.13 V and 91 mW/m(3) in the air cathode system. Even though a relatively short HRT of 1.9 h was applied, stable effluent could be obtained by the membrane filtration system and the following air purging. In addition, the short HRT would provide economic benefit in terms of reduction of construction and operating costs compared with a conventional aerobic treatment process.

  19. The Effect of Diet and Exercise on Intestinal Integrity and Microbial Diversity in Mice

    Science.gov (United States)

    Wisniewski, Paul J.; Noji, Michael; McGuinness, Lora R.; Lightfoot, Stanley A.

    2016-01-01

    Background The gut microbiota is now known to play an important role contributing to inflammatory-based chronic diseases. This study examined intestinal integrity/inflammation and the gut microbial communities in sedentary and exercising mice presented with a normal or high-fat diet. Methods Thirty-six, 6-week old C57BL/6NTac male mice were fed a normal or high-fat diet for 12-weeks and randomly assigned to exercise or sedentary groups. After 12 weeks animals were sacrificed and duodenum/ileum tissues were fixed for immunohistochemistry for occludin, E-cadherin, and cyclooxygenase-2 (COX-2). The bacterial communities were assayed in fecal samples using terminal restriction fragment length polymorphism (TRFLP) analysis and pyrosequencing of 16S rRNA gene amplicons. Results Lean sedentary (LS) mice presented normal histologic villi while obese sedentary (OS) mice had similar villi height with more than twice the width of the LS animals. Both lean (LX) and obese exercise (OX) mice duodenum and ileum were histologically normal. COX-2 expression was the greatest in the OS group, followed by LS, LX and OX. The TRFLP and pyrosequencing indicated that members of the Clostridiales order were predominant in all diet groups. Specific phylotypes were observed with exercise, including Faecalibacterium prausnitzi, Clostridium spp., and Allobaculum spp. Conclusion These data suggest that exercise has a strong influence on gut integrity and host microbiome which points to the necessity for more mechanistic studies of the interactions between specific bacteria in the gut and its host. PMID:26954359

  20. The Effect of Diet and Exercise on Intestinal Integrity and Microbial Diversity in Mice.

    Directory of Open Access Journals (Sweden)

    Sara C Campbell

    Full Text Available The gut microbiota is now known to play an important role contributing to inflammatory-based chronic diseases. This study examined intestinal integrity/inflammation and the gut microbial communities in sedentary and exercising mice presented with a normal or high-fat diet.Thirty-six, 6-week old C57BL/6NTac male mice were fed a normal or high-fat diet for 12-weeks and randomly assigned to exercise or sedentary groups. After 12 weeks animals were sacrificed and duodenum/ileum tissues were fixed for immunohistochemistry for occludin, E-cadherin, and cyclooxygenase-2 (COX-2. The bacterial communities were assayed in fecal samples using terminal restriction fragment length polymorphism (TRFLP analysis and pyrosequencing of 16S rRNA gene amplicons.Lean sedentary (LS mice presented normal histologic villi while obese sedentary (OS mice had similar villi height with more than twice the width of the LS animals. Both lean (LX and obese exercise (OX mice duodenum and ileum were histologically normal. COX-2 expression was the greatest in the OS group, followed by LS, LX and OX. The TRFLP and pyrosequencing indicated that members of the Clostridiales order were predominant in all diet groups. Specific phylotypes were observed with exercise, including Faecalibacterium prausnitzi, Clostridium spp., and Allobaculum spp.These data suggest that exercise has a strong influence on gut integrity and host microbiome which points to the necessity for more mechanistic studies of the interactions between specific bacteria in the gut and its host.

  1. Efficient genome-wide genotyping strategies and data integration in crop plants.

    Science.gov (United States)

    Torkamaneh, Davoud; Boyle, Brian; Belzile, François

    2018-03-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time with a low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today is how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used to both fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.

  2. Microbial iron management mechanisms in extremely acidic environments: comparative genomics evidence for diversity and versatility

    Directory of Open Access Journals (Sweden)

    Nieto Pamela A

    2008-11-01

    uptake systems could reflect their obligatory occupation of extremely low pH environments where high concentrations of soluble iron may always be available and were oxidized sulfur species might not compromise iron speciation dynamics. Presence of bacterioferritin in the Acidithiobacilli, polyphosphate accumulation functions and variants of FieF-like diffusion facilitators in both Acidithiobacilli and Leptospirilla, indicate that they may remove or store iron under conditions of variable availability. In addition, the Fe(II-oxidizing capacity of both A. ferrooxidans and Leptospirilla could itself be a way to evade iron stress imposed by readily available Fe(II ions at low pH. Fur regulatory sites have been predicted for a number of gene clusters including iron related and non-iron related functions in both the Acidithiobacilli and Leptospirilla, laying the foundation for the future discovery of iron regulated and iron-phosphate coordinated regulatory control circuits. Conclusion In silico analyses of the genomes of acidophilic bacteria are beginning to tease apart the mechanisms that mediate iron uptake and homeostasis in low pH environments. Initial models pinpoint significant differences in abundance and diversity of iron management mechanisms between Leptospirilla and Acidithiobacilli, and begin to reveal how these two groups respond to iron cycling and iron fluctuations in naturally acidic environments and in industrial operations. Niche partitions and ecological successions between acidophilic microorganisms may be partially explained by these observed differences. Models derived from these analyses pave the way for improved hypothesis testing and well directed experimental investigation. In addition, aspects of these models should challenge investigators to evaluate alternative iron management strategies in non-acidophilic model organisms.

  3. First step in using molecular data for microbial food safety risk assessment; hazard identification of Escherichia coli O157:H7 by coupling genomic data with in vitro adherence to human epithelial cells.

    Science.gov (United States)

    Pielaat, Annemarie; Boer, Martin P; Wijnands, Lucas M; van Hoek, Angela H A M; Bouw, El; Barker, Gary C; Teunis, Peter F M; Aarts, Henk J M; Franz, Eelco

    2015-11-20

    The potential for using whole genome sequencing (WGS) data in microbiological risk assessment (MRA) has been discussed on several occasions since the beginning of this century. Still, the proposed heuristic approaches have never been applied in a practical framework. This is due to the non-trivial problem of mapping microbial information consisting of thousands of loci onto a probabilistic scale for risks. The paradigm change for MRA involves translation of multidimensional microbial genotypic information to much reduced (integrated) phenotypic information and onwards to a single measure of human risk (i.e. probability of illness). In this paper a first approach in methodology development is described for the application of WGS data in MRA; this is supported by a practical example. That is, combining genetic data (single nucleotide polymorphisms; SNPs) for Shiga toxin-producing Escherichia coli (STEC) O157 with phenotypic data (in vitro adherence to epithelial cells as a proxy for virulence) leads to hazard identification in a Genome Wide Association Study (GWAS). This application revealed practical implications when using SNP data for MRA. These can be summarized by considering the following main issues: optimum sample size for valid inference on population level, correction for population structure, quantification and calibration of results, reproducibility of the analysis, links with epidemiological data, anchoring and integration of results into a systems biology approach for the translation of molecular studies to human health risk. Future developments in genetic data analysis for MRA should aim at resolving the mapping problem of processing genetic sequences to come to a quantitative description of risk. The development of a clustering scheme focusing on biologically relevant information of the microbe involved would be a useful approach in molecular data reduction for risk assessment. Copyright © 2015. Published by Elsevier B.V.

  4. Targeting Unknowns Just Underfoot: Microbial Ecology and Community Genomics of C Cycling in Soil Informed and Enabled with DNA-SIP

    Science.gov (United States)

    Pepe-Ranney, C. P.; Campbell, A.; Buckley, D. H.

    2015-12-01

    Microorganisms drive biogeochemical cycles and because soil is a large global carbon (C) reservoir (soil contains more C than plants and the atmosphere combined), soil microorganisms are important players in the global C-cycle. Frustratingly, however, many soil microorganisms resist cultivation and soil communities are astoundingly complex. This makes soil microbiology difficult to study and without a solid understanding of soil microbial ecology, models of soil C feedbacks to climate change are under-informed. Stable isotope probing (SIP) is a useful approach for establishing identity-function connections in microbial communities but has been challenging to employ in soil due to the inadequate resolution of microbial community fingerprinting techniques. High throughput DNA sequencing improves SIP resolving power transforming it into a powerful tool for studying the soil C cycle. We conducted a DNA-SIP experiment to track flow of xylose-C, a labile component of plant biomass, and cellulose-C, the most abundant global biopolymer, through a soil microbial community. We could track 13C into microbial DNA even when added 13C amounted to less than 5% of native C and found Spartobacteria, Chloroflexi, and Planctomycetes taxa were among those that assimilated 13C cellulose. These lineages are cosmopolitan in soil but little is known of their ecophysiology. By profiling SSU rRNA genes across entire DNA-SIP density gradients, we assessed relative DNA atom % 13C per taxon in 13C treatments and found cellulose degraders exhibited signal consistent with a specialist lifestyle with respect to C preference. Further, DNA-SIP enriches DNA of targeted microorganisms (Verrucomicrobia cellulose degraders were enriched by nearly two orders of magnitude) and this enriched DNA can serve as template for community genomics. We produced draft genomes from soil cellulose degraders including microorganisms belonging to Verrucomicrobia, Chloroflexi, and Planctomycetes from SIP enriched DNA

  5. Capturing microbial sources distributed in a mixed-use watershed within an integrated environmental modeling workflow

    Science.gov (United States)

    Many watershed models simulate overland and instream microbial fate and transport, but few provide loading rates on land surfaces and point sources to the waterbody network. This paper describes the underlying equations for microbial loading rates associated with 1) land-applied ...

  6. Resistance to post-harvest microbial rot in yam: Integration of ...

    African Journals Online (AJOL)

    Post-harvest microbial rot is an important disease that causes severe losses in yam (Dioscorea spp.) storage. Rot from microbial infection of healthy yam tubers reduces their table quality and renders them unappealing to consumers. A study was carried out at Bimbilla in the Nanumba North District of Ghana to evaluate ...

  7. Drosophila Sld5 is essential for normal cell cycle progression and maintenance of genomic integrity

    Energy Technology Data Exchange (ETDEWEB)

    Gouge, Catherine A. [Department of Biology, East Carolina University East Carolina University, Greenville, NC 27858 (United States); Christensen, Tim W., E-mail: christensent@ecu.edu [Department of Biology, East Carolina University East Carolina University, Greenville, NC 27858 (United States)

    2010-09-10

    Research highlights: {yields} Drosophila Sld5 interacts with Psf1, PPsf2, and Mcm10. {yields} Haploinsufficiency of Sld5 leads to M-phase delay and genomic instability. {yields} Sld5 is also required for normal S phase progression. -- Abstract: Essential for the normal functioning of a cell is the maintenance of genomic integrity. Failure in this process is often catastrophic for the organism, leading to cell death or mis-proliferation. Central to genomic integrity is the faithful replication of DNA during S phase. The GINS complex has recently come to light as a critical player in DNA replication through stabilization of MCM2-7 and Cdc45 as a member of the CMG complex which is likely responsible for the processivity of helicase activity during S phase. The GINS complex is made up of 4 members in a 1:1:1:1 ratio: Psf1, Psf2, Psf3, And Sld5. Here we present the first analysis of the function of the Sld5 subunit in a multicellular organism. We show that Drosophila Sld5 interacts with Psf1, Psf2, and Mcm10 and that mutations in Sld5 lead to M and S phase delays with chromosomes exhibiting hallmarks of genomic instability.

  8. Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences.

    Science.gov (United States)

    Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A

    2016-10-15

    Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool-Genome Puzzle Master (GPM)-that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS CONTACTS: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

    Directory of Open Access Journals (Sweden)

    Kuczmarski Thomas A

    2006-10-01

    Full Text Available Abstract Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein

  10. RGmatch: matching genomic regions to proximal genes in omics data integration

    Directory of Open Access Journals (Sweden)

    Pedro Furió-Tarí

    2016-11-01

    Full Text Available Abstract Background The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.

  11. FISH Oracle 2: a web server for integrative visualization of genomic data in cancer research.

    Science.gov (United States)

    Mader, Malte; Simon, Ronald; Kurtz, Stefan

    2014-03-31

    A comprehensive view on all relevant genomic data is instrumental for understanding the complex patterns of molecular alterations typically found in cancer cells. One of the most effective ways to rapidly obtain an overview of genomic alterations in large amounts of genomic data is the integrative visualization of genomic events. We developed FISH Oracle 2, a web server for the interactive visualization of different kinds of downstream processed genomics data typically available in cancer research. A powerful search interface and a fast visualization engine provide a highly interactive visualization for such data. High quality image export enables the life scientist to easily communicate their results. A comprehensive data administration allows to keep track of the available data sets. We applied FISH Oracle 2 to published data and found evidence that, in colorectal cancer cells, the gene TTC28 may be inactivated in two different ways, a fact that has not been published before. The interactive nature of FISH Oracle 2 and the possibility to store, select and visualize large amounts of downstream processed data support life scientists in generating hypotheses. The export of high quality images supports explanatory data visualization, simplifying the communication of new biological findings. A FISH Oracle 2 demo server and the software is available at http://www.zbh.uni-hamburg.de/fishoracle.

  12. Genome scale models of yeast: towards standardized evaluation and consistent omic integration

    DEFF Research Database (Denmark)

    Sanchez, Benjamin J.; Nielsen, Jens

    2015-01-01

    Genome scale models (GEMs) have enabled remarkable advances in systems biology, acting as functional databases of metabolism, and as scaffolds for the contextualization of high-throughput data. In the case of Saccharomyces cerevisiae (budding yeast), several GEMs have been published and are curre......Genome scale models (GEMs) have enabled remarkable advances in systems biology, acting as functional databases of metabolism, and as scaffolds for the contextualization of high-throughput data. In the case of Saccharomyces cerevisiae (budding yeast), several GEMs have been published...... in which all levels of omics data (from gene expression to flux) have been integrated in yeast GEMs. Relevant conclusions and current challenges for both GEM evaluation and omic integration are highlighted....

  13. Probabilistic quantitative microbial risk assessment model of norovirus from wastewater irrigated vegetables in Ghana using genome copies and fecal indicator ratio conversion for estimating exposure dose.

    Science.gov (United States)

    Owusu-Ansah, Emmanuel de-Graft Johnson; Sampson, Angelina; Amponsah, Samuel K; Abaidoo, Robert C; Dalsgaard, Anders; Hald, Tine

    2017-12-01

    The need to replace the commonly applied fecal indicator conversions ratio (an assumption of 1:10 -5 virus to fecal indicator organism) in Quantitative Microbial Risk Assessment (QMRA) with models based on quantitative data on the virus of interest has gained prominence due to the different physical and environmental factors that might influence the reliability of using indicator organisms in microbial risk assessment. The challenges facing analytical studies on virus enumeration (genome copies or particles) have contributed to the already existing lack of data in QMRA modelling. This study attempts to fit a QMRA model to genome copies of norovirus data. The model estimates the risk of norovirus infection from the intake of vegetables irrigated with wastewater from different sources. The results were compared to the results of a corresponding model using the fecal indicator conversion ratio to estimate the norovirus count. In all scenarios of using different water sources, the application of the fecal indicator conversion ratio underestimated the norovirus disease burden, measured by the Disability Adjusted Life Years (DALYs), when compared to results using the genome copies norovirus data. In some cases the difference was >2 orders of magnitude. All scenarios using genome copies met the 10 -4 DALY per person per year for consumption of vegetables irrigated with wastewater, although these results are considered to be highly conservative risk estimates. The fecal indicator conversion ratio model of stream-water and drain-water sources of wastewater achieved the 10 -6 DALY per person per year threshold, which tends to indicate an underestimation of health risk when compared to using genome copies for estimating the dose. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. The RNAPII-CTD Maintains Genome Integrity through Inhibition of Retrotransposon Gene Expression and Transposition.

    Directory of Open Access Journals (Sweden)

    Maria J Aristizabal

    2015-10-01

    Full Text Available RNA polymerase II (RNAPII contains a unique C-terminal domain that is composed of heptapeptide repeats and which plays important regulatory roles during gene expression. RNAPII is responsible for the transcription of most protein-coding genes, a subset of non-coding genes, and retrotransposons. Retrotransposon transcription is the first step in their multiplication cycle, given that the RNA intermediate is required for the synthesis of cDNA, the material that is ultimately incorporated into a new genomic location. Retrotransposition can have grave consequences to genome integrity, as integration events can change the gene expression landscape or lead to alteration or loss of genetic information. Given that RNAPII transcribes retrotransposons, we sought to investigate if the RNAPII-CTD played a role in the regulation of retrotransposon gene expression. Importantly, we found that the RNAPII-CTD functioned to maintaining genome integrity through inhibition of retrotransposon gene expression, as reducing CTD length significantly increased expression and transposition rates of Ty1 elements. Mechanistically, the increased Ty1 mRNA levels in the rpb1-CTD11 mutant were partly due to Cdk8-dependent alterations to the RNAPII-CTD phosphorylation status. In addition, Cdk8 alone contributed to Ty1 gene expression regulation by altering the occupancy of the gene-specific transcription factor Ste12. Loss of STE12 and TEC1 suppressed growth phenotypes of the RNAPII-CTD truncation mutant. Collectively, our results implicate Ste12 and Tec1 as general and important contributors to the Cdk8, RNAPII-CTD regulatory circuitry as it relates to the maintenance of genome integrity.

  15. The openness of pluripotent epigenome - Defining the genomic integrity of stemness for regenerative medicine

    Directory of Open Access Journals (Sweden)

    Xuejun H Parsons

    2014-02-01

    Full Text Available This article is an editorial, and it doesn't include an abstract. Full text of this article is available in HTML and PDF.Cite this article as: Parsons XH. The openness of pluripotent epigenome - Defining the genomic Integrity of stemness for regenerative medicine. Int J Cancer Ther Oncol 2014; 2(1:020114.DOI: http://dx.doi.org/10.14319/ijcto.0201.14

  16. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics

    OpenAIRE

    Verma, Mohit; Kumar, Vinay; Patel, Ravi K.; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database fea...

  17. An Integrated Insight into the Relationship between Soil Microbial Community and Tobacco Bacterial Wilt Disease

    Science.gov (United States)

    Yang, Hongwu; Li, Juan; Xiao, Yunhua; Gu, Yabing; Liu, Hongwei; Liang, Yili; Liu, Xueduan; Hu, Jin; Meng, Delong; Yin, Huaqun

    2017-01-01

    The soil microbial communities play an important role in plant health, however, the relationship between the below-ground microbiome and above-ground plant health remains unclear. To reveal such a relationship, we analyzed soil microbial communities through sequencing of 16S rRNA gene amplicons from 15 different tobacco fields with different levels of wilt disease in the central south part of China. We found that plant health was related to the soil microbial diversity as plants may benefit from the diverse microbial communities. Also, those 15 fields were grouped into ‘healthy’ and ‘infected’ samples based upon soil microbial community composition analyses such as unweighted paired-group method with arithmetic means (UPGMA) and principle component analysis, and furthermore, molecular ecological network analysis indicated that some potential plant-beneficial microbial groups, e.g., Bacillus and Actinobacteria could act as network key taxa, thus reducing the chance of plant soil-borne pathogen invasion. In addition, we propose that a more complex soil ecology network may help suppress tobacco wilt, which was also consistent with highly diversity and composition with plant-beneficial microbial groups. This study provides new insights into our understanding the relationship between the soil microbiome and plant health. PMID:29163453

  18. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Full Text Available Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB, which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology search and comparative gene expression analysis. The current release of CTDB (v2.0 hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms between different chickpea types. We anticipate that integrated information content of this database will accelerate the functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  19. Microbial network for waste activated sludge cascade utilization in an integrated system of microbial electrolysis and anaerobic fermentation

    DEFF Research Database (Denmark)

    Liu, Wenzong; He, Zhangwei; Yang, Chunxue

    2016-01-01

    Background: Bioelectrochemical systems have been considered a promising novel technology that shows an enhanced energy recovery, as well as generation of value-added products. A number of recent studies suggested that an enhancement of carbon conversion and biogas production can be achieved....... The characterization of integrated community structure and community shifts is not well understood, however, it starts to attract interest of scientists and engineers. Results: In the present work, energy recovery and WAS conversion are comprehensively affected by typical pretreated biosolid characteristics. We...... investigated the interaction of fermentation communities and electrode respiring communities in an integrated system of WAS fermentation and MEC for hydrogen recovery. A high energy recovery was achieved in the MECs feeding WAS fermentation liquid through alkaline pretreatment. Some anaerobes belonging...

  20. Genomic GC-content affects the accuracy of 16S rRNA gene sequencing bsed microbial profiling due to PCR bias

    DEFF Research Database (Denmark)

    Laursen, Martin F.; Dalgaard, Marlene Danner; Bahl, Martin Iain

    2017-01-01

    Profiling of microbial community composition is frequently performed by partial 16S rRNA gene sequencing on benchtop platforms following PCR amplification of specific hypervariable regions within this gene. Accuracy and reproducibility of this strategy are two key parameters to consider, which may...... be influenced during all processes from sample collection and storage, through DNA extraction and PCR based library preparation to the final sequencing. In order to evaluate both the reproducibility and accuracy of 16S rRNA gene based microbial profiling using the Ion Torrent PGM platform, we prepared libraries...... be explained partly by premature read truncation, but to larger degree their genomic GC-content, which correlated negatively with the observed relative abundances, suggesting a PCR bias against GC-rich species during library preparation. Increasing the initial denaturation time during the PCR amplification...

  1. The molecular dimension of microbial species: 3. Comparative genomics of Synechococcus strains with different light responses and in situ diel transcription patterns of associated ecotypes in the Mushroom Spring microbial mat

    Directory of Open Access Journals (Sweden)

    Millie T. Olsen

    2015-06-01

    Full Text Available Genomes were obtained for three closely related strains of Synechococcus that are representative of putative ecotypes that predominate at different depths in the 1 mm-thick, upper-green layer in the 60°C mat of Mushroom Spring, Yellowstone National Park, and exhibit different light adaptation and acclimation responses. The genomes were compared to the published genome of a previously obtained, closely related strain from a neighboring spring, and differences in both gene content and orthologous gene alleles between high-light-adapted and low-light-adapted strains were identified. Evidence of genetic differences that relate to adaptation to light intensity and/or quality, CO2 uptake, nitrogen metabolism, organic carbon metabolism, and uptake of other nutrients were found between strains of the different putative ecotypes. In situ diel transcription patterns of genes, including genes unique to either low-light-adapted or high-light-adapted strains and different alleles of an orthologous photosystem gene, revealed that expression is fine-tuned to the different light environments experienced by ecotypes prevalent at various depths in the mat. This study suggests that strains of closely related putative ecotypes have different genomic adaptations that enable them to inhabit distinct ecological niches while living in close proximity within a microbial community.

  2. The European Renal Genome Project: An Integrated Approach Towards Understanding the Genetics of Kidney Development and Disease

    OpenAIRE

    Willnow, TE; Antignac, C; Brändli, AW; Christensen, EI; Cox, RD; Davidson, D; Davies, JA; Devuyst, O; Eichele, G; Hastie, ND; Verroust, PJ; Schedl, A; Meij, IC

    2005-01-01

    Rapid progress in genome research creates a wealth of information on the functional annotation of mammalian genome sequences. However, as we accumulate large amounts of scientific information we are facing problems of how to integrate and relate the data produced by various genomic approaches. Here, we propose the novel concept of an organ atlas where diverse data from expression maps to histological findings to mutant phenotypes can be queried, compared and visualized in the context of a thr...

  3. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  4. Effects of iron and calcium carbonate on contaminant removal efficiencies and microbial communities in integrated wastewater treatment systems.

    Science.gov (United States)

    Zhao, Zhimiao; Song, Xinshan; Zhang, Yinjiang; Zhao, Yufeng; Wang, Bodi; Wang, Yuhui

    2017-12-01

    In the paper, we explored the influences of different dosages of iron and calcium carbonate on contaminant removal efficiencies and microbial communities in algal ponds combined with constructed wetlands. After 1-year operation of treatment systems, based on the high-throughput pyrosequencing analysis of microbial communities, the optimal operating conditions were obtained as follows: the ACW10 system with Fe 3+ (5.6 mg L -1 ), iron powder (2.8 mg L -1 ), and CaCO 3 powder (0.2 mg L -1 ) in influent as the adjusting agents, initial phosphorus source (PO 4 3- ) in influent, the ratio of nitrogen to phosphorus (N/P) of 30 in influent, and hydraulic retention time (HRT) of 1 day. Total nitrogen (TN) removal efficiency and total phosphorus (TP) removal efficiency were improved significantly. The hydrolysis of CaCO 3 promoted the physicochemical precipitation in contaminant removal. Meanwhile, Fe 3+ and iron powder produced Fe 2+ , which improved contaminant removal. Iron ion improved the diversity, distribution, and metabolic functions of microbial communities in integrated treatment systems. In the treatment ACW10, the dominant phylum in the microbial community was PLANCTOMYCETES, which positively promoted nitrogen removal. After 5 consecutive treatments in ACW10, contaminant removal efficiencies for TN and TP respectively reached 80.6% and 57.3% and total iron concentration in effluent was 0.042 mg L -1 . Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.

    Science.gov (United States)

    Lysenko, Artem; Lysenko, Atem; Hindle, Matthew Morritt; Taubert, Jan; Saqi, Mansoor; Rawlings, Christopher John

    2009-11-01

    The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.

  6. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy.

    Science.gov (United States)

    Meier-Kolthoff, Jan P; Hahnke, Richard L; Petersen, Jörn; Scheuner, Carmen; Michael, Victoria; Fiebig, Anne; Rohde, Christine; Rohde, Manfred; Fartmann, Berthold; Goodwin, Lynne A; Chertkov, Olga; Reddy, Tbk; Pati, Amrita; Ivanova, Natalia N; Markowitz, Victor; Kyrpides, Nikos C; Woyke, Tanja; Göker, Markus; Klenk, Hans-Peter

    2014-01-01

    Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain was until now forgotten from microbial genomics. As a part of the G enomic E ncyclopedia of B acteria and A rchaea project, we here describe the features of E. coli DSM 30083(T) together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp containing genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083(T) in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. The phylotypes do not express a uniform level of character divergence as measured using dDDH, however, thus an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.

  7. An evolvable oestrogen receptor activity sensor: development of a modular system for integrating multiple genes into the yeast genome

    NARCIS (Netherlands)

    Fox, J.E.; Bridgham, J.T.; Bovee, T.F.H.; Thornton, J.W.

    2007-01-01

    To study a gene interaction network, we developed a gene-targeting strategy that allows efficient and stable genomic integration of multiple genetic constructs at distinct target loci in the yeast genome. This gene-targeting strategy uses a modular plasmid with a recyclable selectable marker and a

  8. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive

  9. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    Science.gov (United States)

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. An integrated study to analyze soil microbial community structure and metabolic potential in two forest types.

    Science.gov (United States)

    Zhang, Yuguang; Cong, Jing; Lu, Hui; Yang, Caiyun; Yang, Yunfeng; Zhou, Jizhong; Li, Diqiang

    2014-01-01

    Soil microbial metabolic potential and ecosystem function have received little attention owing to difficulties in methodology. In this study, we selected natural mature forest and natural secondary forest and analyzed the soil microbial community and metabolic potential combing the high-throughput sequencing and GeoChip technologies. Phylogenetic analysis based on 16S rRNA sequencing showed that one known archaeal phylum and 15 known bacterial phyla as well as unclassified phylotypes were presented in these forest soils, and Acidobacteria, Protecobacteria, and Actinobacteria were three of most abundant phyla. The detected microbial functional gene groups were related to different biogeochemical processes, including carbon degradation, carbon fixation, methane metabolism, nitrogen cycling, phosphorus utilization, sulfur cycling, etc. The Shannon index for detected functional gene probes was significantly higher (PThe regression analysis showed that a strong positive (Pthe soil microbial functional gene diversity and phylogenetic diversity. Mantel test showed that soil oxidizable organic carbon, soil total nitrogen and cellulose, glucanase, and amylase activities were significantly linked (Pthe relative abundance of corresponded functional gene groups. Variance partitioning analysis showed that a total of 81.58% of the variation in community structure was explained by soil chemical factors, soil temperature, and plant diversity. Therefore, the positive link of soil microbial structure and composition to functional activity related to ecosystem functioning was existed, and the natural secondary forest soil may occur the high microbial metabolic potential. Although the results can't directly reflect the actual microbial populations and functional activities, this study provides insight into the potential activity of the microbial community and associated feedback responses of the terrestrial ecosystem to environmental changes.

  11. LocusTrack: Integrated visualization of GWAS results and genomic annotation.

    Science.gov (United States)

    Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

    2015-01-01

    Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.

  12. Integrative Genomic and Proteomic Analysis of the Response of Lactobacillus casei Zhang to Glucose Restriction.

    Science.gov (United States)

    Yu, Jie; Hui, Wenyan; Cao, Chenxia; Pan, Lin; Zhang, Heping; Zhang, Wenyi

    2018-03-02

    Nutrient starvation is an important survival challenge for bacteria during industrial production of functional foods. As next-generation sequencing technology has greatly advanced, we performed proteomic and genomic analysis to investigate the response of Lactobacillus casei Zhang to a glucose-restricted environment. L. casei Zhang strains were permitted to evolve in glucose-restricted or normal medium from a common ancestor over a 3 year period, and they were sampled at 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 generations and subjected to proteomic and genomic analyses. Genomic resequencing data revealed different point mutations and other mutational events in each selected generation of L. casei Zhang under glucose restriction stress. The differentially expressed proteins induced by glucose restriction were mostly related to fructose and mannose metabolism, carbohydrate metabolic processes, lyase activity, and amino-acid-transporting ATPase activity. Integrative proteomic and genomic analysis revealed that the mutations protected L. casei Zhang against glucose starvation by regulating other cellular carbohydrate, fatty acid, and amino acid catabolism; phosphoenolpyruvate system pathway activation; glycogen synthesis; ATP consumption; pyruvate metabolism; and general stress-response protein expression. The results help reveal the mechanisms of adapting to glucose starvation and provide new strategies for enhancing the industrial utility of L. casei Zhang.

  13. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    Science.gov (United States)

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  14. Integrative Genomics Reveals Mechanisms of Copy Number Alterations Responsible for Transcriptional Deregulation in Colorectal Cancer

    Science.gov (United States)

    Camps, Jordi; Nguyen, Quang Tri; Padilla-Nash, Hesed M.; Knutsen, Turid; McNeil, Nicole E.; Wangsa, Danny; Hummon, Amanda B.; Grade, Marian; Ried, Thomas; Difilippantonio, Michael J.

    2016-01-01

    To evaluate the mechanisms and consequences of chromosomal aberrations in colorectal cancer (CRC), we used a combination of spectral karyotyping, array comparative genomic hybridization (aCGH), and array-based global gene expression profiling on 31 primary carcinomas and 15 established cell lines. Importantly, aCGH showed that the genomic profiles of primary tumors are recapitulated in the cell lines. We revealed a preponderance of chromosome breakpoints at sites of copy number variants (CNVs) in the CRC cell lines, a novel mechanism of DNA breakage in cancer. The integration of gene expression and aCGH led to the identification of 157 genes localized within high-level copy number changes whose transcriptional deregulation was significantly affected across all of the samples, thereby suggesting that these genes play a functional role in CRC. Genomic amplification at 8q24 was the most recurrent event and led to the overexpression of MYC and FAM84B. Copy number dependent gene expression resulted in deregulation of known cancer genes such as APC, FGFR2, and ERBB2. The identification of only 36 genes whose localization near a breakpoint could account for their observed deregulated expression demonstrates that the major mechanism for transcriptional deregulation in CRC is genomic copy number changes resulting from chromosomal aberrations. PMID:19691111

  15. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    Science.gov (United States)

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    illuminate the impact of integrating genomic medicine into the clinical care of patients but also inform the design of future studies. ClinicalTrials.gov identifier NCT01736566.

  16. Integrated microbial processes for biofuels and high value-added products: the way to improve the cost effectiveness of biofuel production.

    Science.gov (United States)

    da Silva, Teresa Lopes; Gouveia, Luísa; Reis, Alberto

    2014-02-01

    The production of microbial biofuels is currently under investigation, as they are alternative sources to fossil fuels, which are diminishing and their use has a negative impact on the environment. However, so far, biofuels derived from microbes are not economically competitive. One way to overcome this bottleneck is the use of microorganisms to transform substrates into biofuels and high value-added products, and simultaneously taking advantage of the various microbial biomass components to produce other products of interest, as an integrated process. In this way, it is possible to maximize the economic value of the whole process, with the desired reduction of the waste streams produced. It is expected that this integrated system makes the biofuel production economically sustainable and competitive in the near future. This review describes the investigation on integrated microbial processes (based on bacteria, yeast, and microalgal cultivations) that have been experimentally developed, highlighting the importance of this approach as a way to optimize microbial biofuel production process.

  17. Integrated systems for biopolymers and bioenergy production from organic waste and by-products: a review of microbial processes.

    Science.gov (United States)

    Pagliano, Giorgia; Ventorino, Valeria; Panico, Antonio; Pepe, Olimpia

    2017-01-01

    Recently, issues concerning the sustainable and harmless disposal of organic solid waste have generated interest in microbial biotechnologies aimed at converting waste materials into bioenergy and biomaterials, thus contributing to a reduction in economic dependence on fossil fuels. To valorize biomass, waste materials derived from agriculture, food processing factories, and municipal organic waste can be used to produce biopolymers, such as biohydrogen and biogas, through different microbial processes. In fact, different bacterial strains can synthesize biopolymers to convert waste materials into valuable intracellular (e.g., polyhydroxyalkanoates) and extracellular (e.g., exopolysaccharides) bioproducts, which are useful for biochemical production. In particular, large numbers of bacteria, including Alcaligenes eutrophus , Alcaligenes latus , Azotobacter vinelandii , Azotobacter chroococcum , Azotobacter beijerincki , methylotrophs, Pseudomonas spp., Bacillus spp., Rhizobium spp., Nocardia spp., and recombinant Escherichia coli , have been successfully used to produce polyhydroxyalkanoates on an industrial scale from different types of organic by-products. Therefore, the development of high-performance microbial strains and the use of by-products and waste as substrates could reasonably make the production costs of biodegradable polymers comparable to those required by petrochemical-derived plastics and promote their use. Many studies have reported use of the same organic substrates as alternative energy sources to produce biogas and biohydrogen through anaerobic digestion as well as dark and photofermentation processes under anaerobic conditions. Therefore, concurrently obtaining bioenergy and biopolymers at a reasonable cost through an integrated system is becoming feasible using by-products and waste as organic carbon sources. An overview of the suitable substrates and microbial strains used in low-cost polyhydroxyalkanoates for biohydrogen and biogas

  18. Integration of HIV in the Human Genome: Which Sites Are Preferential? A Genetic and Statistical Assessment

    Science.gov (United States)

    Gonçalves, Juliana; Moreira, Elsa; Sequeira, Inês J.; Rodrigues, António S.; Rueff, José; Brás, Aldina

    2016-01-01

    Chromosomal fragile sites (FSs) are loci where gaps and breaks may occur and are preferential integration targets for some viruses, for example, Hepatitis B, Epstein-Barr virus, HPV16, HPV18, and MLV vectors. However, the integration of the human immunodeficiency virus (HIV) in Giemsa bands and in FSs is not yet completely clear. This study aimed to assess the integration preferences of HIV in FSs and in Giemsa bands using an in silico study. HIV integration positions from Jurkat cells were used and two nonparametric tests were applied to compare HIV integration in dark versus light bands and in FS versus non-FS (NFSs). The results show that light bands are preferential targets for integration of HIV-1 in Jurkat cells and also that it integrates with equal intensity in FSs and in NFSs. The data indicates that HIV displays different preferences for FSs compared to other viruses. The aim was to develop and apply an approach to predict the conditions and constraints of HIV insertion in the human genome which seems to adequately complement empirical data. PMID:27294106

  19. The Conjugative Relaxase TrwC Promotes Integration of Foreign DNA in the Human Genome.

    Science.gov (United States)

    González-Prieto, Coral; Gabriel, Richard; Dehio, Christoph; Schmidt, Manfred; Llosa, Matxalen

    2017-06-15

    Bacterial conjugation is a mechanism of horizontal DNA transfer. The relaxase TrwC of the conjugative plasmid R388 cleaves one strand of the transferred DNA at the oriT gene, covalently attaches to it, and leads the single-stranded DNA (ssDNA) into the recipient cell. In addition, TrwC catalyzes site-specific integration of the transferred DNA into its target sequence present in the genome of the recipient bacterium. Here, we report the analysis of the efficiency and specificity of the integrase activity of TrwC in human cells, using the type IV secretion system of the human pathogen Bartonella henselae to introduce relaxase-DNA complexes. Compared to Mob relaxase from plasmid pBGR1, we found that TrwC mediated a 10-fold increase in the rate of plasmid DNA transfer to human cells and a 100-fold increase in the rate of chromosomal integration of the transferred DNA. We used linear amplification-mediated PCR and plasmid rescue to characterize the integration pattern in the human genome. DNA sequence analysis revealed mostly reconstituted oriT sequences, indicating that TrwC is active and recircularizes transferred DNA in human cells. One TrwC-mediated site-specific integration event was detected, proving that TrwC is capable of mediating site-specific integration in the human genome, albeit with very low efficiency compared to the rate of random integration. Our results suggest that TrwC may stabilize the plasmid DNA molecules in the nucleus of the human cell, probably by recircularization of the transferred DNA strand. This stabilization would increase the opportunities for integration of the DNA by the host machinery. IMPORTANCE Different biotechnological applications, including gene therapy strategies, require permanent modification of target cells. Long-term expression is achieved either by extrachromosomal persistence or by integration of the introduced DNA. Here, we studied the utility of conjugative relaxase TrwC, a bacterial protein with site

  20. IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform.

    Science.gov (United States)

    Lee, Tae-Rim; Ahn, Jin Mo; Kim, Gyuhee; Kim, Sangsoo

    2017-12-01

    Next-generation sequencing (NGS) technology has become a trend in the genomics research area. There are many software programs and automated pipelines to analyze NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, still remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform series of downstream analyses and obtain a more integrative understanding about various types of genomic data by interactively visualizing them with customizable options.

  1. BiologicalNetworks 2.0 - an integrative view of genome biology data

    Directory of Open Access Journals (Sweden)

    Ponomarenko Julia

    2010-12-01

    Full Text Available Abstract Background A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which are making an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not integrated sufficiently enough to give researchers a straightforward opportunity to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by a multitude of such factors as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and other and their relations (interactions, co-expression, co-citations, and other. The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org.

  2. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Science.gov (United States)

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.

  3. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Directory of Open Access Journals (Sweden)

    Hongbo Shi

    Full Text Available MicroRNAs (miRNAs play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.

  4. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    Science.gov (United States)

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  5. Selective Gene Delivery for Integrating Exogenous DNA into Plastid and Mitochondrial Genomes Using Peptide-DNA Complexes.

    Science.gov (United States)

    Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji

    2018-05-14

    Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.

  6. Genome-Wide Analysis of Transposon and Retroviral Insertions Reveals Preferential Integrations in Regions of DNA Flexibility.

    Science.gov (United States)

    Vrljicak, Pavle; Tao, Shijie; Varshney, Gaurav K; Quach, Helen Ngoc Bao; Joshi, Adita; LaFave, Matthew C; Burgess, Shawn M; Sampath, Karuna

    2016-04-07

    DNA transposons and retroviruses are important transgenic tools for genome engineering. An important consideration affecting the choice of transgenic vector is their insertion site preferences. Previous large-scale analyses of Ds transposon integration sites in plants were done on the basis of reporter gene expression or germ-line transmission, making it difficult to discern vertebrate integration preferences. Here, we compare over 1300 Ds transposon integration sites in zebrafish with Tol2 transposon and retroviral integration sites. Genome-wide analysis shows that Ds integration sites in the presence or absence of marker selection are remarkably similar and distributed throughout the genome. No strict motif was found, but a preference for structural features in the target DNA associated with DNA flexibility (Twist, Tilt, Rise, Roll, Shift, and Slide) was observed. Remarkably, this feature is also found in transposon and retroviral integrations in maize and mouse cells. Our findings show that structural features influence the integration of heterologous DNA in genomes, and have implications for targeted genome engineering. Copyright © 2016 Vrljicak et al.

  7. Complete genome sequence provides insights into the biodrying-related microbial function of Bacillus thermoamylovorans isolated from sewage sludge biodrying material.

    Science.gov (United States)

    Cai, Lu; Zheng, Sheng-Wei; Shen, Yu-Jun; Zheng, Guo-Di; Liu, Hong-Tao; Wu, Zhi-Ying

    2018-07-01

    To enable the development of microbial agents and identify suitable candidate used for biodrying, the existence and function of Bacillus thermoamylovorans during sewage sludge biodrying merits investigation. This study isolated a strain of B. thermoamylovorans during sludge biodrying, submitted it for complete genome sequencing and analyzed its potential microbial functions. After biodrying, the moisture content of the biodrying material decreased from 66.33% to 50.18%, and B. thermoamylovorans was the ecologically dominant Bacillus, with the primary annotations associated with amino acid transport and metabolism (9.53%) and carbohydrate transport and metabolism (8.14%). It contains 96 carbohydrate-active- enzyme-encoding gene counts, mainly distributed in glycoside hydrolases (33.3%) and glycosyl transferases (27.1%). The virulence factors are mainly associated with biosynthesis of capsule and polysaccharide capsule. This work indicates that among the biodrying microorganisms, B. thermoamylovorans has good potential for degrading recalcitrant and readily degradable components, thus being a potential microbial agent used to improve biodrying. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

    Science.gov (United States)

    Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

    2016-11-01

    It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.

  9. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    Directory of Open Access Journals (Sweden)

    In Young Choi

    2013-12-01

    Full Text Available The advances in electronic medical records (EMRs and bioinformatics (BI represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

  10. Palm oil mill effluent treatment using a two-stage microbial fuel cells system integrated with immobilized biological aerated filters.

    Science.gov (United States)

    Cheng, Jia; Zhu, Xiuping; Ni, Jinren; Borthwick, Alistair

    2010-04-01

    An integrated system of two-stage microbial fuel cells (MFCs) and immobilized biological aerated filters (I-BAFs) was used to treat palm oil mill effluent (POME) at laboratory scale. By replacing the conventional two-stage up-flow anaerobic sludge blanket (UASB) with a newly proposed upflow membrane-less microbial fuel cell (UML-MFC) in the integrated system, significant improvements on NH(3)-N removal were observed and direct electricity generation implemented in both MFC1 and MFC2. Moreover, the coupled iron-carbon micro-electrolysis in the cathode of MFC2 further enhanced treatment efficiency of organic compounds. The I-BAFs played a major role in further removal of NH(3)-N and COD. For influent COD and NH(3)-N of 10,000 and 125 mg/L, respectively, the final effluents COD and NH(3)-N were below 350 and 8 mg/L, with removal rates higher than 96.5% and 93.6%. The GC-MS analysis indicated that most of the contaminants were satisfactorily biodegraded by the integrated system. Copyright 2009 Elsevier Ltd. All rights reserved.

  11. Integration of Genome Scale Metabolic Networks and Gene Regulation of Metabolic Enzymes With Physiologically Based Pharmacokinetics.

    Science.gov (United States)

    Maldonado, Elaina M; Leoncikas, Vytautas; Fisher, Ciarán P; Moore, J Bernadette; Plant, Nick J; Kierzek, Andrzej M

    2017-11-01

    The scope of physiologically based pharmacokinetic (PBPK) modeling can be expanded by assimilation of the mechanistic models of intracellular processes from systems biology field. The genome scale metabolic networks (GSMNs) represent a whole set of metabolic enzymes expressed in human tissues. Dynamic models of the gene regulation of key drug metabolism enzymes are available. Here, we introduce GSMNs and review ongoing work on integration of PBPK, GSMNs, and metabolic gene regulation. We demonstrate example models. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  12. Inferring genetic architecture of complex traits using Bayesian integrative analysis of genome and transcriptiome data

    DEFF Research Database (Denmark)

    Ehsani, Alireza; Sørensen, Peter; Pomp, Daniel

    2012-01-01

    Background To understand the genetic architecture of complex traits and bridge the genotype-phenotype gap, it is useful to study intermediate -omics data, e.g. the transcriptome. The present study introduces a method for simultaneous quantification of the contributions from single nucleotide......-modal distribution of genomic values collapses, when gene expressions are added to the model Conclusions With increased availability of various -omics data, integrative approaches are promising tools for understanding the genetic architecture of complex traits. Partitioning of explained variances at the chromosome...

  13. Linking genes to microbial growth kinetics: an integrated biochemical systems engineering approach

    NARCIS (Netherlands)

    Koutinas, M.; Kiparissides, A.; Silva-Rocha, R.; Lam, M.C.; Martins Dos Santos, V.A.P.; Lorenzo, de V.; Pistikopoulos, E.N.; Mantalaris, A.

    2011-01-01

    The majority of models describing the kinetic properties of a microorganism for a given substrate are unstructured and empirical. They are formulated in this manner so that the complex mechanism of cell growth is simplified. Herein, a novel approach for modelling microbial growth kinetics is

  14. Genome-Resolved Metagenomic Analysis Reveals Roles for Candidate Phyla and Other Microbial Community Members in Biogeochemical Transformations in Oil Reservoirs

    Directory of Open Access Journals (Sweden)

    Ping Hu

    2016-01-01

    Full Text Available Oil reservoirs are major sites of methane production and carbon turnover, processes with significant impacts on energy resources and global biogeochemical cycles. We applied a cultivation-independent genomic approach to define microbial community membership and predict roles for specific organisms in biogeochemical transformations in Alaska North Slope oil fields. Produced water samples were collected from six locations between 1,128 m (24 to 27°C and 2,743 m (80 to 83°C below the surface. Microbial community complexity decreased with increasing temperature, and the potential to degrade hydrocarbon compounds was most prevalent in the lower-temperature reservoirs. Sulfate availability, rather than sulfate reduction potential, seems to be the limiting factor for sulfide production in some of the reservoirs under investigation. Most microorganisms in the intermediate- and higher-temperature samples were related to previously studied methanogenic and nonmethanogenic archaea and thermophilic bacteria, but one candidate phylum bacterium, a member of the Acetothermia (OP1, was present in Kuparuk sample K3. The greatest numbers of candidate phyla were recovered from the mesothermic reservoir samples SB1 and SB2. We reconstructed a nearly complete genome for an organism from the candidate phylum Parcubacteria (OD1 that was abundant in sample SB1. Consistent with prior findings for members of this lineage, the OD1 genome is small, and metabolic predictions support an obligately anaerobic, fermentation-based lifestyle. At moderate abundance in samples SB1 and SB2 were members of bacteria from other candidate phyla, including Microgenomates (OP11, Atribacteria (OP9, candidate phyla TA06 and WS6, and Marinimicrobia (SAR406. The results presented here elucidate potential roles of organisms in oil reservoir biological processes.

  15. In vitro analysis of integrated global high-resolution DNA methylation profiling with genomic imbalance and gene expression in osteosarcoma.

    Directory of Open Access Journals (Sweden)

    Bekim Sadikovic

    Full Text Available Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies resulted in developments of high resolution platforms for profiling of genetic, epigenetic and gene expression changes. OS is a pediatric bone tumor with characteristically high level of numerical and structural chromosomal changes. Furthermore, little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tilling Arrays for DNA methylation, Agilent array-CGH platform for genomic imbalance and Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogation of genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps, and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in deregulation of gene expression networks.

  16. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Full Text Available Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 millions. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, etinol, caffeine and arachidonic acid and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  17. Cancer prevention, the need to preserve the integrity of the genome at all cost.

    Science.gov (United States)

    Okafor, M T; Nwagha, T U; Anusiem, C; Okoli, U A; Nubila, N I; Al-Alloosh, F; Udenyia, I J

    2018-05-01

    The entire genetic information carried by an organism makes up its genome. Genes have a diverse number of functions. They code different proteins for normal proliferation of cells. However, changes in the base sequence of genes affect their protein by-products which act as messengers for normal cellular functions such as proliferation and repairs. Salient processes for maintaining the integrity of the genome are hinged on intricate mechanisms put in place for the evolution to tackle genomic stresses. To discuss how cells sense and repair damage to their deoxyribonucleic acid (DNA) as well as to highlight how defects in the genes involved in DNA repair contribute to cancer development. Methodology: Online searches on the following databases such as Google Scholar, PubMed, Biomed Central, and SciELO were done. Attempt was made to review articles with keywords such as cancer, cell cycle, tumor suppressor genes, and DNA repair. The cell cycle, tumor suppression genes, DNA repair mechanism, as well as their contribution to cancer development, were discussed and reviewed. Knowledge on how cells detect and repair DNA damage through an array of mechanisms should allay our anxiety as regards cancer development. More studies on DNA damage detection and repair processes are important toward a holistic approach to cancer treatment.

  18. metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research.

    Science.gov (United States)

    Lyne, Mike; Smith, Richard N; Lyne, Rachel; Aleksic, Jelena; Hu, Fengyuan; Kalderimis, Alex; Stepan, Radek; Micklem, Gos

    2013-01-01

    Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first

  19. Chromosomally Integrated Human Herpesvirus 6: Models of Viral Genome Release from the Telomere and Impacts on Human Health.

    Science.gov (United States)

    Wood, Michael L; Royle, Nicola J

    2017-07-12

    Human herpesvirus 6A and 6B, alongside some other herpesviruses, have the striking capacity to integrate into telomeres, the terminal repeated regions of chromosomes. The chromosomally integrated forms, ciHHV-6A and ciHHV-6B, are proposed to be a state of latency and it has been shown that they can both be inherited if integration occurs in the germ line. The first step in full viral reactivation must be the release of the integrated viral genome from the telomere and here we propose various models of this release involving transcription of the viral genome, replication fork collapse, and t-circle mediated release. In this review, we also discuss the relationship between ciHHV-6 and the telomere carrying the insertion, particularly how the presence and subsequent partial or complete release of the ciHHV-6 genome may affect telomere dynamics and the risk of disease.

  20. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome.

    Science.gov (United States)

    Chandrani, P; Kulkarni, V; Iyer, P; Upadhyay, P; Chaubal, R; Das, P; Mulherkar, R; Singh, R; Dutt, A

    2015-06-09

    Human papilloma virus (HPV) accounts for the most common cause of all virus-associated human cancers. Here, we describe the first graphic user interface (GUI)-based automated tool 'HPVDetector', for non-computational biologists, exclusively for detection and annotation of the HPV genome based on next-generation sequencing data sets. We developed a custom-made reference genome that comprises of human chromosomes along with annotated genome of 143 HPV types as pseudochromosomes. The tool runs on a dual mode as defined by the user: a 'quick mode' to identify presence of HPV types and an 'integration mode' to determine genomic location for the site of integration. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. The HPVDetector is available in public domain for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. On the basis of our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data, we were able to identify presence of HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of viral gene E7, a known oncogene, at known 17q21, 3q27, 7q35, Xq28 and novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 were found to be mutually exclusive compared with low-risk HPV71. HPVDetector is a simple yet precise and robust tool for detecting HPV from tumour samples using variety of next-generation sequencing platforms including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode) along with a GUI widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.

  1. Microenvironmental Heterogeneity Parallels Breast Cancer Progression: A Histology-Genomic Integration Analysis.

    Directory of Open Access Journals (Sweden)

    Rachael Natrajan

    2016-02-01

    Full Text Available The intra-tumor diversity of cancer cells is under intense investigation; however, little is known about the heterogeneity of the tumor microenvironment that is key to cancer progression and evolution. We aimed to assess the degree of microenvironmental heterogeneity in breast cancer and correlate this with genomic and clinical parameters.We developed a quantitative measure of microenvironmental heterogeneity along three spatial dimensions (3-D in solid tumors, termed the tumor ecosystem diversity index (EDI, using fully automated histology image analysis coupled with statistical measures commonly used in ecology. This measure was compared with disease-specific survival, key mutations, genome-wide copy number, and expression profiling data in a retrospective study of 510 breast cancer patients as a test set and 516 breast cancer patients as an independent validation set. In high-grade (grade 3 breast cancers, we uncovered a striking link between high microenvironmental heterogeneity measured by EDI and a poor prognosis that cannot be explained by tumor size, genomics, or any other data types. However, this association was not observed in low-grade (grade 1 and 2 breast cancers. The prognostic value of EDI was superior to known prognostic factors and was enhanced with the addition of TP53 mutation status (multivariate analysis test set, p = 9 × 10-4, hazard ratio = 1.47, 95% CI 1.17-1.84; validation set, p = 0.0011, hazard ratio = 1.78, 95% CI 1.26-2.52. Integration with genome-wide profiling data identified losses of specific genes on 4p14 and 5q13 that were enriched in grade 3 tumors with high microenvironmental diversity that also substratified patients into poor prognostic groups. Limitations of this study include the number of cell types included in the model, that EDI has prognostic value only in grade 3 tumors, and that our spatial heterogeneity measure was dependent on spatial scale and tumor size.To our knowledge, this is the first

  2. New approaches to assessing the effects of mutagenic agents on the integrity of the human genome

    International Nuclear Information System (INIS)

    Elespuru, R.K.; Sankaranarayanan, K.

    2007-01-01

    Heritable genetic alterations, although individually rare, have a substantial collective health impact. Approximately 20% of these are new mutations of unknown cause. Assessment of the effect of exposures to DNA damaging agents, i.e. mutagenic chemicals and radiations, on the integrity of the human genome and on the occurrence of genetic disease remains a daunting challenge. Recent insights may explain why previous examination of human exposures to ionizing radiation, as in Hiroshima and Nagasaki, failed to reveal heritable genetic effects. New opportunities to assess the heritable genetic damaging effects of environmental mutagens are afforded by: (1) integration of knowledge on the molecular nature of genetic disorders and the molecular effects of mutagens; (2) the development of more practical assays for germline mutagenesis; (3) the likely use of population-based genetic screening in personalized medicine

  3. Draft genome sequence of Lampropedia cohaerens strain CT6(T) isolated from arsenic rich microbial mats of a Himalayan hot water spring.

    Science.gov (United States)

    Tripathi, Charu; Mahato, Nitish K; Rani, Pooja; Singh, Yogendra; Kamra, Komal; Lal, Rup

    2016-01-01

    Lampropedia cohaerens strain CT6(T), a non-motile, aerobic and coccoid strain was isolated from arsenic rich microbial mats (temperature ~45 °C) of a hot water spring located atop the Himalayan ranges at Manikaran, India. The present study reports the first genome sequence of type strain CT6(T) of genus Lampropedia cohaerens. Sequencing data was generated using the Illumina HiSeq 2000 platform and assembled with ABySS v 1.3.5. The 3,158,922 bp genome was assembled into 41 contigs with a mean GC content of 63.5 % and 2823 coding sequences. Strain CT6(T) was found to harbour genes involved in both the Entner-Duodoroff pathway and non-phosphorylated ED pathway. Strain CT6(T) also contained genes responsible for imparting resistance to arsenic, copper, cobalt, zinc, cadmium and magnesium, providing survival advantages at a thermal location. Additionally, the presence of genes associated with biofilm formation, pyrroloquinoline-quinone production, isoquinoline degradation and mineral phosphate solubilisation in the genome demonstrate the diverse genetic potential for survival at stressed niches.

  4. Polymorphic integrations of an endogenous gammaretrovirus in the mule deer genome.

    Science.gov (United States)

    Elleder, Daniel; Kim, Oekyung; Padhi, Abinash; Bankert, Jason G; Simeonov, Ivan; Schuster, Stephan C; Wittekindt, Nicola E; Motameny, Susanne; Poss, Mary

    2012-03-01

    Endogenous retroviruses constitute a significant genomic fraction in all mammalian species. Typically they are evolutionarily old and fixed in the host species population. Here we report on a novel endogenous gammaretrovirus (CrERVγ; for cervid endogenous gammaretrovirus) in the mule deer (Odocoileus hemionus) that is insertionally polymorphic among individuals from the same geographical location, suggesting that it has a more recent evolutionary origin. Using PCR-based methods, we identified seven CrERVγ proviruses and demonstrated that they show various levels of insertional polymorphism in mule deer individuals. One CrERVγ provirus was detected in all mule deer sampled but was absent from white-tailed deer, indicating that this virus originally integrated after the split of the two species, which occurred approximately one million years ago. There are, on average, 100 CrERVγ copies in the mule deer genome based on quantitative PCR analysis. A CrERVγ provirus was sequenced and contained intact open reading frames (ORFs) for three virus genes. Transcripts were identified covering the entire provirus. CrERVγ forms a distinct branch of the gammaretrovirus phylogeny, with the closest relatives of CrERVγ being endogenous gammaretroviruses from sheep and pig. We demonstrated that white-tailed deer (Odocoileus virginianus) and elk (Cervus canadensis) DNA contain proviruses that are closely related to mule deer CrERVγ in a conserved region of pol; more distantly related sequences can be identified in the genome of another member of the Cervidae, the muntjac (Muntiacus muntjak). The discovery of a novel transcriptionally active and insertionally polymorphic retrovirus in mammals could provide a useful model system to study the dynamic interaction between the host genome and an invading retrovirus.

  5. DNA double-strand break response in stem cells: mechanisms to maintain genomic integrity.

    Science.gov (United States)

    Nagaria, Pratik; Robert, Carine; Rassool, Feyruz V

    2013-02-01

    Embryonic stem cells (ESCs) represent the point of origin of all cells in a given organism and must protect their genomes from both endogenous and exogenous genotoxic stress. DNA double-strand breaks (DSBs) are one of the most lethal forms of damage, and failure to adequately repair DSBs would not only compromise the ability of SCs to self-renew and differentiate, but will also lead to genomic instability and disease. Herein, we describe the mechanisms by which ESCs respond to DSB-inducing agents such as reactive oxygen species (ROS) and ionizing radiation, compared to somatic cells. We will also discuss whether the DSB response is fully reprogrammed in induced pluripotent stem cells (iPSCs) and the role of the DNA damage response (DDR) in the reprogramming of these cells. ESCs have distinct mechanisms to protect themselves against DSBs and oxidative stress compared to somatic cells. The response to damage and stress is crucial for the maintenance of self-renewal and differentiation capacity in SCs. iPSCs appear to reprogram some of the responses to genotoxic stress. However, it remains to be determined if iPSCs also retain some DDR characteristics of the somatic cells of origin. The mechanisms regulating the genomic integrity in ESCs and iPSCs are critical for its safe use in regenerative medicine and may shed light on the pathways and factors that maintain genomic stability, preventing diseases such as cancer. This article is part of a Special Issue entitled Biochemistry of Stem Cells. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. An integrative genomic and transcriptomic analysis reveals potential targets associated with cell proliferation in uterine leiomyomas.

    Directory of Open Access Journals (Sweden)

    Priscila Daniele Ramos Cirilo

    Full Text Available Uterine Leiomyomas (ULs are the most common benign tumours affecting women of reproductive age. ULs represent a major problem in public health, as they are the main indication for hysterectomy. Approximately 40-50% of ULs have non-random cytogenetic abnormalities, and half of ULs may have copy number alterations (CNAs. Gene expression microarrays studies have demonstrated that cell proliferation genes act in response to growth factors and steroids. However, only a few genes mapping to CNAs regions were found to be associated with ULs.We applied an integrative analysis using genomic and transcriptomic data to identify the pathways and molecular markers associated with ULs. Fifty-one fresh frozen specimens were evaluated by array CGH (JISTIC and gene expression microarrays (SAM. The CONEXIC algorithm was applied to integrate the data.The integrated analysis identified the top 30 significant genes (P<0.01, which comprised genes associated with cancer, whereas the protein-protein interaction analysis indicated a strong association between FANCA and BRCA1. Functional in silico analysis revealed target molecules for drugs involved in cell proliferation, including FGFR1 and IGFBP5. Transcriptional and protein analyses showed that FGFR1 (P = 0.006 and P<0.01, respectively and IGFBP5 (P = 0.0002 and P = 0.006, respectively were up-regulated in the tumours when compared with the adjacent normal myometrium.The integrative genomic and transcriptomic approach indicated that FGFR1 and IGFBP5 amplification, as well as the consequent up-regulation of the protein products, plays an important role in the aetiology of ULs and thus provides data for potential drug therapies development to target genes associated with cellular proliferation in ULs.

  7. Evolutionary time-scale of the begomoviruses: evidence from integrated sequences in the Nicotiana genome.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Full Text Available Despite having single stranded DNA genomes that are replicated by host DNA polymerases, viruses in the family Geminiviridae are apparently evolving as rapidly as some RNA viruses. The observed substitution rates of geminiviruses in the genera Begomovirus and Mastrevirus are so high that the entire family could conceivably have originated less than a million years ago (MYA. However, the existence of geminivirus related DNA (GRD integrated within the genomes of various Nicotiana species suggests that the geminiviruses probably originated >10 MYA. Some have even suggested that a distinct New-World (NW lineage of begomoviruses may have arisen following the separation by continental drift of African and American proto-begomoviruses ∼110 MYA. We evaluate these various geminivirus origin hypotheses using Bayesian coalescent-based approaches to date firstly the Nicotiana GRD integration events, and then the divergence of the NW and Old-World (OW begomoviruses. Besides rejecting the possibility of a<2 MYA OW-NW begomovirus split, we could also discount that it may have occurred concomitantly with the breakup of Gondwanaland 110 MYA. Although we could only confidently narrow the date of the split down to between 2 and 80 MYA, the most plausible (and best supported date for the split is between 20 and 30 MYA--a time when global cooling ended the dispersal of temperate species between Asia and North America via the Beringian land bridge.

  8. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    Science.gov (United States)

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.

  9. Inter-replicon Gene Flow Contributes to Transcriptional Integration in the Sinorhizobium meliloti Multipartite Genome

    Directory of Open Access Journals (Sweden)

    George C. diCenzo

    2018-05-01

    Full Text Available Integration of newly acquired genes into existing regulatory networks is necessary for successful horizontal gene transfer (HGT. Ten percent of bacterial species contain at least two DNA replicons over 300 kilobases in size, with the secondary replicons derived predominately through HGT. The Sinorhizobium meliloti genome is split between a 3.7 Mb chromosome, a 1.7 Mb chromid consisting largely of genes acquired through ancient HGT, and a 1.4 Mb megaplasmid consisting primarily of recently acquired genes. Here, RNA-sequencing is used to examine the transcriptional consequences of massive, synthetic genome reduction produced through the removal of the megaplasmid and/or the chromid. Removal of the pSymA megaplasmid influenced the transcription of only six genes. In contrast, removal of the chromid influenced expression of ∼8% of chromosomal genes and ∼4% of megaplasmid genes. This was mediated in part by the loss of the ETR DNA region whose presence on pSymB is due to a translocation from the chromosome. No obvious functional bias among the up-regulated genes was detected, although genes with putative homologs on the chromid were enriched. Down-regulated genes were enriched in motility and sensory transduction pathways. Four transcripts were examined further, and in each case the transcriptional change could be traced to loss of specific pSymB regions. In particularly, a chromosomal transporter was induced due to deletion of bdhA likely mediated through 3-hydroxybutyrate accumulation. These data provide new insights into the evolution of the multipartite bacterial genome, and more generally into the integration of horizontally acquired genes into the transcriptome.

  10. Inter-replicon Gene Flow Contributes to Transcriptional Integration in the Sinorhizobium meliloti Multipartite Genome.

    Science.gov (United States)

    diCenzo, George C; Wellappili, Deelaka; Golding, G Brian; Finan, Turlough M

    2018-05-04

    Integration of newly acquired genes into existing regulatory networks is necessary for successful horizontal gene transfer (HGT). Ten percent of bacterial species contain at least two DNA replicons over 300 kilobases in size, with the secondary replicons derived predominately through HGT. The Sinorhizobium meliloti genome is split between a 3.7 Mb chromosome, a 1.7 Mb chromid consisting largely of genes acquired through ancient HGT, and a 1.4 Mb megaplasmid consisting primarily of recently acquired genes. Here, RNA-sequencing is used to examine the transcriptional consequences of massive, synthetic genome reduction produced through the removal of the megaplasmid and/or the chromid. Removal of the pSymA megaplasmid influenced the transcription of only six genes. In contrast, removal of the chromid influenced expression of ∼8% of chromosomal genes and ∼4% of megaplasmid genes. This was mediated in part by the loss of the ETR DNA region whose presence on pSymB is due to a translocation from the chromosome. No obvious functional bias among the up-regulated genes was detected, although genes with putative homologs on the chromid were enriched. Down-regulated genes were enriched in motility and sensory transduction pathways. Four transcripts were examined further, and in each case the transcriptional change could be traced to loss of specific pSymB regions. In particularly, a chromosomal transporter was induced due to deletion of bdhA likely mediated through 3-hydroxybutyrate accumulation. These data provide new insights into the evolution of the multipartite bacterial genome, and more generally into the integration of horizontally acquired genes into the transcriptome. Copyright © 2018 diCenzo, et al.

  11. Genome-wide conserved non-coding microsatellite (CNMS) marker-based integrative genetical genomics for quantitative dissection of seed weight in chickpea.

    Science.gov (United States)

    Bajaj, Deepak; Saxena, Maneesha S; Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Tripathi, Shailesh; Upadhyaya, Hari D; Gowda, C L L; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K; Parida, Swarup K

    2015-03-01

    Phylogenetic footprinting identified 666 genome-wide paralogous and orthologous CNMS (conserved non-coding microsatellite) markers from 5'-untranslated and regulatory regions (URRs) of 603 protein-coding chickpea genes. The (CT)n and (GA)n CNMS carrying CTRMCAMV35S and GAGA8BKN3 regulatory elements, respectively, are abundant in the chickpea genome. The mapped genic CNMS markers with robust amplification efficiencies (94.7%) detected higher intraspecific polymorphic potential (37.6%) among genotypes, implying their immense utility in chickpea breeding and genetic analyses. Seventeen differentially expressed CNMS marker-associated genes showing strong preferential and seed tissue/developmental stage-specific expression in contrasting genotypes were selected to narrow down the gene targets underlying seed weight quantitative trait loci (QTLs)/eQTLs (expression QTLs) through integrative genetical genomics. The integration of transcript profiling with seed weight QTL/eQTL mapping, molecular haplotyping, and association analyses identified potential molecular tags (GAGA8BKN3 and RAV1AAT regulatory elements and alleles/haplotypes) in the LOB-domain-containing protein- and KANADI protein-encoding transcription factor genes controlling the cis-regulated expression for seed weight in the chickpea. This emphasizes the potential of CNMS marker-based integrative genetical genomics for the quantitative genetic dissection of complex seed weight in chickpea. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  12. Hydrogen production from sugar beet juice using an integrated biohydrogen process of dark fermentation and microbial electrolysis cell.

    Science.gov (United States)

    Dhar, Bipro Ranjan; Elbeshbishy, Elsayed; Hafez, Hisham; Lee, Hyung-Sool

    2015-12-01

    An integrated dark fermentation and microbial electrochemical cell (MEC) process was evaluated for hydrogen production from sugar beet juice. Different substrate to inoculum (S/X) ratios were tested for dark fermentation, and the maximum hydrogen yield was 13% of initial COD at the S/X ratio of 2 and 4 for dark fermentation. Hydrogen yield was 12% of initial COD in the MEC using fermentation liquid end products as substrate, and butyrate only accumulated in the MEC. The overall hydrogen production from the integrated biohydrogen process was 25% of initial COD (equivalent to 6 mol H2/mol hexoseadded), and the energy recovery from sugar beet juice was 57% using the combined biohydrogen. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

    Science.gov (United States)

    Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

    2002-01-01

    Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263

  14. Genomic Microbial Epidemiology Is Needed to Comprehend the Global Problem of Antibiotic Resistance and to Improve Pathogen Diagnosis

    OpenAIRE

    Wyrsch, Ethan R.; Roy Chowdhury, Piklu; Chapman, Toni A.; Charles, Ian G.; Hammond, Jeffrey M.; Djordjevic, Steven P.

    2016-01-01

    Contamination of waste effluent from hospitals and intensive food animal production with antimicrobial residues is an immense global problem. Antimicrobial residues exert selection pressures that influence the acquisition of antimicrobial resistance and virulence genes in diverse microbial populations. Despite these concerns there is only a limited understanding of how antimicrobial residues contribute to the global problem of antimicrobial resistance. Furthermore, rapid detection of emerging...

  15. The Next Generation Precision Medical Record - A Framework for Integrating Genomes and Wearable Sensors with Medical Records

    OpenAIRE

    Batra, Prag; Singh, Enakshi; Bog, Anja; Wright, Mark; Ashley, Euan; Waggott, Daryl

    2016-01-01

    Current medical records are rigid with regards to emerging big biomedical data. Examples of poorly integrated big data that already exist in clinical practice include whole genome sequencing and wearable sensors for real time monitoring. Genome sequencing enables conventional diagnostic interrogation and forms the fundamental baseline for precision health throughout a patients lifetime. Mobile sensors enable tailored monitoring regimes for both reducing risk through precision health intervent...

  16. Network analysis of epidermal growth factor signaling using integrated genomic, proteomic and phosphorylation data.

    Directory of Open Access Journals (Sweden)

    Katrina M Waters

    Full Text Available To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response.

  17. Network Analysis of Epidermal Growth Factor Signaling using Integrated Genomic, Proteomic and Phosphorylation Data

    Energy Technology Data Exchange (ETDEWEB)

    Waters, Katrina M.; Liu, Tao; Quesenberry, Ryan D.; Willse, Alan R.; Bandyopadhyay, Somnath; Kathmann, Loel E.; Weber, Thomas J.; Smith, Richard D.; Wiley, H. S.; Thrall, Brian D.

    2012-03-29

    To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response.

  18. Complete genome sequence and description of Salinispira pacifica gen. nov., sp. nov., a novel spirochaete isolated form a hypersaline microbial mat.

    Science.gov (United States)

    Ben Hania, Wajdi; Joseph, Manon; Schumann, Peter; Bunk, Boyke; Fiebig, Anne; Spröer, Cathrin; Klenk, Hans-Peter; Fardeau, Marie-Laure; Spring, Stefan

    2015-01-01

    During a study of the anaerobic microbial community of a lithifying hypersaline microbial mat of Lake 21 on the Kiritimati atoll (Kiribati Republic, Central Pacific) strain L21-RPul-D2(T) was isolated. The closest phylogenetic neighbor was Spirochaeta africana Z-7692(T) that shared a 16S rRNA gene sequence identity value of 90% with the novel strain and thus was only distantly related. A comprehensive polyphasic study including determination of the complete genome sequence was initiated to characterize the novel isolate. Cells of strain L21-RPul-D2(T) had a size of 0.2 - 0.25 × 8-9 μm, were helical, motile, stained Gram-negative and produced an orange carotenoid-like pigment. Optimal conditions for growth were 35°C, a salinity of 50 g/l NaCl and a pH around 7.0. Preferred substrates for growth were carbohydrates and a few carboxylic acids. The novel strain had an obligate fermentative metabolism and produced ethanol, acetate, lactate, hydrogen and carbon dioxide during growth on glucose. Strain L21-RPul-D2(T) was aerotolerant, but oxygen did not stimulate growth. Major cellular fatty acids were C14:0, iso-C15:0, C16:0 and C18:0. The major polar lipids were an unidentified aminolipid, phosphatidylglycerol, an unidentified phospholipid and two unidentified glycolipids. Whole-cell hydrolysates contained L-ornithine as diagnostic diamino acid of the cell wall peptidoglycan. The complete genome sequence was determined and annotated. The genome comprised one circular chromosome with a size of 3.78 Mbp that contained 3450 protein-coding genes and 50 RNA genes, including 2 operons of ribosomal RNA genes. The DNA G + C content was determined from the genome sequence as 51.9 mol%. There were no predicted genes encoding cytochromes or enzymes responsible for the biosynthesis of respiratory lipoquinones. Based on significant differences to the uncultured type species of the genus Spirochaeta, S. plicatilis, as well as to any other phylogenetically related

  19. Genome-Resolved Metagenomic Analysis Reveals Roles for Candidate Phyla and Other Microbial Community Members in Biogeochemical Transformations in Oil Reservoirs.

    Science.gov (United States)

    Hu, Ping; Tom, Lauren; Singh, Andrea; Thomas, Brian C; Baker, Brett J; Piceno, Yvette M; Andersen, Gary L; Banfield, Jillian F

    2016-01-19

    Oil reservoirs are major sites of methane production and carbon turnover, processes with significant impacts on energy resources and global biogeochemical cycles. We applied a cultivation-independent genomic approach to define microbial community membership and predict roles for specific organisms in biogeochemical transformations in Alaska North Slope oil fields. Produced water samples were collected from six locations between 1,128 m (24 to 27°C) and 2,743 m (80 to 83°C) below the surface. Microbial community complexity decreased with increasing temperature, and the potential to degrade hydrocarbon compounds was most prevalent in the lower-temperature reservoirs. Sulfate availability, rather than sulfate reduction potential, seems to be the limiting factor for sulfide production in some of the reservoirs under investigation. Most microorganisms in the intermediate- and higher-temperature samples were related to previously studied methanogenic and nonmethanogenic archaea and thermophilic bacteria, but one candidate phylum bacterium, a member of the Acetothermia (OP1), was present in Kuparuk sample K3. The greatest numbers of candidate phyla were recovered from the mesothermic reservoir samples SB1 and SB2. We reconstructed a nearly complete genome for an organism from the candidate phylum Parcubacteria (OD1) that was abundant in sample SB1. Consistent with prior findings for members of this lineage, the OD1 genome is small, and metabolic predictions support an obligately anaerobic, fermentation-based lifestyle. At moderate abundance in samples SB1 and SB2 were members of bacteria from other candidate phyla, including Microgenomates (OP11), Atribacteria (OP9), candidate phyla TA06 and WS6, and Marinimicrobia (SAR406). The results presented here elucidate potential roles of organisms in oil reservoir biological processes. The activities of microorganisms in oil reservoirs impact petroleum resource quality and the global carbon cycle. We show that bacteria

  20. A Novel Uncultured Bacterium of the Family Gallionellaceae: Description and Genome Reconstruction Based on the Metagenomic Analysis of Microbial Community in Acid Mine Drainage.

    Science.gov (United States)

    Kadnikov, V V; Ivasenko, D A; Beletsky, A V; Mardanov, A V; Danilova, E V; Pimenov, N V; Karnachuk, O V; Ravin, N V

    2016-07-01

    Drainage waters at the metal mining areas often have low pH and high content of dissolved metals due to oxidation of sulfide minerals. Extreme conditions limit microbial diversity in- such ecosystems. A drainage water microbial community (6.5'C, pH 2.65) in an open pit at the Sherlovaya Gora polymetallic open-cast mine (Transbaikal region, Eastern Siberia, Russia) was studied using metagenomic techniques. Metagenome sequencing provided information for taxonomic and functional characterization of the micro- bial community. The majority of microorganisms belonged to a single uncultured lineage representing a new Betaproteobacteria species of the genus Gallionella. While no.acidophiles are known among the cultured members of the family Gallionellaceae, similar 16S rRNA gene sequences were detected in acid mine drain- ages. Bacteria ofthe genera Thiobacillus, Acidobacterium, Acidisphaera, and Acidithiobacillus,-which are com- mon in acid mine drainage environments, were the minor components of the community. Metagenomic data were -used to determine the almost complete (-3.4 Mb) composite genome of the new bacterial. lineage desig- nated Candidatus Gallionella acididurans ShG14-8. Genome analysis revealed that Fe(II) oxidation probably involved the cytochromes localized on the outer membrane of the cell. The electron transport chain included NADH dehydrogenase, a cytochrome bc1 complex, an alternative complex III, and cytochrome oxidases of the bd, cbb3, and bo3 types. Oxidation of reduced sulfur compounds probably involved the Sox system, sul- fide-quinone oxidoreductase, adenyl sulfate reductase, and sulfate adenyltransferase. The genes required for autotrophic carbon assimilation via the Calvin cycle were present, while no pathway for nitrogen fixation was revealed. High numbers of RND metal transporters and P type ATPases were probably responsible for resis- tance to heavy metals. The new microorganism was an aerobic chemolithoautotroph of the group of

  1. Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production.

    Science.gov (United States)

    Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata

    2013-12-20

    Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome properties encode 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence as well as discuss the putative pathway of hydrogen production by this organism.

  2. GenomeCAT: a versatile tool for the analysis and integrative visualization of DNA copy number variants.

    Science.gov (United States)

    Tebel, Katrin; Boldt, Vivien; Steininger, Anne; Port, Matthias; Ebert, Grit; Ullmann, Reinhard

    2017-01-06

    The analysis of DNA copy number variants (CNV) has increasing impact in the field of genetic diagnostics and research. However, the interpretation of CNV data derived from high resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and comparison of patient cohorts are needed to assist in the discrimination of clinically relevant CNVs from others. We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated to the inspection of single cases, comparative analysis of multidimensional data and group comparisons aiming at the identification of recurrent aberrations in patients sharing the same phenotype, respectively. Its flexible import options ease the comparative analysis of own results derived from microarray or NGS platforms with data from literature or public depositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable common visualization and analysis. All results are stored in the integrated MySQL database, but can also be exported as tab delimited files for further statistical calculations in external programs. GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCATs graphical user interface and the installation process is supported by a wizard. The flexibility in terms of data import and export in combination with the ability to create a common data matrix makes the program also well suited as an interface between genomic data from heterogeneous sources and external software tools. Due to the modular architecture the functionality of

  3. Advanced Microbial Taxonomy Combined with Genome-Based-Approaches Reveals that Vibrio astriarenae sp. nov., an Agarolytic Marine Bacterium, Forms a New Clade in Vibrionaceae.

    Science.gov (United States)

    Al-Saari, Nurhidayu; Gao, Feng; Rohul, Amin A K M; Sato, Kazumichi; Sato, Keisuke; Mino, Sayaka; Suda, Wataru; Oshima, Kenshiro; Hattori, Masahira; Ohkuma, Moriya; Meirelles, Pedro M; Thompson, Fabiano L; Thompson, Cristiane; Filho, Gilberto M A; Gomez-Gil, Bruno; Sawabe, Toko; Sawabe, Tomoo

    2015-01-01

    Advances in genomic microbial taxonomy have opened the way to create a more universal and transparent concept of species but is still in a transitional stage towards becoming a defining robust criteria for describing new microbial species with minimum features obtained using both genome and classical polyphasic taxonomies. Here we performed advanced microbial taxonomies combined with both genome-based and classical approaches for new agarolytic vibrio isolates to describe not only a novel Vibrio species but also a member of a new Vibrio clade. Two novel vibrio strains (Vibrio astriarenae sp. nov. C7T and C20) showing agarolytic, halophilic and fermentative metabolic activity were isolated from a seawater sample collected in a coral reef in Okinawa. Intraspecific similarities of the isolates were identical in both sequences on the 16S rRNA and pyrH genes, but the closest relatives on the molecular phylogenetic trees on the basis of 16S rRNA and pyrH gene sequences were V. hangzhouensis JCM 15146T (97.8% similarity) and V. agarivorans CECT 5085T (97.3% similarity), respectively. Further multilocus sequence analysis (MLSA) on the basis of 8 protein coding genes (ftsZ, gapA, gyrB, mreB, pyrH, recA, rpoA, and topA) obtained by the genome sequences clearly showed the V. astriarenae strain C7T and C20 formed a distinct new clade protruded next to V. agarivorans CECT 5085T. The singleton V. agarivorans has never been included in previous MLSA of Vibrionaceae due to the lack of some gene sequences. Now the gene sequences are completed and analysis of 100 taxa in total provided a clear picture describing the association of V. agarivorans into pre-existing concatenated network tree and concluded its relationship to our vibrio strains. Experimental DNA-DNA hybridization (DDH) data showed that the strains C7T and C20 were conspecific but were separated from all of the other Vibrio species related on the basis of both 16S rRNA and pyrH gene phylogenies (e.g., V. agarivorans CECT

  4. Advanced Microbial Taxonomy Combined with Genome-Based-Approaches Reveals that Vibrio astriarenae sp. nov., an Agarolytic Marine Bacterium, Forms a New Clade in Vibrionaceae.

    Directory of Open Access Journals (Sweden)

    Nurhidayu Al-Saari

    Full Text Available Advances in genomic microbial taxonomy have opened the way to create a more universal and transparent concept of species but is still in a transitional stage towards becoming a defining robust criteria for describing new microbial species with minimum features obtained using both genome and classical polyphasic taxonomies. Here we performed advanced microbial taxonomies combined with both genome-based and classical approaches for new agarolytic vibrio isolates to describe not only a novel Vibrio species but also a member of a new Vibrio clade. Two novel vibrio strains (Vibrio astriarenae sp. nov. C7T and C20 showing agarolytic, halophilic and fermentative metabolic activity were isolated from a seawater sample collected in a coral reef in Okinawa. Intraspecific similarities of the isolates were identical in both sequences on the 16S rRNA and pyrH genes, but the closest relatives on the molecular phylogenetic trees on the basis of 16S rRNA and pyrH gene sequences were V. hangzhouensis JCM 15146T (97.8% similarity and V. agarivorans CECT 5085T (97.3% similarity, respectively. Further multilocus sequence analysis (MLSA on the basis of 8 protein coding genes (ftsZ, gapA, gyrB, mreB, pyrH, recA, rpoA, and topA obtained by the genome sequences clearly showed the V. astriarenae strain C7T and C20 formed a distinct new clade protruded next to V. agarivorans CECT 5085T. The singleton V. agarivorans has never been included in previous MLSA of Vibrionaceae due to the lack of some gene sequences. Now the gene sequences are completed and analysis of 100 taxa in total provided a clear picture describing the association of V. agarivorans into pre-existing concatenated network tree and concluded its relationship to our vibrio strains. Experimental DNA-DNA hybridization (DDH data showed that the strains C7T and C20 were conspecific but were separated from all of the other Vibrio species related on the basis of both 16S rRNA and pyrH gene phylogenies (e.g., V

  5. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Science.gov (United States)

    Harris, Dennis R; Tanaka, Masashi; Saveliev, Sergei V; Jolivet, Edmond; Earl, Ashlee M; Cox, Michael M; Battista, John R

    2004-10-01

    The bacterium Deinococcus radiodurans can withstand extraordinary levels of ionizing radiation, reflecting an equally extraordinary capacity for DNA repair. The hypothetical gene product DR0423 has been implicated in the recovery of this organism from DNA damage, indicating that this protein is a novel component of the D. radiodurans DNA repair system. DR0423 is a homologue of the eukaryotic Rad52 protein. Following exposure to ionizing radiation, DR0423 expression is induced relative to an untreated control, and strains carrying a deletion of the DR0423 gene exhibit increased sensitivity to ionizing radiation. When recovering from ionizing-radiation-induced DNA damage in the absence of nutrients, wild-type D. radiodurans reassembles its genome while the mutant lacking DR0423 function does not. In vitro, the purified DR0423 protein binds to single-stranded DNA with an apparent affinity for 3' ends, and protects those ends from nuclease degradation. We propose that DR0423 is part of a DNA end-protection system that helps to preserve genome integrity following exposure to ionizing radiation. We designate the DR0423 protein as DNA damage response A protein.

  6. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Directory of Open Access Journals (Sweden)

    Dennis R Harris

    2004-10-01

    Full Text Available The bacter