WorldWideScience

Sample records for metagenomic gene prediction

  1. Gene Prediction in Metagenomic Fragments with Deep Learning

    Directory of Open Access Journals (Sweden)

    Shao-Wu Zhang

    2017-01-01

    Full Text Available Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomics fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multifeatures (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking networks learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results with 10-fold cross-validation (10 CV) and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
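    As a rough illustration of one of the features named in this abstract, the sketch below computes a monocodon usage vector (the relative frequency of each of the 64 codons) for a single in-frame ORF. The function name `monocodon_usage` and the choice to skip codons containing ambiguous bases are assumptions made for this example; the abstract does not describe how Meta-MFDL computes or normalizes the feature.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed (alphabetical) order, so every ORF maps to a
# vector with the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
_CODON_SET = set(CODONS)


def monocodon_usage(orf: str) -> list[float]:
    """Return the 64 relative codon frequencies of an in-frame ORF.

    Hypothetical helper: trailing bases that do not fill a codon and codons
    with ambiguous bases (e.g. N) are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in _CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    vec = monocodon_usage("ATGAAATAA")
    # Print only the codons that actually occur in the toy ORF.
    print({c: round(v, 3) for c, v in zip(CODONS, vec) if v > 0})
```

    A vector like this would be only one input among several; the other features listed in the abstract (monoamino acid usage, ORF length coverage, Z-curve) are not covered here.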

  2. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Directory of Open Access Journals (Sweden)

    Jens Roat Kultima

    Full Text Available MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  3. Gene prediction in metagenomic fragments: a large scale machine learning approach.

    Science.gov (United States)

    Hoff, Katharina J; Tech, Maike; Lingner, Thomas; Daniel, Rolf; Morgenstern, Burkhard; Meinicke, Peter

    2008-04-28

    Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (for example, Sanger sequencing yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments.
In particular, the

  4. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Full Text Available Abstract. Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (for example, Sanger sequencing yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability.
Conclusion: Large scale machine learning methods are well-suited for gene

  5. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on GitHub, a cloud image is available, and an example implementation is available online.

  6. Metagenomic Analysis of Apple Orchard Soil Reveals Antibiotic Resistance Genes Encoding Predicted Bifunctional Proteins

    Science.gov (United States)

    Donato, Justin J.; Moe, Luke A.; Converse, Brandon J.; Smart, Keith D.; Berklein, Flora C.; McManus, Patricia S.; Handelsman, Jo

    2010-01-01

    To gain insight into the diversity and origins of antibiotic resistance genes, we identified resistance genes in the soil in an apple orchard using functional metagenomics, which involves inserting large fragments of foreign DNA into Escherichia coli and assaying the resulting clones for expressed functions. Among 13 antibiotic-resistant clones, we found two genes that encode bifunctional proteins. One predicted bifunctional protein confers resistance to ceftazidime and contains a natural fusion between a predicted transcriptional regulator and a β-lactamase. Sequence analysis of the entire metagenomic clone encoding the predicted bifunctional β-lactamase revealed a gene potentially involved in chloramphenicol resistance as well as a predicted transposase. A second clone that encodes a predicted bifunctional protein confers resistance to kanamycin and contains an aminoglycoside acetyltransferase domain fused to a second acetyltransferase domain that, based on nucleotide sequence, was predicted not to be involved in antibiotic resistance. This is the first report of a transcriptional regulator fused to a β-lactamase and of an aminoglycoside acetyltransferase fused to an acetyltransferase not involved in antibiotic resistance. PMID:20453147

  7. Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences.

    Science.gov (United States)

    Cheng, Jiujun; Romantsov, Tatyana; Engel, Katja; Doxey, Andrew C; Rose, David R; Neufeld, Josh D; Charles, Trevor C

    2017-01-01

    The techniques of metagenomics have allowed researchers to access the genomic potential of uncultivated microbes, but there remain significant barriers to determination of gene function based on DNA sequence alone. Functional metagenomics, in which DNA is cloned and expressed in surrogate hosts, can overcome these barriers, and make important contributions to the discovery of novel enzymes. In this study, a soil metagenomic library carried in an IncP cosmid was used for functional complementation for β-galactosidase activity in both Sinorhizobium meliloti (α-Proteobacteria) and Escherichia coli (γ-Proteobacteria) backgrounds. One β-galactosidase, encoded by six overlapping clones that were selected in both hosts, was identified as a member of glycoside hydrolase family 2. We could not identify ORFs obviously encoding possible β-galactosidases in 19 other sequenced clones that were only able to complement S. meliloti. Based on low sequence identity to other known glycoside hydrolases, yet not β-galactosidases, three of these ORFs were examined further. Biochemical analysis confirmed that all three encoded β-galactosidase activity. Lac36W_ORF11 and Lac161_ORF7 had conserved domains, but lacked similarities to known glycoside hydrolases. Lac161_ORF10 had neither conserved domains nor similarity to known glycoside hydrolases. Bioinformatic and structural modeling implied that Lac161_ORF10 protein represented a novel enzyme family with a five-bladed propeller glycoside hydrolase domain. By discovering founding members of three novel β-galactosidase families, we have reinforced the value of functional metagenomics for isolating novel genes that could not have been predicted from DNA sequence analysis alone.

  8. MGC: a metagenomic gene caller.

    Science.gov (United States)

    El Allali, Achraf; Rose, John R

    2013-01-01

    Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome. In this paper we introduce a metagenomics gene caller (MGC) which is an improvement over the state-of-the-art prediction algorithm Orphelia. Orphelia uses a two-stage machine learning approach and computes a model that classifies extracted ORFs from fragmented sequences. We hypothesise and demonstrate evidence that sequences need separate models based on their local GC-content in order to avoid the noise introduced to a single model computed with sequences from the entire GC spectrum. We have also added two amino-acid features based on the benefit of amino-acid usage shown in our previous research. Our algorithm is able to predict genes and translation initiation sites (TIS) more accurately than Orphelia which uses a single model. Learning separate models for several pre-defined GC-content regions as opposed to a single model approach improves the performance of the neural network as demonstrated by the experimental results presented in this paper. The inclusion of amino-acid usage features also helps improve the overall accuracy of our algorithm. MGC's improvement sets the ground for further investigation into the use of GC-content to separate data for training models in machine learning based gene finders.
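    The idea of routing each fragment to a model trained for its GC-content range can be sketched as follows. This is only an illustration under stated assumptions: the bin edges in `GC_BIN_EDGES` are invented for the example (the abstract does not give the GC ranges MGC uses), and `gc_content` and `gc_bin` are hypothetical helper names. The per-bin models themselves are not shown.

```python
# Illustrative GC-content bin edges (an assumption, not MGC's actual ranges).
GC_BIN_EDGES = [0.0, 0.4, 0.5, 0.6, 1.0]


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_bin(seq: str, edges: list[float] = GC_BIN_EDGES) -> int:
    """Index of the half-open bin [edges[i], edges[i + 1]) containing the
    GC-content of seq; a GC-content of exactly 1.0 falls in the last bin."""
    gc = gc_content(seq)
    for i in range(len(edges) - 1):
        if gc < edges[i + 1]:
            return i
    return len(edges) - 2


if __name__ == "__main__":
    for fragment in ["ATATATAT", "ATGCATGC", "GGCCGGCC"]:
        print(fragment, round(gc_content(fragment), 2), "-> model", gc_bin(fragment))
```

    In a gene caller built this way, the bin index would select which trained classifier scores the ORFs extracted from that fragment, so that each model only sees training data from a narrow part of the GC spectrum.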

  9. Meta4: a web-application for sharing and annotating metagenomic gene predictions using web-services

    Directory of Open Access Journals (Sweden)

    Emily J Richardson

    2013-09-01

    Full Text Available Whole-genome-shotgun (WGS) metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web-application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web-services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website (http://www.ark-genomics.org/bioinformatics/meta4), code is available on GitHub (https://github.com/mw55309/meta4), a cloud image is available, and an example implementation can be seen at http://www.ark-genomics.org/tools/meta4.

  10. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The

  11. Tentacle: distributed quantification of genes in metagenomes

    OpenAIRE

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    Background: In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Findings: Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented...

  12. Snowball: Strain aware gene assembly of Metagenomes

    NARCIS (Netherlands)

    I. Gregor; A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2015-01-01

    Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied

  13. Snowball: strain aware gene assembly of metagenomes

    NARCIS (Netherlands)

    I. Gregor; A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2016-01-01

    Motivation: Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the

  14. Tentacle: distributed quantification of genes in metagenomes.

    Science.gov (United States)

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
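    The master-worker pattern described in this abstract can be illustrated with a small local sketch. Everything here is an assumption made for illustration: a thread pool stands in for networked worker nodes, exact substring matching stands in for a real sequence aligner, and the names `quantify_batch` and `quantify` are hypothetical. It is written for Python 3 and does not reproduce Tentacle's own code or interfaces.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def quantify_batch(reads: list[str], references: dict[str, str]) -> Counter:
    """Worker task: count, per reference gene, the reads in this batch that
    occur in it. Substring matching is a toy stand-in for alignment."""
    counts = Counter()
    for read in reads:
        for name, ref in references.items():
            if read in ref:
                counts[name] += 1
    return counts


def quantify(reads: list[str], references: dict[str, str],
             batch_size: int = 2, workers: int = 2) -> Counter:
    """Master: split reads into batches, hand them to idle workers,
    and merge the partial counts as they come back."""
    batches = [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda b: quantify_batch(b, references), batches):
            total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    refs = {"geneA": "ATGCCCGGGTAA", "geneB": "ATGTTTAAATAG"}
    reads = ["CCCGGG", "TTTAAA", "GGGTAA", "ACGTACGT"]
    print(dict(quantify(reads, refs)))
```

    Because each batch is processed independently and only small count tables are merged, adding workers mainly increases throughput, which is the property the abstract's scaling evaluation refers to.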

  15. Gene-Based Pathogen Detection: Can We Use qPCR to Predict the Outcome of Diagnostic Metagenomics?

    DEFF Research Database (Denmark)

    Andersen, Sandra Christine; Fachmann, Mette Sofie Rousing; Kiil, Kristoffer

    2017-01-01

    In microbial food safety, molecular methods such as quantitative PCR (qPCR) and next-generation sequencing (NGS) of bacterial isolates can potentially be replaced by diagnostic shotgun metagenomics. However, the methods for pre-analytical sample preparation are often optimized for qPCR, and do not necessarily perform equally well for qPCR and sequencing. The present study investigates, through screening of methods, whether qPCR can be used as an indicator for the optimization of sample preparation for NGS-based shotgun metagenomics with a diagnostic focus. This was used on human fecal samples spiked with 10³ or 10⁶ colony-forming units (CFU)/g Campylobacter jejuni, as well as porcine fecal samples spiked with 10³ or 10⁶ CFU/g Salmonella typhimurium. DNA was extracted from the samples using variations of two widely used kits. The following quality parameters were measured: DNA concentration, qPCR, DNA...

  16. Gene-Based Pathogen Detection: Can We Use qPCR to Predict the Outcome of Diagnostic Metagenomics?

    OpenAIRE

    Andersen, Sandra Christine; Fachmann, Mette Sofie Rousing; Kiil, Kristoffer; Møller Nielsen, Eva; Hoorfar, Jeffrey

    2017-01-01

    In microbial food safety, molecular methods such as quantitative PCR (qPCR) and next-generation sequencing (NGS) of bacterial isolates can potentially be replaced by diagnostic shotgun metagenomics. However, the methods for pre-analytical sample preparation are often optimized for qPCR, and do not necessarily perform equally well for qPCR and sequencing. The present study investigates, through screening of methods, whether qPCR can be used as an indicator for the optimization of sample prepar...

  17. Gene-Based Pathogen Detection: Can We Use qPCR to Predict the Outcome of Diagnostic Metagenomics?

    Directory of Open Access Journals (Sweden)

    Sandra Christine Andersen

    2017-11-01

    Full Text Available In microbial food safety, molecular methods such as quantitative PCR (qPCR) and next-generation sequencing (NGS) of bacterial isolates can potentially be replaced by diagnostic shotgun metagenomics. However, the methods for pre-analytical sample preparation are often optimized for qPCR, and do not necessarily perform equally well for qPCR and sequencing. The present study investigates, through screening of methods, whether qPCR can be used as an indicator for the optimization of sample preparation for NGS-based shotgun metagenomics with a diagnostic focus. This was used on human fecal samples spiked with 10³ or 10⁶ colony-forming units (CFU)/g Campylobacter jejuni, as well as porcine fecal samples spiked with 10³ or 10⁶ CFU/g Salmonella typhimurium. DNA was extracted from the samples using variations of two widely used kits. The following quality parameters were measured: DNA concentration, qPCR, DNA fragmentation during library preparation, amount of DNA available for sequencing, amount of sequencing data, distribution of data between samples in a batch, and data insert size; none showed any correlation with the target ratio of the spiking organism detected in sequencing data. Surprisingly, diagnostic metagenomics can have better detection sensitivity than qPCR for samples spiked with 10³ CFU/g C. jejuni. The study also showed that qPCR and sequencing results may be different due to inhibition in one of the methods. In conclusion, qPCR cannot uncritically be used as an indicator for the optimization of sample preparation for diagnostic metagenomics.

  18. A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses.

    Science.gov (United States)

    Abe, Takashi; Kanaya, Shigehiko; Uehara, Hiroshi; Ikemura, Toshimichi

    2009-10-01

    As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of the homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map 'BLSOM' that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. Analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins, whose functions are unknown because of lack of significant sequence similarity with function-known proteins, can be related to function-known proteins based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.
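    The oligopeptide composition that serves as input in this approach can be illustrated for the dipeptide case. The sketch below is an assumption-laden example, not the authors' implementation: the function name `dipeptide_composition`, the alphabetical ordering of the 400 dipeptides, and the decision to skip windows containing non-standard residues are all choices made here. The self-organizing map (BLSOM) step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(protein: str) -> list[float]:
    """Return a 400-dimensional vector of relative dipeptide frequencies,
    counted over overlapping windows of length 2."""
    protein = protein.upper()
    vec = [0.0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(protein) - 1):
        j = _INDEX.get(protein[i:i + 2])
        if j is not None:  # skip windows with non-standard residues (e.g. X)
            vec[j] += 1.0
            total += 1
    if total:
        vec = [v / total for v in vec]
    return vec


if __name__ == "__main__":
    vec = dipeptide_composition("MKMK")
    # Print only the dipeptides that actually occur in the toy sequence.
    print({dp: round(v, 3) for dp, v in zip(DIPEPTIDES, vec) if v > 0})
```

    Proteins represented this way can be compared or clustered without any alignment, which is what makes a composition-based method independent of homology search.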

  19. Novel florfenicol and chloramphenicol resistance gene discovered in Alaskan soil by using functional metagenomics.

    Science.gov (United States)

    Lang, Kevin S; Anderson, Janet M; Schwarz, Stefan; Williamson, Lynn; Handelsman, Jo; Singer, Randall S

    2010-08-01

    Functional metagenomics was used to search for florfenicol resistance genes in libraries of cloned DNA isolated from Alaskan soil. A gene that mediated reduced susceptibility to florfenicol was identified and designated pexA. The predicted PexA protein showed a structure similar to that of efflux pumps of the major facilitator superfamily.

  20. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil.

    Science.gov (United States)

    Meier, Matthew J; Paterson, E Suzanne; Lambert, Iain B

    2016-02-01

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  1. Exploring Antibiotic Resistance Genes and Metal Resistance Genes in Plasmid Metagenomes from Wastewater Treatment Plants

    Directory of Open Access Journals (Sweden)

    An-Dong Li

    2015-09-01

    Full Text Available Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer, they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge and digested sludge of two wastewater treatment plants. Compared with the metagenomes of the total DNA extracted from the same sectors of the wastewater treatment plant, the plasmid metagenomes had significantly higher annotation rates, indicating that the functional genes on plasmids are commonly shared by those studied microorganisms. Meanwhile, the plasmid metagenomes also encoded many more genes related to defense mechanisms, including ARGs. Searching against an antibiotic resistance genes (ARGs) database and a metal resistance genes (MRGs) database revealed a broad spectrum of antibiotic (323 out of a total of 618 subtypes) and metal resistance genes (23 out of a total of 23 types) on these plasmid metagenomes. The influent plasmid metagenomes contained many more resistance genes (both ARGs and MRGs) than the activated sludge and the digested sludge metagenomes. Sixteen novel plasmids with a complete circular structure that carried these resistance genes were assembled from the plasmid metagenomes. The results of this study demonstrated that the plasmids in wastewater treatment plants could be important reservoirs for resistance genes, and may play a significant role in the horizontal transfer of these genes.

  2. Metagenomic species profiling using universal phylogenetic marker genes

    NARCIS (Netherlands)

    Sunagawa, S.; Mende, D.R.; Zeller, G.; Izquierdo-Carrasco, F.; Berger, S.A.; Kultima, J.R.; Coelho, L.P.; Arumugam, M.; Tap, J.; Nielsen, H.B.; Rasmussen, S.; Brunak, S.; Pedersen, O.; Guarner, F.; Vos, de W.M.; Wang, J.; Li, J.; Doré, J.; Ehrlich, S.D.; Stamatakis, A.; Bork, P.

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that

  3. Metagenomic species profiling using universal phylogenetic marker genes

    DEFF Research Database (Denmark)

    Sunagawa, Shinichi; Mende, Daniel R; Zeller, Georg

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed...

  4. Reconstruction of ribosomal RNA genes from metagenomic data.

    Directory of Open Access Journals (Sweden)

    Lu Fan

    Full Text Available Direct sequencing of environmental DNA (metagenomics) has a great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

  5. A human gut microbial gene catalogue established by metagenomic sequencing

    DEFF Research Database (Denmark)

    dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn

    2010-01-01

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence......, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes....... The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal...

  6. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Full Text Available Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  7. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants

    OpenAIRE

    Li, An-Dong; Li, Li-Guan; Zhang, Tong

    2015-01-01

    Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer, they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge and digested sludge of two wastewater treatment plants. Compared with the metagenomes of the total DNA extracted from the same sectors of the...

  8. Metagenomic Functional Potential Predicts Degradation Rates of a Model Organophosphorus Xenobiotic in Pesticide Contaminated Soils

    Directory of Open Access Journals (Sweden)

    Thomas C. Jeffries

    2018-02-01

    Full Text Available Chemical contamination of natural and agricultural habitats is an increasing global problem and a major threat to sustainability and human health. Organophosphorus (OP) compounds are one major class of contaminant and can undergo microbial degradation; however, no studies have applied system-wide ecogenomic tools to investigate OP degradation or use metagenomics to understand the underlying mechanisms of biodegradation in situ and predict degradation potential. Thus, there is a lack of knowledge regarding the functional genes and genomic potential underpinning degradation and community responses to contamination. Here we address this knowledge gap by performing shotgun sequencing of community DNA from agricultural soils with a history of pesticide usage and profiling shifts in functional genes and microbial taxa abundance. Our results showed two distinct groups of soils defined by differing functional and taxonomic profiles. Degradation assays suggested that these groups corresponded to the organophosphorus degradation potential of soils, with the fastest degrading community being defined by increases in transport and nutrient cycling pathways and enzymes potentially involved in phosphorus metabolism. This was against a backdrop of taxonomic community shifts potentially related to contamination adaptation and reflecting the legacy of exposure. Overall our results highlight the value of using holistic system-wide metagenomic approaches as a tool to predict microbial degradation in the context of the ecology of contaminated habitats.

  9. Diversity of Nonribosomal Peptide Synthetase Genes in the Microbial Metagenomes of Marine Sponges

    Directory of Open Access Journals (Sweden)

    Ute Hentschel

    2012-05-01

    Full Text Available Genomic mining revealed one major nonribosomal peptide synthetase (NRPS) phylogenetic cluster in 12 marine sponge species, one ascidian, an actinobacterial isolate and seawater. Phylogenetic analysis predicts its taxonomic affiliation to the actinomycetes and hydroxy-phenyl-glycine as a likely substrate. Additionally, a phylogenetically distinct NRPS gene cluster was discovered in the microbial metagenome of the sponge Aplysina aerophoba, which shows highest similarities to NRPS genes that were previously assigned, by way of single cell genomics, to a Chloroflexi sponge symbiont. Genomic mining studies such as the one presented here for NRPS genes contribute to on-going efforts to characterize the genomic potential of sponge-associated microbiota for secondary metabolite biosynthesis.

  10. Retrieval of glycoside hydrolase family 9 cellulase genes from environmental DNA by metagenomic gene specific multi-primer PCR.

    Science.gov (United States)

    Xiong, Xiaolong; Yin, Xiaopu; Pei, Xiaolin; Jin, Peng; Zhang, Ao; Li, Yan; Gong, Weibo; Wang, Qiuyan

    2012-05-01

    A new method, termed metagenomic gene specific multi-primer PCR (MGSM-PCR), is presented that uses multiple gene specific primers derived from an isolated gene from a constructed metagenomic library rather than degenerate primers designed based on a known enzyme family. The utility of MGSM-PCR was shown by applying it to search for homologues of the glycoside hydrolase family 9 cellulase in metagenomic DNA. The success of the multiplex PCR was verified by visualizing products on an agarose gel following gel electrophoresis. A total of 127 homologous genes were amplified with combinatorial multi-primer reactions from 34 soil DNA samples. Multiple alignments revealed extensive sequence diversity among these captured sequences with sequence identity varying from 26 to 99.7%. These results indicated that significantly diverse homologous genes were indeed readily accessible when using multiple metagenomic gene specific primers.

  11. Bioprospecting for β-lactam resistance genes using a metagenomics-guided strategy.

    Science.gov (United States)

    Yang, Chao; Yang, Ying; Che, You; Xia, Yu; Li, Liguan; Xiong, Wenguang; Zhang, Tong

    2017-08-01

    Emergence of new antibiotic resistance bacteria poses a serious threat to human health, which is largely attributed to the evolution and spread of antibiotic resistance genes (ARGs). In this work, a metagenomics-guided strategy consisting of metagenomic analysis and function validation was proposed for rapidly identifying novel ARGs from hot spots of ARG dissemination, such as wastewater treatment plants (WWTPs) and animal feces. We used an antibiotic resistance gene database to annotate 76 putative β-lactam resistance genes from the metagenomes of sludge and chicken feces. Among these 76 candidate genes, 25 target genes that shared 40~70% amino acid identity to known β-lactamases were cloned by PCR from the metagenomes. Their resistances to four β-lactam antibiotics were further demonstrated. Furthermore, the validated ARGs were used as the reference sequences to identify novel ARGs in eight environmental samples, suggesting the necessity of re-examining the profiles of ARGs in environmental samples using the validated novel ARG sequences. This metagenomics-guided pipeline does not rely on the activity of ARGs during the initial screening process and may specifically select novel ARG sequences for function validation, which make it suitable for the high-throughput screening of novel ARGs from environmental metagenomes.

  12. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    Science.gov (United States)

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  13. Captured metagenomics: large-scale targeting of genes based on 'sequence capture' reveals functional diversity in soils.

    Science.gov (United States)

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K; Hedlund, Katarina; Ahrén, Dag

    2015-12-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. Selection and characterization of forest soil metagenome genes encoding lipolytic enzymes.

    Science.gov (United States)

    Hong, Kyung Sik; Lim, He Kyoung; Chung, Eu Jin; Park, Eun Jin; Lee, Myung Hwan; Kim, Jin-Cheol; Choi, Gyung Ja; Cho, Kwang Yun; Lee, Seon-Woo

    2007-10-01

    A metagenome is a unique resource to search for novel microbial enzymes from the unculturable microorganisms in soil. A forest soil metagenomic library using a fosmid and soil microbial DNA from Gwangneung forest, Korea, was constructed in Escherichia coli and screened to select lipolytic genes. A total of seven unique lipolytic clones were selected by screening of the 31,000-member forest soil metagenome library based on tributyrin hydrolysis. The ORFs for lipolytic activity were subcloned in a high copy number plasmid by screening the secondary shotgun libraries from the seven clones. Since the lipolytic enzymes were well secreted in E. coli into the culture broth, the lipolytic activity of the subclones was confirmed by the hydrolysis of p-nitrophenyl butyrate using culture broth. Deduced amino acid sequence analysis of the identified ORFs for lipolytic activity revealed that 4 genes encode hormone-sensitive lipase (HSL) in lipase family IV. Phylogenetic analysis indicated that 4 proteins were clustered with HSL in the database and other metagenomic HSLs. The other 2 genes and 1 gene encode non-heme peroxidase-like enzymes of lipase family V and a GDSL family esterase/lipase in family II, respectively. The gene for the GDSL enzyme is the first description of the enzyme from metagenomic screening.

  15. Indole Derivatives Produced by the Metagenome Genes of the Escherichia coli-Harboring Marine Sponge Discodermia calyx.

    Science.gov (United States)

    Liu, Feng-Lou; Yang, Xiao-Long

    2017-04-25

    Three indole derivatives, a novel benzoxazine-indole hybrid (1) and two known indole trimers (2, 3), were isolated from the metagenomic library of the marine sponge Discodermia calyx based on functional screening. Their structures were elucidated by extensive spectroscopic analysis and comparison of their NMR data to that of known compounds. The antibacterial assay indicated that only compound 2 displayed significant antibacterial activity against Bacillus cereus, with approximately 20 mm diameter growth inhibition at 10 µg/paper. HPLC analyses revealed that compound 2 is a newly induced metabolite, and the concentration of 3 was obviously enhanced in contrast to negative control, while 1 was not detected, allowing us to predict that the formation of 2 might be induced by exogenous genes derived from the sponge metagenome, whereas compound 1 could be formed through a non-enzymatic process during the isolation procedure.

  16. Functional metagenomics identifies novel genes ABCTPP, TMSRP1 and TLSRP1 among human gut enterotypes

    DEFF Research Database (Denmark)

    Verma, Manoj Kumar; Ahmed, Vasim; Gupta, Shashank

    2018-01-01

    gut microbiome to identify candidate genes responsible for the salt stress tolerance. A plasmid borne metagenomic library of Bacteroidetes enriched human fecal metagenomic DNA led to identification of unique salt osmotolerance clones SR6 and SR7. Subsequent gene analysis combined with functional...... groups in a North Indian population. This study unravels an alternative method for imparting ionic stress tolerance, which may be prevalent in the human gut microbiome....... is an important aspect of gut microbes for their survival and colonization. Identification of these survival mechanisms is a pivotal step towards understanding genomic suitability of a symbiont for successful human gut colonization. Here we highlight our recent work applying functional metagenomics to study human...

  17. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge.

    Directory of Open Access Journals (Sweden)

    Tong Zhang

    Full Text Available The overuse or misuse of antibiotics has accelerated antibiotic resistance, creating a major challenge for the public health in the world. Sewage treatment plants (STPs) are considered as important reservoirs for antibiotic resistance genes (ARGs) and activated sludge characterized with high microbial density and diversity facilitates ARG horizontal gene transfer (HGT) via mobile genetic elements (MGEs). However, little is known regarding the pool of ARGs and MGEs in sludge microbiome. In this study, the transposon aided capture (TRACA) system was employed to isolate novel plasmids from activated sludge of one STP in Hong Kong, China. We also used Illumina Hiseq 2000 high-throughput sequencing and metagenomics analysis to investigate the plasmid metagenome. Two novel plasmids were acquired from the sludge microbiome by using TRACA system and one novel plasmid was identified through metagenomics analysis. Our results revealed high levels of various ARGs as well as MGEs for HGT, including integrons, transposons and plasmids. The application of the TRACA system to isolate novel plasmids from the environmental metagenome, coupled with subsequent high-throughput sequencing and metagenomic analysis, highlighted the prevalence of ARGs and MGEs in microbial community of STPs.

  18. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge.

    Science.gov (United States)

    Zhang, Tong; Zhang, Xu-Xiang; Ye, Lin

    2011-01-01

    The overuse or misuse of antibiotics has accelerated antibiotic resistance, creating a major challenge for the public health in the world. Sewage treatment plants (STPs) are considered as important reservoirs for antibiotic resistance genes (ARGs) and activated sludge characterized with high microbial density and diversity facilitates ARG horizontal gene transfer (HGT) via mobile genetic elements (MGEs). However, little is known regarding the pool of ARGs and MGEs in sludge microbiome. In this study, the transposon aided capture (TRACA) system was employed to isolate novel plasmids from activated sludge of one STP in Hong Kong, China. We also used Illumina Hiseq 2000 high-throughput sequencing and metagenomics analysis to investigate the plasmid metagenome. Two novel plasmids were acquired from the sludge microbiome by using TRACA system and one novel plasmid was identified through metagenomics analysis. Our results revealed high levels of various ARGs as well as MGEs for HGT, including integrons, transposons and plasmids. The application of the TRACA system to isolate novel plasmids from the environmental metagenome, coupled with subsequent high-throughput sequencing and metagenomic analysis, highlighted the prevalence of ARGs and MGEs in microbial community of STPs.

  19. A primer on metagenomics.

    Directory of Open Access Journals (Sweden)

    John C Wooley

    2010-02-01

    Full Text Available Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

  20. Analysis of bacterial xylose isomerase gene diversity using gene-targeted metagenomics.

    Science.gov (United States)

    Nurdiani, Dini; Ito, Michihiro; Maruyama, Toru; Terahara, Takeshi; Mori, Tetsushi; Ugawa, Shin; Takeyama, Haruko

    2015-08-01

    Bacterial xylose isomerases (XI) are promising resources for efficient biofuel production from xylose in lignocellulosic biomass. Here, we investigated xylose isomerase gene (xylA) diversity in three soil metagenomes differing in plant vegetation and geographical location, using an amplicon pyrosequencing approach and two newly-designed primer sets. A total of 158,555 reads from three metagenomic DNA replicates for each soil sample were classified into 1127 phylotypes, detected in triplicate and defined by 90% amino acid identity. The phylotype coverage was estimated to be within the range of 84.0-92.7%. The xylA gene phylotypes obtained were phylogenetically distributed across the two known xylA groups. They shared 49-100% identities with their closest-related XI sequences in GenBank. Phylotypes demonstrating soil sample were significantly smaller than they were between different soils based on a UniFrac distance analysis, suggesting soil-specific xylA genotypes and taxonomic compositions. The differences among xylA members and their compositions in the soil were strongly correlated with 16S rRNA variation between soil samples, also assessed by amplicon pyrosequencing. This is the first report of xylA diversity in environmental samples assessed by amplicon pyrosequencing. Our data provide information regarding xylA diversity in nature, and can be a basis for the screening of novel xylA genotypes for practical applications. Copyright © 2015. Published by Elsevier B.V.

  1. Metagenomic Profiling of Soil Microbes to Mine Salt Stress Tolerance Genes

    DEFF Research Database (Denmark)

    Ahmed, Vasim; Verma, Manoj K.; Gupta, Shashank

    2018-01-01

    /halotolerant phylotypes affiliated to Proteobacteria, Actinobacteria, Gemmatimonadetes, Bacteroidetes, Firmicutes and Acidobacteria. A functional metagenomics approach led to the identification of osmotolerant clones SSR1, SSR4, SSR6, SSR2 harbouring BCAA_ABCtp, GSDH, STK_Pknb and duf3445 genes. Furthermore, transposon...

  2. Captured metagenomics: large-scale targeting of genes based on 'sequence capture' reveals functional diversity in soils

    OpenAIRE

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agric...

  3. Bacterial human virulence genes across diverse habitats as assessed by In silico analysis of environmental metagenomes

    DEFF Research Database (Denmark)

    Søborg, Ditte Andreasen; Hendriksen, Niels B.; Kilian, Mogens

    2016-01-01

    The occurrence and distribution of clinically relevant bacterial virulence genes across natural (non-human) environments is not well understood. We aimed to investigate the occurrence of homologs to bacterial human virulence genes in a variety of ecological niches to better understand the role...... of natural environments in the evolution of bacterial virulence. Twenty four bacterial virulence genes were analyzed in 46 diverse environmental metagenomic datasets, representing various soils, seawater, freshwater, marine sediments, hot springs, the deep-sea, hypersaline mats, microbialites, gutless worms...... and glacial ice. Homologs to 16 bacterial human virulence genes, involved in urinary tract infections, gastrointestinal diseases, skin diseases, and wound and systemic infections, showed global ubiquity. A principal component analysis did not demonstrate clear trends across the metagenomes with respect...

  4. SmashCommunity: A metagenomic annotation and analysis tool

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Harrington, Eoghan D; Foerstner, Konrad U

    2010-01-01

    SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate...

  5. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Full Text Available Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlate with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
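    As a rough illustration of two of the sequence patterns named in this abstract, the sketch below computes G+C content and trinucleotide usage for a single DNA sequence. The function names are hypothetical, and counting trinucleotides over all overlapping windows (rather than in a fixed reading frame) is an assumption; the abstract does not say how the authors computed these values.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous


def trinucleotide_usage(seq: str) -> dict[str, float]:
    """Relative frequency of each trinucleotide over overlapping windows.

    Windows containing ambiguous symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= VALID_BASES
    )
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


if __name__ == "__main__":
    fragment = "ATGATGCCGNNATG"
    print(round(gc_content(fragment), 3))
    print(trinucleotide_usage(fragment))
```

    Profiles of this kind can be computed separately for coding and noncoding regions and then compared across metagenomes, which is the type of comparison the abstract describes.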

  6. Bacterial human virulence genes across diverse habitats as assessed by in silico analysis of environmental metagenomes

    Directory of Open Access Journals (Sweden)

    Ditte Andreasen Søborg

    2016-11-01

    Full Text Available The occurrence and distribution of clinically relevant bacterial virulence genes across natural (non-human) environments is not well understood. We aimed to investigate the occurrence of homologues to bacterial human virulence genes in a variety of ecological niches to better understand the role of natural environments in the evolution of bacterial virulence. Twenty-four bacterial virulence genes were analyzed in 47 diverse environmental metagenomic datasets, representing various soils, seawater, freshwater, marine sediments, hot springs, the deep-sea, hypersaline mats, microbialites, gutless worms and glacial ice. Homologues to 17 bacterial human virulence genes, involved in urinary tract infections, gastrointestinal diseases, skin diseases, and wound and systemic infections, showed global ubiquity. A principal component analysis did not demonstrate clear trends across the metagenomes with respect to occurrence and frequency of observed gene homologues. Full-length (>95%) homologues of several virulence genes were identified, and translated sequences of the environmental and clinical genes were up to 50-100% identical. Furthermore, phylogenetic analyses indicated deep branching positions of some of the environmental gene homologues, suggesting that they represent ancient lineages in the phylogeny of the clinical genes. Fifteen virulence gene homologues were detected in metagenomes based on metatranscriptomic data, providing evidence of environmental expression. The ubiquitous presence and transcription of the virulence gene homologues in non-human environments point to an important ecological role of the genes for the activity and survival of environmental bacteria. Furthermore, the high degree of sequence conservation between several of the environmental and clinical genes suggests common ancestral origins.

  7. Insights into novel antimicrobial compounds and antibiotic resistance genes from soil metagenomes

    Directory of Open Access Journals (Sweden)

    Alinne P Castro

    2014-09-01

    Full Text Available In recent years a major worldwide problem has arisen with regard to infectious diseases caused by resistant bacteria. Resistant pathogens are related to high mortality and also to enormous healthcare costs. In this field, cultured microorganisms have been commonly focused in attempts to isolate antibiotic resistance genes or to identify antimicrobial compounds. Although this strategy has been successful in many cases, most of the microbial diversity and related antimicrobial molecules have been completely lost. As an alternative, metagenomics has been used as a reliable approach to reveal the prospective reservoir of antimicrobial compounds and antibiotic resistance genes in the uncultured microbial community that inhabits a number of environments. In this context, this review will focus on resistance genes as well as on novel antibiotics revealed by a metagenomics approach from the soil environment. Biotechnology prospects are also discussed, opening new frontiers for antibiotic development.

  8. Comparison of normalization methods for the analysis of metagenomic gene abundance data.

    Science.gov (United States)

    Pereira, Mariana Buongermino; Wallroth, Mikael; Jonsson, Viktor; Kristiansson, Erik

    2018-04-20

    In shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, which is the process where systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed but their performance on the analysis of shotgun metagenomic data has not been evaluated. Here, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, CSS also showed satisfactory performance. This study emphasizes the importance of selecting a suitable normalization methods in the analysis of data from shotgun metagenomics. 
Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead
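
    As an illustration of one of the recommended methods, the sketch below computes RLE-style size factors with the median-of-ratios idea: each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples. This is a minimal sketch and not the code used in the study; the function names and the toy count table are assumptions made for the example.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Return one size factor per sample.

    counts: list of genes, each a list of per-sample read counts
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(counts[0])
    # Use only genes observed in every sample so the geometric mean is nonzero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy table: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # excluded from factor estimation because of the zero
]
factors = rle_size_factors(toy_counts)
normalized = normalize(toy_counts, factors)
```

    In the toy table the second factor comes out twice the first, so the three genes that differ only by sequencing depth end up with equal normalized values in both samples.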

  9. Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis.

    Science.gov (United States)

    Riesenfeld, Samantha J; Pollard, Katherine S

    2013-06-22

    Sequence-based phylogenetic trees are a well-established tool for characterizing diversity of both macroorganisms and microorganisms. Phylogenetic methods have recently been applied to shotgun metagenomic data from microbial communities, particularly with the aim of classifying reads. But the accuracy of gene-family phylogenies that characterize evolutionary relationships among short, non-overlapping sequencing reads has not been thoroughly evaluated. To quantify errors in metagenomic read trees, we developed MetaPASSAGE, a software pipeline to generate in silico bacterial communities, simulate a sample of shotgun reads from a gene family represented in the community, orient or translate reads, and produce a profile-based alignment of the reads from which a gene-family phylogenetic tree can be built. We applied MetaPASSAGE to a variety of RNA and protein-coding gene families, built trees using a range of different phylogenetic methods, and compared the resulting trees using topological and branch-length error metrics. We identified read length as one of the major sources of error. Because phylogenetic methods use a reference database of full-length sequences from the gene family to guide construction of alignments and trees, we found that error can also be substantially reduced through increasing the size and diversity of the reference database. Finally, UniFrac analysis, which compares metagenomic samples based on a summary statistic computed over all branches in a read tree, is very robust to the level of error we observe. Bacterial community diversity can be quantified using phylogenetic approaches applied to shotgun metagenomic data. As sequencing reads get longer and more genomes across the bacterial tree of life are sequenced, the accuracy of this approach will continue to improve, opening the door to more applications.

  10. Metagenomic Profiling of Antibiotic Resistance Genes and Mobile Genetic Elements in a Tannery Wastewater Treatment Plant

    OpenAIRE

    Wang, Zhu; Zhang, Xu-Xiang; Huang, Kailong; Miao, Yu; Shi, Peng; Liu, Bo; Long, Chao; Li, Aimin

    2013-01-01

    Antibiotics are often used to prevent sickness and improve production in animal agriculture, and the residues in animal bodies may enter tannery wastewater during leather production. This study aimed to use Illumina high-throughput sequencing to investigate the occurrence, diversity and abundance of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) in aerobic and anaerobic sludge of a full-scale tannery wastewater treatment plant (WWTP). Metagenomic analysis showed that Pr...

  11. Screening currency notes for microbial pathogens and antibiotic resistance genes using a shotgun metagenomic approach.

    Directory of Open Access Journals (Sweden)

    Saakshi Jalali

    Full Text Available Fomites are a well-known source of microbial infections and previous studies have provided insights into the sojourning microbiome of fomites from various sources. Paper currency notes are one of the most commonly exchanged objects and their potential to transmit pathogenic organisms has been well recognized. Approaches to identify the microbiome associated with paper currency notes have been largely limited to culture dependent approaches. Subsequent studies portrayed the use of 16S ribosomal RNA based approaches which provided insights into the taxonomical distribution of the microbiome. However, recent techniques including shotgun sequencing provide resolution at gene level and enable estimation of their copy numbers in the metagenome. We investigated the microbiome of Indian paper currency notes using a shotgun metagenome sequencing approach. Metagenomic DNA isolated from samples of frequently circulated denominations of Indian currency notes was sequenced using Illumina Hiseq sequencer. Analysis of the data revealed presence of species belonging to both eukaryotic and prokaryotic genera. The taxonomic distribution at kingdom level revealed contigs mapping to eukaryota (70%), bacteria (9%), viruses and archaea (~1%). We identified 78 pathogens including Staphylococcus aureus, Corynebacterium glutamicum, Enterococcus faecalis, and 75 cellulose degrading organisms including Acidothermus cellulolyticus, Cellulomonas flavigena and Ruminococcus albus. Additionally, 78 antibiotic resistance genes were identified and 18 of these were found in all the samples. Furthermore, six out of 78 pathogens harbored at least one of the 18 common antibiotic resistance genes. To the best of our knowledge, this is the first report of shotgun metagenome sequence dataset of paper currency notes, which can be useful for future applications including bio-surveillance of exchangeable fomites for infectious agents.

  12. Screening currency notes for microbial pathogens and antibiotic resistance genes using a shotgun metagenomic approach.

    Science.gov (United States)

    Jalali, Saakshi; Kohli, Samantha; Latka, Chitra; Bhatia, Sugandha; Vellarikal, Shamsudheen Karuthedath; Sivasubbu, Sridhar; Scaria, Vinod; Ramachandran, Srinivasan

    2015-01-01

    Fomites are a well-known source of microbial infections and previous studies have provided insights into the sojourning microbiome of fomites from various sources. Paper currency notes are one of the most commonly exchanged objects and their potential to transmit pathogenic organisms has been well recognized. Approaches to identify the microbiome associated with paper currency notes have been largely limited to culture dependent approaches. Subsequent studies portrayed the use of 16S ribosomal RNA based approaches which provided insights into the taxonomical distribution of the microbiome. However, recent techniques including shotgun sequencing provide resolution at gene level and enable estimation of their copy numbers in the metagenome. We investigated the microbiome of Indian paper currency notes using a shotgun metagenome sequencing approach. Metagenomic DNA isolated from samples of frequently circulated denominations of Indian currency notes was sequenced using Illumina Hiseq sequencer. Analysis of the data revealed presence of species belonging to both eukaryotic and prokaryotic genera. The taxonomic distribution at kingdom level revealed contigs mapping to eukaryota (70%), bacteria (9%), viruses and archaea (~1%). We identified 78 pathogens including Staphylococcus aureus, Corynebacterium glutamicum, Enterococcus faecalis, and 75 cellulose degrading organisms including Acidothermus cellulolyticus, Cellulomonas flavigena and Ruminococcus albus. Additionally, 78 antibiotic resistance genes were identified and 18 of these were found in all the samples. Furthermore, six out of 78 pathogens harbored at least one of the 18 common antibiotic resistance genes. To the best of our knowledge, this is the first report of shotgun metagenome sequence dataset of paper currency notes, which can be useful for future applications including bio-surveillance of exchangeable fomites for infectious agents.

  13. Functional Metagenomics as a Tool for Identification of New Antibiotic Resistance Genes from Natural Environments.

    Science.gov (United States)

    Dos Santos, Débora Farage Knupp; Istvan, Paula; Quirino, Betania Ferraz; Kruger, Ricardo Henrique

    2017-02-01

    Antibiotic resistance has become a major concern for human and animal health, as therapeutic alternatives to treat multidrug-resistant microorganisms are rapidly dwindling. The problem is compounded by low investment in antibiotic research and lack of new effective antimicrobial drugs on the market. Exploring environmental antibiotic resistance genes (ARGs) will help us to better understand bacterial resistance mechanisms, which may be the key to identifying new drug targets. Because most environment-associated microorganisms are not yet cultivable, culture-independent techniques are essential to determine which organisms are present in a given environmental sample and allow the assessment and utilization of the genetic wealth they represent. Metagenomics represents a powerful tool to achieve these goals using sequence-based and functional-based approaches. Functional metagenomic approaches are particularly well suited to the identification of new ARGs from natural environments because, unlike sequence-based approaches, they do not require previous knowledge of these genes. This review discusses functional metagenomics-based ARG research and describes new possibilities for surveying the resistome in environmental samples.

  14. Characterization of a Soil Metagenome-Derived Gene Encoding Wax Ester Synthase.

    Science.gov (United States)

    Kim, Nam Hee; Park, Ji-Hye; Chung, Eunsook; So, Hyun-Ah; Lee, Myung Hwan; Kim, Jin-Cheol; Hwang, Eul Chul; Lee, Seon-Woo

    2016-02-01

    A soil metagenome contains the genomes of all microbes included in a soil sample, including those that cannot be cultured. In this study, soil metagenome libraries were searched for microbial genes exhibiting lipolytic activity and those involved in potential lipid metabolism that could yield valuable products in microorganisms. One of the subclones derived from the original fosmid clone, pELP120, was selected for further analysis. A subclone spanning a 3.3 kb DNA fragment was found to encode for lipase/esterase and contained an additional partial open reading frame encoding a wax ester synthase (WES) motif. Consequently, both pELP120 and the full length of the gene potentially encoding WES were sequenced. To determine if the wes gene encoded a functioning WES protein that produced wax esters, gas chromatography-mass spectroscopy was conducted using ethyl acetate extract from an Escherichia coli strain that expressed the wes gene and was grown with hexadecanol. The ethyl acetate extract from this E. coli strain did indeed produce wax ester compounds of various carbon-chain lengths. DNA sequence analysis of the full-length gene revealed that the gene cluster may be derived from a member of Proteobacteria, whereas the clone does not contain any clear phylogenetic markers. These results suggest that the wes gene discovered in this study encodes a functional protein in E. coli and produces wax esters through a heterologous expression system.

  15. Detecting nitrous oxide reductase (NosZ) genes in soil metagenomes: method development and implications for the nitrogen cycle.

    Science.gov (United States)

    Orellana, L H; Rodriguez-R, L M; Higgins, S; Chee-Sanford, J C; Sanford, R A; Ritalahti, K M; Löffler, F E; Konstantinidis, K T

    2014-06-03

    Microbial activities in soils, such as (incomplete) denitrification, represent major sources of nitrous oxide (N2O), a potent greenhouse gas. The key enzyme for mitigating N2O emissions is NosZ, which catalyzes N2O reduction to N2. We recently described "atypical" functional NosZ proteins encoded by both denitrifiers and nondenitrifiers, which were missed in previous environmental surveys (R. A. Sanford et al., Proc. Natl. Acad. Sci. U. S. A. 109:19709-19714, 2012, doi:10.1073/pnas.1211238109). Here, we analyzed the abundance and diversity of both nosZ types in whole-genome shotgun metagenomes from sandy and silty loam agricultural soils that typify the U.S. Midwest corn belt. First, different search algorithms and parameters for detecting nosZ metagenomic reads were evaluated based on in silico-generated (mock) metagenomes. Using the derived cutoffs, 71 distinct alleles (95% amino acid identity level) encoding typical or atypical NosZ proteins were detected in both soil types. Remarkably, more than 70% of the total nosZ reads in both soils were classified as atypical, emphasizing that prior surveys underestimated nosZ abundance. Approximately 15% of the total nosZ reads were taxonomically related to Anaeromyxobacter, which was the most abundant genus encoding atypical NosZ-type proteins in both soil types. Further analyses revealed that atypical nosZ genes outnumbered typical nosZ genes in most publicly available soil metagenomes, underscoring their potential role in mediating N2O consumption in soils. Therefore, this study provides a bioinformatics strategy to reliably detect target genes in complex short-read metagenomes and suggests that the analysis of both typical and atypical nosZ sequences is required to understand and predict N2O flux in soils. Nitrous oxide (N2O) is a potent greenhouse gas with ozone layer destruction potential. 
Microbial activities control both the production and the consumption of N2O, i.e., its conversion to innocuous dinitrogen gas (N

  16. Metagenomic and network analysis reveal wide distribution and co-occurrence of environmental antibiotic resistance genes

    OpenAIRE

    Li, Bing; Yang, Ying; Ma, Liping; Ju, Feng; Guo, Feng; Tiedje, James M; Zhang, Tong

    2015-01-01

    A metagenomic approach and network analysis was used to investigate the wide-spectrum profiles of antibiotic resistance genes (ARGs) and their co-occurrence patterns in 50 samples from 10 typical environments. In total, 260 ARG subtypes belonging to 18 ARG types were detected with an abundance range of 5.4 × 10^-6 to 2.2 × 10^-1 copies of ARG per copy of 16S rRNA gene. The trend of the total ARG abundances in environments matched well with the levels of anthropogenic impacts on these environments. F...

  17. Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India

    Directory of Open Access Journals (Sweden)

    Johan Bengtsson-Palme

    2014-12-01

    Full Text Available There is increasing evidence for an environmental origin of many antibiotic resistance genes. Consequently, it is important to identify environments of particular risk for selecting and maintaining such resistance factors. In this study, we described the diversity of antibiotic resistance genes in an Indian lake subjected to industrial pollution with fluoroquinolone antibiotics. We also assessed the genetic context of the identified resistance genes, to try to predict their genetic transferability. The lake harbored a wide range of resistance genes (81 identified gene types) against essentially every major class of antibiotics, as well as genes responsible for mobilization of genetic material. Resistance genes were estimated to be 7000 times more abundant than in a Swedish lake included for comparison, where only eight resistance genes were found. The sul2 and qnrD genes were the most common resistance genes in the Indian lake. Twenty-six known and twenty-one putative novel plasmids were recovered in the Indian lake metagenome, which, together with the genes found, indicate a large potential for horizontal gene transfer through conjugation. Interestingly, the microbial community of the lake still included a wide range of taxa, suggesting that, across most phyla, bacteria have adapted relatively well to this highly polluted environment. Based on the wide range and high abundance of known resistance factors we have detected, it is plausible that yet unrecognized resistance genes are also present in the lake. Thus, we conclude that environments polluted with waste from antibiotic manufacturing could be important reservoirs for mobile antibiotic resistance genes.

  18. Metagenomic analysis reveals that bacteriophages are reservoirs of antibiotic resistance genes.

    Science.gov (United States)

    Subirats, Jéssica; Sànchez-Melsió, Alexandre; Borrego, Carles M; Balcázar, José Luis; Simonet, Pascal

    2016-08-01

    A metagenomics approach was applied to explore the presence of antibiotic resistance genes (ARGs) in bacteriophages from hospital wastewater. Metagenomic analysis showed that most phage sequences affiliated to the order Caudovirales, comprising the tailed phage families Podoviridae, Siphoviridae and Myoviridae. Moreover, the relative abundance of ARGs in the phage DNA fraction (0.26%) was higher than in the bacterial DNA fraction (0.18%). These differences were particularly evident for genes encoding ATP-binding cassette (ABC) and resistance-nodulation-cell division (RND) proteins, phosphotransferases, β-lactamases and plasmid-mediated quinolone resistance. Analysis of assembled contigs also revealed that blaOXA-10, blaOXA-58 and blaOXA-24 genes belonging to class D β-lactamases as well as a novel blaTEM (98.9% sequence similarity to the blaTEM-1 gene) belonging to class A β-lactamases were detected in a higher proportion in phage DNA. Although preliminary, these findings corroborate the role of bacteriophages as reservoirs of resistance genes and thus highlight the necessity to include them in future studies on the emergence and spread of antibiotic resistance in the environment. Copyright © 2016 Elsevier B.V. and International Society of Chemotherapy. All rights reserved.

  19. Fate of antibiotic resistance genes in sewage treatment plant revealed by metagenomic approach.

    Science.gov (United States)

    Yang, Ying; Li, Bing; Zou, Shichun; Fang, Herbert H P; Zhang, Tong

    2014-10-01

    Antibiotic resistance has become a serious threat to human health. Sewage treatment plant (STP) is one of the major sources of antibiotic resistance genes (ARGs) in natural environment. High-throughput sequencing-based metagenomic approach was applied to investigate the broad-spectrum profiles and fate of ARGs in a full scale STP. Totally, 271 ARGs subtypes belonging to 18 ARGs types were identified by the broad scanning of metagenomic analysis. Influent had the highest ARGs abundance, followed by effluent, anaerobic digestion sludge and activated sludge. 78 ARGs subtypes persisted through the biological wastewater and sludge treatment process. The high removal efficiency of 99.82% for total ARGs in wastewater suggested that sewage treatment process is effective in reducing ARGs. But the removal efficiency of ARGs in sludge treatment was not as good as that in sewage treatment. Furthermore, the composition of microbial communities was examined and the correlation between microbial community and ARGs was investigated using redundancy analysis. Significant correlation between 6 genera and the distribution of ARGs were found and 5 of the 6 genera included potential pathogens. This is the first study on the fate of ARGs in STP using metagenomic analysis with high-throughput sequencing and hopefully would enhance our knowledge on fate of ARGs in STP. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Metagenomic analysis of lysogeny in Tampa Bay: implications for prophage gene expression.

    Directory of Open Access Journals (Sweden)

    Lauren McDaniel

    Full Text Available Phage integrase genes often play a role in the establishment of lysogeny in temperate phage by catalyzing the integration of the phage into one of the host's replicons. To investigate temperate phage gene expression, an induced viral metagenome from Tampa Bay was sequenced by 454/Pyrosequencing. The sequencing yielded 294,068 reads with 6.6% identifiable. One hundred-three sequences had significant similarity to integrases by BLASTX analysis (e ≤ 0.001). Four sequences with strongest amino-acid level similarity to integrases were selected and real-time PCR primers and probes were designed. Initial testing with microbial fraction DNA from Tampa Bay revealed 1.9 × 10^7 and 1300 gene copies L^-1 of Vibrio-like integrase and Oceanicola-like integrase, respectively. The other two integrases were not detected. The integrase assay was then tested on microbial fraction RNA extracted from 200 ml of Tampa Bay water sampled biweekly over a 12 month time series. Vibrio-like integrase gene expression was detected in three samples, with estimated copy numbers of 2.4 to 1280 L^-1. Clostridium-like integrase gene expression was detected in 6 samples, with estimated copy numbers of 37 to 265 L^-1. In all cases, detection of integrase gene expression corresponded to the occurrence of lysogeny as detected by prophage induction. Investigation of the environmental distribution of the two expressed integrases in the Global Ocean Survey Database found the Vibrio-like integrase was present in genome equivalents of 3.14% of microbial libraries and all four viral metagenomes. There were two similar genes in the library from British Columbia and one similar gene was detected in both the Gulf of Mexico and Sargasso Sea libraries. In contrast, in the Arctic library eleven similar genes were observed. The Clostridium-like integrase was less prevalent, being found in 0.58% of the microbial and none of the viral libraries.
These results underscore the value of metagenomic data

  1. Microbial population index and community structure in saline-alkaline soil using gene targeted metagenomics.

    Science.gov (United States)

    Keshri, Jitendra; Mishra, Avinash; Jha, Bhavanath

    2013-03-30

    Population indices of bacteria and archaea were investigated from saline-alkaline soil and a possible microbe-environment pattern was established using gene targeted metagenomics. Clone libraries were constructed using 16S rRNA and functional gene(s) involved in carbon fixation (cbbL), nitrogen fixation (nifH), ammonia oxidation (amoA) and sulfur metabolism (apsA). Molecular phylogeny revealed the dominance of Actinobacteria, Firmicutes and Proteobacteria along with archaeal members of Halobacteraceae. The library consisted of novel bacterial (20%) and archaeal (38%) genera showing ≤95% similarity to previously retrieved sequences. Phylogenetic analysis indicated ability of inhabitant to survive in stress condition. The 16S rRNA gene libraries contained novel gene sequences and were distantly homologous with cultured bacteria. Functional gene libraries were found unique and most of the clones were distantly related to Proteobacteria, while clones of nifH gene library also showed homology with Cyanobacteria and Firmicutes. Quantitative real-time PCR exhibited that bacterial abundance was two orders of magnitude higher than archaeal. The gene(s) quantification indicated the size of the functional guilds harboring relevant key genes. The study provides insights on microbial ecology and different metabolic interactions occurring in saline-alkaline soil, possessing phylogenetically diverse groups of bacteria and archaea, which may be explored further for gene cataloging and metabolic profiling. Copyright © 2012 Elsevier GmbH. All rights reserved.

  2. Identification and characterization of a novel fumarase gene by metagenome expression cloning from marine microorganisms

    Directory of Open Access Journals (Sweden)

    Tang Xian-Lai

    2010-11-01

    Full Text Available Abstract Background Fumarase catalyzes the reversible hydration of fumarate to L-malate and is a key enzyme in the tricarboxylic acid (TCA) cycle and in amino acid metabolism. Fumarase is also used for the industrial production of L-malate from the substrate fumarate. Thermostable and high-activity fumarases from organisms that inhabit extreme environments may have great potential in industry, biotechnology, and basic research. The marine environment is highly complex and considered one of the main reservoirs of microbial diversity on the planet. However, most of the microorganisms are inaccessible in nature and are not easily cultivated in the laboratory. Metagenomic approaches provide a powerful tool to isolate and identify enzymes with novel biocatalytic activities for various biotechnological applications. Results A plasmid metagenomic library was constructed from uncultivated marine microorganisms within marine water samples. Through sequence-based screening of the DNA library, a gene encoding a novel fumarase (named FumF) was isolated. Amino acid sequence analysis revealed that the FumF protein shared the greatest homology with Class II fumarate hydratases from Bacteroides sp. 2_1_33B and Parabacteroides distasonis ATCC 8503 (26% identical and 43% similar). The putative fumarase gene was subcloned into pETBlue-2 vector and expressed in E. coli BL21(DE3)pLysS. The recombinant protein was purified to homogeneity. Functional characterization by high performance liquid chromatography confirmed that the recombinant FumF protein catalyzed the hydration of fumarate to form L-malate. The maximum activity for FumF protein occurred at pH 8.5 and 55°C in 5 mM Mg2+. The enzyme showed higher affinity and catalytic efficiency under optimal reaction conditions: Km= 0.48 mM, Vmax = 827 μM/min/mg, and kcat/Km = 1900 mM/s. Conclusions We isolated a novel fumarase gene, fumF, from a sequence-based screen of a plasmid metagenomic library from uncultivated
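
    To make the reported kinetic constants concrete, the sketch below evaluates the standard Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]), with the Km and Vmax values quoted in the abstract. It assumes FumF follows simple Michaelis-Menten kinetics under the optimal conditions; the function name and the chosen substrate concentrations are illustrative only.

```python
def michaelis_menten_rate(substrate_mM, vmax=827.0, km_mM=0.48):
    """Initial reaction rate in the units of vmax (here uM/min/mg).

    Defaults are the Km and Vmax reported in the abstract; treating the
    enzyme as a simple Michaelis-Menten system is an assumption of this sketch.
    """
    if substrate_mM < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Rates at a few fumarate concentrations (mM), chosen only for illustration.
rates = {s: michaelis_menten_rate(s) for s in (0.0, 0.48, 4.8)}
```

    At a substrate concentration equal to Km the rate is half of Vmax by definition, which is a quick sanity check on any implementation of the formula.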

  3. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    Energy Technology Data Exchange (ETDEWEB)

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to the TRF^103 of the isolate, Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.

  4. Functional metagenomic characterization of antibiotic resistance genes in agricultural soils from China.

    Science.gov (United States)

    Su, Jian Qiang; Wei, Bei; Xu, Chun Yan; Qiao, Min; Zhu, Yong Guan

    2014-04-01

    Soil has been regarded as a rich source of antibiotic resistance genes (ARGs) due to the complex microbial community and diverse antibiotic-producing microbes in soil; however, little is known about the ARGs in unculturable bacteria. To investigate the diversity and distribution of ARGs in soil and assess the impact of agricultural practice on the ARGs, we screened soil metagenomic libraries constructed using DNA from four different agricultural soils for ARGs. We identified 45 clones conferring resistance to minocycline, tetracycline, streptomycin, gentamicin, kanamycin, amikacin, chloramphenicol and rifampicin. The similarity of identified ARGs with the closest protein in GenBank ranged from 26% to 92%, with more than 60% of identified ARGs having less than 60% similarity at the amino acid level. The identified ARGs include aminoglycoside acetyltransferase, aminoglycoside 6-adenyltransferase, ADP-ribosyl transferase, ribosome protection protein, transporters and other antibiotic resistance determinants. The identified ARGs from the soil with manure application account for approximately 70% of the total ARGs in this study, implying that manure amendment may increase the diversity of antibiotic resistance genes in soil bacteria. These results suggest that antibiotic resistance in soil remains unexplored and that the functional metagenomic approach is powerful in discovering novel ARGs and resistance mechanisms. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Functional Screening of Antibiotic Resistance Genes from a Representative Metagenomic Library of Food Fermenting Microbiota

    Directory of Open Access Journals (Sweden)

    Chiara Devirgiliis

    2014-01-01

    Full Text Available Lactic acid bacteria (LAB) represent the predominant microbiota in fermented foods. Foodborne LAB have received increasing attention as potential reservoir of antibiotic resistance (AR) determinants, which may be horizontally transferred to opportunistic pathogens. We have previously reported isolation of AR LAB from the raw ingredients of a fermented cheese, while AR genes could be detected in the final, marketed product only by PCR amplification, thus pointing at the need for more sensitive microbial isolation techniques. We turned therefore to construction of a metagenomic library containing microbial DNA extracted directly from the food matrix. To maximize yield and purity and to ensure that genomic complexity of the library was representative of the original bacterial population, we defined a suitable protocol for total DNA extraction from cheese which can also be applied to other lipid-rich foods. Functional library screening on different antibiotics allowed recovery of ampicillin and kanamycin resistant clones originating from Streptococcus salivarius subsp. thermophilus and Lactobacillus helveticus genomes. We report molecular characterization of the cloned inserts, which were fully sequenced and shown to confer AR phenotype to recipient bacteria. We also show that metagenomics can be applied to food microbiota to identify underrepresented species carrying specific genes of interest.

  6. Metagenomic analysis revealed highly diverse microbial arsenic metabolism genes in paddy soils with low-arsenic contents

    International Nuclear Information System (INIS)

    Xiao, Ke-Qing; Li, Li-Guan; Ma, Li-Ping; Zhang, Si-Yu; Bao, Peng; Zhang, Tong; Zhu, Yong-Guan

    2016-01-01

    Microbe-mediated arsenic (As) metabolism plays a critical role in global As cycle, and As metabolism involves different types of genes encoding proteins facilitating its biotransformation and transportation processes. Here, we used metagenomic analysis based on high-throughput sequencing and constructed As metabolism protein databases to analyze As metabolism genes in five paddy soils with low-As contents. The results showed that highly diverse As metabolism genes were present in these paddy soils, with varied abundances and distribution for different types and subtypes of these genes. Arsenate reduction genes (ars) dominated in all soil samples, and significant correlation existed between the abundance of arr (arsenate respiration), aio (arsenite oxidation), and arsM (arsenite methylation) genes, indicating the co-existence and close-relation of different As resistance systems of microbes in wetland environments similar to these paddy soils after long-term evolution. Among all soil parameters, pH was an important factor controlling the distribution of As metabolism genes in the five paddy soils (p = 0.018). To the best of our knowledge, this is the first study using high-throughput sequencing and metagenomics approach in characterizing As metabolism genes in the five paddy soils, showing their great potential in As biotransformation, and therefore in mitigating arsenic risk to humans. Highlights: • Use metagenomics to analyze As metabolism genes in paddy soils with low-As content. • These genes were ubiquitous, abundant, and associated with diverse microbes. • pH as an important factor controlling their distribution in paddy soil. • Imply combinational effect of evolution and selection on As metabolism genes.

  7. Evaluation of a hybrid approach using UBLAST and BLASTX for metagenomic sequences annotation of specific functional genes.

    Directory of Open Access Journals (Sweden)

    Ying Yang

    Full Text Available The fast development of next generation sequencing (NGS) has dramatically increased the application of metagenomics in various fields. Functional annotation is a major step in metagenomics studies. Fast annotation of functional genes has been a challenge because of the deluge of NGS data and expanding databases. A hybrid annotation pipeline proposed previously for taxonomic assignments was evaluated in this study for the annotation of specific functional genes in metagenomic sequences, such as antibiotic resistance genes, arsenic resistance genes and key genes in nitrogen metabolism. The hybrid approach using UBLAST and BLASTX is 44-177 times faster than direct BLASTX in annotation using the small protein database for the specific functional genes, at the cost of missing a small portion (<1.8%) of target sequences compared with direct BLASTX hits. Unlike direct BLASTX, the time required for annotating specific functional genes with the hybrid pipeline depends on the abundance of the target genes. Thus this hybrid annotation pipeline is more suitable for annotating specific functional genes than for comprehensive functional gene annotation.

  8. Shotgun Metagenomic Sequencing Reveals Functional Genes and Microbiome Associated with Bovine Digital Dermatitis.

    Directory of Open Access Journals (Sweden)

    Martin Zinicola

    Full Text Available Metagenomic methods amplifying 16S ribosomal RNA genes have been used to describe the microbial diversity of healthy skin and lesion stages of bovine digital dermatitis (DD) and to detect critical pathogens involved with disease pathogenesis. In this study, we characterized the microbiome and, for the first time, the composition of functional genes of healthy skin (HS) and of active (ADD) and inactive (IDD) lesion stages using a whole-genome shotgun approach. Metagenomic sequences were annotated using the MG-RAST pipeline. Six phyla were identified as the most abundant. Firmicutes and Actinobacteria were the predominant bacterial phyla in the microbiome of HS, while Spirochetes, Bacteroidetes and Proteobacteria were highly abundant in ADD and IDD. T. denticola-like, T. vincentii-like and T. phagedenis-like organisms constituted the most abundant species in ADD and IDD. Recruitment plots comparing sequences from HS, ADD and IDD samples to the genomes of specific Treponema spp. supported the presence of T. denticola and T. vincentii in ADD and IDD. Comparison of the functional composition of HS to ADD and IDD identified a significant difference in genes associated with motility/chemotaxis and iron acquisition/metabolism. We also provide evidence that the microbiome of ADD and IDD, compared to that of HS, had a significantly higher abundance of genes associated with resistance to copper and zinc, which are commonly used in footbaths to prevent and control DD. In conclusion, the results from this study provide new insights into the HS, ADD and IDD microbiomes, improve our understanding of the disease pathogenesis and generate unprecedented knowledge regarding the functional genetic composition of the digital dermatitis microbiome.

  9. Metagenomic profiling of antibiotic resistance genes and mobile genetic elements in a tannery wastewater treatment plant.

    Directory of Open Access Journals (Sweden)

    Zhu Wang

    Full Text Available Antibiotics are often used to prevent sickness and improve production in animal agriculture, and the residues in animal bodies may enter tannery wastewater during leather production. This study aimed to use Illumina high-throughput sequencing to investigate the occurrence, diversity and abundance of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) in aerobic and anaerobic sludge of a full-scale tannery wastewater treatment plant (WWTP). Metagenomic analysis showed that Proteobacteria, Firmicutes, Bacteroidetes and Actinobacteria dominated in the WWTP, but the relative abundance of archaea in anaerobic sludge was higher than in aerobic sludge. Sequencing reads from aerobic and anaerobic sludge revealed differences in the abundance of functional genes between both microbial communities. Genes coding for antibiotic resistance were identified in both communities. BLAST analysis against Antibiotic Resistance Genes Database (ARDB) further revealed that aerobic and anaerobic sludge contained various ARGs with high abundance, among which sulfonamide resistance gene sul1 had the highest abundance, occupying over 20% of the total ARGs reads. Tetracycline resistance genes (tet) were highly rich in the anaerobic sludge, among which tet33 had the highest abundance, but was absent in aerobic sludge. Over 70 types of insertion sequences were detected in each sludge sample, and class 1 integrase genes were prevalent in the WWTP. The results highlighted prevalence of ARGs and MGEs in tannery WWTPs, which may deserve more public health concerns.

  10. Metagenomic profiling of antibiotic resistance genes and mobile genetic elements in a tannery wastewater treatment plant.

    Science.gov (United States)

    Wang, Zhu; Zhang, Xu-Xiang; Huang, Kailong; Miao, Yu; Shi, Peng; Liu, Bo; Long, Chao; Li, Aimin

    2013-01-01

    Antibiotics are often used to prevent sickness and improve production in animal agriculture, and the residues in animal bodies may enter tannery wastewater during leather production. This study aimed to use Illumina high-throughput sequencing to investigate the occurrence, diversity and abundance of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) in aerobic and anaerobic sludge of a full-scale tannery wastewater treatment plant (WWTP). Metagenomic analysis showed that Proteobacteria, Firmicutes, Bacteroidetes and Actinobacteria dominated in the WWTP, but the relative abundance of archaea in anaerobic sludge was higher than in aerobic sludge. Sequencing reads from aerobic and anaerobic sludge revealed differences in the abundance of functional genes between both microbial communities. Genes coding for antibiotic resistance were identified in both communities. BLAST analysis against Antibiotic Resistance Genes Database (ARDB) further revealed that aerobic and anaerobic sludge contained various ARGs with high abundance, among which sulfonamide resistance gene sul1 had the highest abundance, occupying over 20% of the total ARGs reads. Tetracycline resistance genes (tet) were highly rich in the anaerobic sludge, among which tet33 had the highest abundance, but was absent in aerobic sludge. Over 70 types of insertion sequences were detected in each sludge sample, and class 1 integrase genes were prevalent in the WWTP. The results highlighted prevalence of ARGs and MGEs in tannery WWTPs, which may deserve more public health concerns.

  11. Abundant rifampin resistance genes and significant correlations of antibiotic resistance genes and plasmids in various environments revealed by metagenomic analysis.

    Science.gov (United States)

    Ma, Liping; Li, Bing; Zhang, Tong

    2014-06-01

    In the present study, a newly developed metagenomic analysis approach was applied to investigate the abundance and diversity of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) in aquaculture farm sediments, activated sludge, biofilm, anaerobic digestion sludge, and river water. BLASTX analysis against the Comprehensive Antibiotic Resistance Database was conducted for the metagenomic sequence data of each sample and then the ARG-like sequences were sorted based on structured sub-database using customized scripts. The results showed that freshwater fishpond sediment had the highest abundance (196 ppm), and anaerobic digestion sludge possessed the highest diversity (133 subtypes) of ARGs among the samples in this study. Significantly, rifampin resistance genes were universal in all the diverse samples and consistently accounted for 26.9-38.6% of the total annotated ARG sequences. Furthermore, a significant linear correlation (R² = 0.924) was found between diversities (number of subtypes) of ARGs and diversities of plasmids in diverse samples. This work provided a wide spectrum scan of ARGs and MGEs in different environments and revealed the prevalence of rifampin resistance genes and the strong correlation between ARG diversity and plasmid diversity for the first time.

  12. MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes.

    Science.gov (United States)

    Pericard, Pierre; Dufresne, Yoann; Couderc, Loïc; Blanquart, Samuel; Touzet, Hélène

    2018-02-15

    Advances in the sequencing of uncultured environmental samples, dubbed metagenomics, raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads which partially cover full-length marker genes and this poses difficult bioinformatic challenges for taxonomy identification at high resolution. We designed MATAM, software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph. It is applied to the assembly of 16S rRNA markers and is validated on simulated, synthetic and genuine metagenomes. We show that MATAM outperforms other available methods in terms of low error rates and recovered fractions and is suitable to provide improved assemblies for precise taxonomic assignments. Availability: https://github.com/bonsai-team/matam. Contact: pierre.pericard@gmail.com or helene.touzet@univ-lille1.fr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  13. Tetracycline Resistance Genes Identified from Distinct Soil Environments in China by Functional Metagenomics.

    Science.gov (United States)

    Wang, Shaochen; Gao, Xia; Gao, Yuejiao; Li, Yanqing; Cao, Mingming; Xi, Zhenhua; Zhao, Lixing; Feng, Zhiyang

    2017-01-01

    Soil microbiota represents one of the ancient evolutionary origins of antibiotic resistance and has been increasingly recognized as a potentially vast unstudied reservoir of resistance genes with possibilities to exchange with pathogens. Tetracycline resistance is one of the most abundant antibiotic resistances that may transfer among clinical and commensal microorganisms. To investigate tetracycline resistance genes from soil bacteria in different habitats, we performed functional analysis of three metagenomic libraries derived from soil samples collected from Yunnan, Sichuan, and Tibet, respectively, in China. We found efflux transporter genes from all the libraries, including 21 major facilitator superfamily efflux pump genes and one multidrug and toxic compound extrusion (MATE) transporter gene. Interestingly, we also identified two tetracycline destructase genes, belonging to a newly described family of tetracycline-inactivating enzymes that are scarcely observed in clinical pathogens, from the Tibet library. The inactivation activity of the putative enzyme was confirmed in vitro by biochemical analysis. Our results indicated that efflux pumps were distributed predominantly across habitats. Meanwhile, the mechanism of enzymatic inactivation for tetracycline resistance should not be neglected and merits further investigation.

  14. Tetracycline Resistance Genes Identified from Distinct Soil Environments in China by Functional Metagenomics

    Directory of Open Access Journals (Sweden)

    Shaochen Wang

    2017-07-01

    Full Text Available Soil microbiota represents one of the ancient evolutionary origins of antibiotic resistance and has been increasingly recognized as a potentially vast unstudied reservoir of resistance genes with possibilities to exchange with pathogens. Tetracycline resistance is one of the most abundant antibiotic resistances that may transfer among clinical and commensal microorganisms. To investigate tetracycline resistance genes from soil bacteria in different habitats, we performed functional analysis of three metagenomic libraries derived from soil samples collected from Yunnan, Sichuan, and Tibet, respectively, in China. We found efflux transporter genes from all the libraries, including 21 major facilitator superfamily efflux pump genes and one multidrug and toxic compound extrusion (MATE) transporter gene. Interestingly, we also identified two tetracycline destructase genes, belonging to a newly described family of tetracycline-inactivating enzymes that are scarcely observed in clinical pathogens, from the Tibet library. The inactivation activity of the putative enzyme was confirmed in vitro by biochemical analysis. Our results indicated that efflux pumps were distributed predominantly across habitats. Meanwhile, the mechanism of enzymatic inactivation for tetracycline resistance should not be neglected and merits further investigation.

  15. Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome.

    Science.gov (United States)

    Müller, Christina A; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C A; Wellington, Elizabeth M H; Berg, Gabriele

    2015-08-01

    Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  16. Characterization of novel antibiotic resistance genes identified by functional metagenomics on soil samples.

    Science.gov (United States)

    Torres-Cortés, Gloria; Millán, Vicenta; Ramírez-Saad, Hugo C; Nisa-Martínez, Rafael; Toro, Nicolás; Martínez-Abarca, Francisco

    2011-04-01

    The soil microbial community is highly complex and contains a high density of antibiotic-producing bacteria, making it a likely source of diverse antibiotic resistance determinants. We used functional metagenomics to search for antibiotic resistance genes in libraries generated from three different soil samples, containing 3.6 Gb of DNA in total. We identified 11 new antibiotic resistance genes: 3 conferring resistance to ampicillin, 2 to gentamicin, 2 to chloramphenicol and 4 to trimethoprim. One of the clones identified was a new trimethoprim resistance gene encoding a 26.8 kDa protein closely resembling unassigned reductases of the dihydrofolate reductase group. This protein, Tm8-3, conferred trimethoprim resistance in Escherichia coli and Sinorhizobium meliloti (γ- and α-proteobacteria respectively). We demonstrated that this gene encoded an enzyme with dihydrofolate reductase activity, with kinetic constants similar to other type I and II dihydrofolate reductases (Km of 8.9 µM for NADPH and 3.7 µM for dihydrofolate and IC50 of 20 µM for trimethoprim). This is the first description of a new type of reductase conferring resistance to trimethoprim. Our results indicate that soil bacteria display a high level of genetic diversity and are a reservoir of antibiotic resistance genes, supporting the use of this approach for the discovery of novel enzymes with unexpected activities unpredictable from their amino acid sequences. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.

  17. Gene Expression and Molecular Characterization of a Xylanase from Chicken Cecum Metagenome.

    Science.gov (United States)

    Al-Darkazali, Hind; Meevootisom, Vithaya; Isarangkul, Duangnate; Wiyakrutta, Suthep

    2017-01-01

    A xylanase gene xynA MG1 with a 1,116-bp open reading frame, encoding an endo-β-1,4-xylanase, was cloned from a chicken cecum metagenome. The translated XynA MG1 protein consisted of 372 amino acids including a putative signal peptide of 23 amino acids. The calculated molecular mass of the mature XynA MG1 was 40,013 Da, with a theoretical pI value of 5.76. The amino acid sequence of XynA MG1 showed 59% identity to endo-β-1,4-xylanase from Prevotella bryantii and Prevotella ruminicola and 58% identity to that from Prevotella copri. XynA MG1 has two conserved motifs, DVVNE and TEXD, containing two active site glutamates and an invariant asparagine, characteristic of GH10 family xylanase. The xynA MG1 gene without signal peptide sequence was cloned and fused with thioredoxin protein (Trx.Tag) in pET-32a plasmid and overexpressed in Escherichia coli Tuner™(DE3)pLysS. The purified mature XynA MG1 was highly salt-tolerant and stable and displayed higher than 96% of its catalytic activity in the reaction containing 1 to 4 M NaCl. It was only slightly affected by common organic solvents added in aqueous solution to up to 5 M. This chicken cecum metagenome-derived xylanase has potential applications in animal feed additives and industrial enzymatic processes requiring exposure to high concentrations of salt and organic solvents.

  18. Gene Expression and Molecular Characterization of a Xylanase from Chicken Cecum Metagenome

    Directory of Open Access Journals (Sweden)

    Hind AL-Darkazali

    2017-01-01

    Full Text Available A xylanase gene xynAMG1 with a 1,116-bp open reading frame, encoding an endo-β-1,4-xylanase, was cloned from a chicken cecum metagenome. The translated XynAMG1 protein consisted of 372 amino acids including a putative signal peptide of 23 amino acids. The calculated molecular mass of the mature XynAMG1 was 40,013 Da, with a theoretical pI value of 5.76. The amino acid sequence of XynAMG1 showed 59% identity to endo-β-1,4-xylanase from Prevotella bryantii and Prevotella ruminicola and 58% identity to that from Prevotella copri. XynAMG1 has two conserved motifs, DVVNE and TEXD, containing two active site glutamates and an invariant asparagine, characteristic of GH10 family xylanase. The xynAMG1 gene without signal peptide sequence was cloned and fused with thioredoxin protein (Trx.Tag) in pET-32a plasmid and overexpressed in Escherichia coli Tuner™(DE3)pLysS. The purified mature XynAMG1 was highly salt-tolerant and stable and displayed higher than 96% of its catalytic activity in the reaction containing 1 to 4 M NaCl. It was only slightly affected by common organic solvents added in aqueous solution to up to 5 M. This chicken cecum metagenome-derived xylanase has potential applications in animal feed additives and industrial enzymatic processes requiring exposure to high concentrations of salt and organic solvents.

  19. Metagenome cloning and functional analysis of Na⁺/H⁺ antiporter genes from Keke Salt Lake in China.

    Science.gov (United States)

    Gao, Maio; Wang, Lei; Chen, San-Feng

    2012-02-01

    Na⁺/H⁺ antiporters are ubiquitous membrane proteins and play a central role in cell homeostasis including pH regulation, osmoregulation, and Na⁺/Li⁺ tolerance in bacteria. The microbial communities in extremely hypersaline soil are an important resource for isolating Na⁺/H⁺ antiporter genes. A metagenomic library containing 35,700 clones was constructed by using genomic DNA obtained from the hypersaline soil samples of Keke Salt Lake in Northwest China. Two Na⁺/H⁺ antiporters, K1-NhaD and K2-NhaD, belonging to the NhaD family, were screened and cloned from this metagenome by complementing the triple mutant Escherichia coli strain KNabc (nhaA⁻, nhaB⁻, chaA⁻) in medium containing 0.2 M NaCl. K1-NhaD and K2-NhaD have 75.5% identity at the predicted amino acid sequence level. K1-NhaD has 78% identity with the Na⁺/H⁺ antiporter NhaD from Halomonas elongata at the predicted amino acid sequence level. The predicted K1-NhaD is a 53.5 kDa protein (487 amino acids) with 13 transmembrane helices. K2-NhaD has 73% identity with Alkalimonas amylolytica NhaD. The predicted K2-NhaD is a 55 kDa protein (495 amino acids) with 12 transmembrane helices. Both K1-NhaD and K2-NhaD enabled the triple mutant E. coli KNabc (nhaA⁻, nhaB⁻, chaA⁻) to grow in LBK medium containing 0.2-0.6 M Na⁺ or 0.05-0.4 M Li⁺. Everted membrane vesicles prepared from E. coli KNabc cells carrying K1-NhaD or K2-NhaD exhibited Na⁺/H⁺ and Li⁺/H⁺ antiporter activities which were pH-dependent, with the highest activity at pH 9.5. Little K⁺/H⁺ antiporter activity was also detected in vesicles from E. coli KNabc carrying K1-NhaD or K2-NhaD.

  20. Phylogeny and phylogeography of functional genes shared among seven terrestrial subsurface metagenomes reveal N-cycling and microbial evolutionary relationships

    Directory of Open Access Journals (Sweden)

    Maggie CY Lau

    2014-10-01

    Full Text Available Comparative studies on community phylogenetics and phylogeography of microorganisms living in extreme environments are rare. Terrestrial subsurface habitats are valuable for studying microbial biogeographical patterns due to their isolation and the restricted dispersal mechanisms. Since the taxonomic identity of a microorganism does not always correspond well with its functional role in a particular community, the use of taxonomic assignments or patterns may give limited inference on how microbial functions are affected by historical, geographical and environmental factors. With seven metagenomic libraries generated from fracture water samples collected from five South African mines, this study was carried out to (1) screen for ubiquitous functions or pathways of biogeochemical cycling of CH4, S and N; (2) characterize the biodiversity represented by the common functional genes; (3) investigate the subsurface biogeography as revealed by this subset of genes; and (4) explore the possibility of using metagenomic data for evolutionary study. The ubiquitous functional genes are NarV, NPD, PAP reductase, NifH, NifD, NifK, NifE and NifN genes. Although these 8 common functional genes were taxonomically and phylogenetically diverse and distinct from each other, the dissimilarity between samples did not correlate strongly with geographical distance, environmental factors or residence time of the water. Por genes homologous to those of Thermodesulfovibrio yellowstonii detected in all metagenomes were deep lineages of Nitrospirae, suggesting that subsurface habitats have preserved ancestral genetic signatures that inform the study of the origin and evolution of prokaryotes.

  1. Metagenomic profiles of antibiotic resistance genes in paddy soils from South China.

    Science.gov (United States)

    Xiao, Ke-Qing; Li, Bing; Ma, Liping; Bao, Peng; Zhou, Xue; Zhang, Tong; Zhu, Yong-Guan

    2016-03-01

    Overuse and arbitrary discarding of antibiotics have expanded antibiotic resistance reservoirs, from gut, waste water and activated sludge, to soil, freshwater and even the ocean. Based on the structured Antibiotic Resistance Genes Database and next generation sequencing, metagenomic analysis was used for the first time to detect and quantify antibiotic resistance genes (ARGs) in paddy soils from South China. A total of 16 types of ARGs were identified, corresponding to 110 ARG subtypes. The abundances and distribution pattern of ARGs in paddy soil were distinctively different from those in activated sludge and pristine deep ocean sediment, but close to those of sediment from human-impacted estuaries. Multidrug resistance genes were the most dominant type (38-47.5%) in all samples, and the ARGs detected encompassed the three major resistance mechanisms, among which extrusion by efflux pumps was predominant. Redundancy analysis (RDA) showed that pH was significantly correlated with the distribution of ARG subtypes (P soil, indicating that ARGs are widespread in paddy soils of South China. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Metagenomic Profiling of Soil Microbes to Mine Salt Stress Tolerance Genes

    Directory of Open Access Journals (Sweden)

    Vasim Ahmed

    2018-02-01

    Full Text Available Osmotolerance is one of the critical factors for successful survival and colonization of microbes in saline environments. Nonetheless, information about these osmotolerance mechanisms is still inadequate. Exploration of the saline soil microbiome for its community structure and novel genetic elements is likely to provide information on the mechanisms involved in osmoadaptation. The present study explores the saline soil microbiome for its native structure and novel genetic elements involved in osmoadaptation. 16S rRNA gene sequence analysis indicated the dominance of halophilic/halotolerant phylotypes affiliated to Proteobacteria, Actinobacteria, Gemmatimonadetes, Bacteroidetes, Firmicutes, and Acidobacteria. A functional metagenomics approach led to the identification of osmotolerant clones SSR1, SSR4, SSR6, SSR2 harboring BCAA_ABCtp, GSDH, STK_Pknb, and duf3445 genes. Furthermore, transposon mutagenesis together with genetic, physiological and functional studies confirmed the role of these genes in osmotolerance. Host osmotolerance was enhanced possibly through the cytosolic accumulation of amino acids, reducing equivalents and osmolytes involving BCAA-ABCtp, GSDH, and STKc_PknB. The genetic elements prevalent within these microbes, once decoded, can be exploited either as such for ameliorating soils, or in genetically modified forms to help crops resist and survive in saline environments.

  3. Cloning and identification of novel hydrolase genes from a dairy cow rumen metagenomic library and characterization of a cellulase gene

    Directory of Open Access Journals (Sweden)

    Gong Xia

    2012-10-01

    Full Text Available Background: Interest in cellulose degrading enzymes has increased in recent years due to the expansion of the cellulosic biofuel industry. The rumen is a highly adapted environment for the degradation of cellulose and a promising source of enzymes for industrial use. To identify cellulase enzymes that may be of such use we have undertaken a functional metagenomic screen to identify cellulase enzymes from the bacterial community in the rumen of a grass-hay fed dairy cow. Results: Twenty-five clones specifying cellulase activity were identified. Subcloning and sequence analysis of a subset of these hydrolase-positive clones identified 10 endoglucanase genes. Preliminary characterization of the encoded cellulases was carried out using crude extracts of each of the subclones. Zymogram analysis using carboxymethylcellulose as a substrate showed a single positive band for each subclone, confirming that only one functional cellulase gene was present in each. One cellulase gene, designated Cel14b22, was expressed at a high level in Escherichia coli and purified for further characterization. The purified recombinant enzyme showed optimal activity at pH 6.0 and 50°C. It was stable over a broad pH range, from pH 4.0 to 10.0. The activity was significantly enhanced by Mn2+ and dramatically reduced by Fe3+ or Cu2+. The enzyme hydrolyzed a wide range of beta-1,3- and beta-1,4-linked polysaccharides, with varying activities. Activities toward microcrystalline cellulose and filter paper were relatively high, while the highest activity was toward oat gum. Conclusion: The present study shows that a functional metagenomic approach can be used to isolate previously uncharacterized cellulases from the rumen environment.

  4. Bovine Host Genetic Variation Influences Rumen Microbial Methane Production with Best Selection Criterion for Low Methane Emitting and Efficiently Feed Converting Hosts Based on Metagenomic Gene Abundance.

    Directory of Open Access Journals (Sweden)

    Rainer Roehe

    2016-02-01

    Full Text Available Methane produced by methanogenic archaea in ruminants contributes significantly to anthropogenic greenhouse gas emissions. The host genetic link controlling microbial methane production is unknown and appropriate genetic selection strategies are not developed. We used sire progeny group differences to estimate the host genetic influence on rumen microbial methane production in a factorial experiment consisting of crossbred breed types and diets. Rumen metagenomic profiling was undertaken to investigate links between microbial genes and methane emissions or feed conversion efficiency. Sire progeny groups differed significantly in their methane emissions measured in respiration chambers. Ranking of the sire progeny groups based on methane emissions or relative archaeal abundance was consistent overall and within diet, suggesting that archaeal abundance in ruminal digesta is under host genetic control and can be used to genetically select animals without measuring methane directly. In the metagenomic analysis of rumen contents, we identified 3970 microbial genes of which 20 and 49 genes were significantly associated with methane emissions and feed conversion efficiency respectively. These explained 81% and 86% of the respective variation and were clustered in distinct functional gene networks. Methanogenesis genes (e.g. mcrA and fmdB) were associated with methane emissions, whilst host-microbiome cross talk genes (e.g. TSTA3 and FucI) were associated with feed conversion efficiency. These results strengthen the idea that the host animal controls its own microbiota to a significant extent and open up the implementation of effective breeding strategies using rumen microbial gene abundance as a predictor for difficult-to-measure traits on a large number of hosts. Generally, the results provide a proof of principle to use the relative abundance of microbial genes in the gastrointestinal tract of different species to predict their influence on traits e

  5. Metagenomic and network analysis reveal wide distribution and co-occurrence of environmental antibiotic resistance genes.

    Science.gov (United States)

    Li, Bing; Yang, Ying; Ma, Liping; Ju, Feng; Guo, Feng; Tiedje, James M; Zhang, Tong

    2015-11-01

    A metagenomic approach and network analysis were used to investigate the wide-spectrum profiles of antibiotic resistance genes (ARGs) and their co-occurrence patterns in 50 samples from 10 typical environments. In total, 260 ARG subtypes belonging to 18 ARG types were detected with an abundance range of 5.4 × 10⁻⁶ to 2.2 × 10⁻¹ copy of ARG per copy of 16S-rRNA gene. The trend of the total ARG abundances in environments matched well with the levels of anthropogenic impacts on these environments. From the less impacted environments to the seriously impacted environments, the total ARG abundances increased up to three orders of magnitude, that is, from 3.2 × 10⁻³ to 3.1 × 10⁰ copy of ARG per copy of 16S-rRNA gene. The abundant ARGs were associated with aminoglycoside, bacitracin, β-lactam, chloramphenicol, macrolide-lincosamide-streptogramin, quinolone, sulphonamide and tetracycline, in agreement with the antibiotics extensively used in human medicine or veterinary medicine/promoters. The widespread occurrences and abundance variation trend of vancomycin resistance genes in different environments might imply the spread of vancomycin resistance genes because of the selective pressure resulting from vancomycin use. The simultaneous enrichment of 12 ARG types in adult chicken faeces suggests the coselection of multiple ARGs in this production system. Non-metric multidimensional scaling analysis revealed that samples belonging to the same environment generally possessed similar ARG compositions. Based on the co-occurrence pattern revealed by network analysis, tetM and aminoglycoside resistance protein, the hubs of the ARG network, are proposed to be indicators to quantitatively estimate the abundance of 23 other co-occurring ARG subtypes by power functions.

  6. Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases.

    Science.gov (United States)

    Brulc, Jennifer M; Antonopoulos, Dionysios A; Miller, Margret E Berg; Wilson, Melissa K; Yannarell, Anthony C; Dinsdale, Elizabeth A; Edwards, Robert E; Frank, Edward D; Emerson, Joanne B; Wacklin, Pirjo; Coutinho, Pedro M; Henrissat, Bernard; Nelson, Karen E; White, Bryan A

    2009-02-10

    The complex microbiome of the rumen functions as an effective system for the conversion of plant cell wall biomass to microbial protein, short chain fatty acids, and gases. As such, it provides a unique genetic resource for plant cell wall degrading microbial enzymes that could be used in the production of biofuels. The rumen and gastrointestinal tract harbor a dense and complex microbiome. To gain a greater understanding of the ecology and metabolic potential of this microbiome, we used comparative metagenomics (phylotype analysis and SEED subsystems-based annotations) to examine randomly sampled pyrosequence data from 3 fiber-adherent microbiomes and 1 pooled liquid sample (a mixture of the liquid microbiome fractions from the same bovine rumens). Even though the 3 animals were fed the same diet, the community structure, predicted phylotype, and metabolic potentials in the rumen were markedly different with respect to nutrient utilization. A comparison of the glycoside hydrolase and cellulosome functional genes revealed that in the rumen microbiome, initial colonization of fiber appears to be by organisms possessing enzymes that attack the easily available side chains of complex plant polysaccharides and not the more recalcitrant main chains, especially cellulose. Furthermore, when compared with the termite hindgut microbiome, there are fundamental differences in the glycoside hydrolase content that appear to be diet driven for either the bovine rumen (forages and legumes) or the termite hindgut (wood).

  7. Metagenomic analysis reveals wastewater treatment plants as hotspots of antibiotic resistance genes and mobile genetic elements.

    Science.gov (United States)

    Guo, Jianhua; Li, Jie; Chen, Hui; Bond, Philip L; Yuan, Zhiguo

    2017-10-15

    The intensive use of antibiotics results in their continuous release into the environment and the subsequent widespread occurrence of antibiotic resistant bacteria (ARB), antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs). This study used Illumina high-throughput sequencing to investigate the broad-spectrum profiles of both ARGs and MGEs in activated sludge and anaerobically digested sludge from a full-scale wastewater treatment plant (WWTP). A pipeline for identifying antibiotic resistance determinants was developed that consisted of four categories: gene transfer potential, ARG potential, ARGs pathway and ARGs phylogenetic origin. The metagenomic analysis showed that the activated sludge and the digested sludge exhibited different microbial communities and changes in the types and occurrence of ARGs and MGEs. In total, 42 ARG subtypes were identified in the activated sludge, while 51 ARG subtypes were detected in the digested sludge. Additionally, MGEs including plasmids, transposons, integrons (intI1) and insertion sequences (e.g. ISSsp4, ISMsa21 and ISMba16) were abundant in the two sludge samples. The co-occurrence pattern between ARGs and microbial taxa revealed by network analysis indicated that some environmental bacteria (e.g. Clostridium and Nitrosomonas) might be potential hosts of multiple ARGs. The findings increase our understanding of WWTPs as hotspots of ARGs and MGEs, and contribute towards preventing their release into the downstream environment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. PCR-based amplification and heterologous expression of Pseudomonas alcohol dehydrogenase genes from the soil metagenome for biocatalysis.

    Science.gov (United States)

    Itoh, Nobuya; Isotani, Kentaro; Makino, Yoshihide; Kato, Masaki; Kitayama, Kouta; Ishimota, Tuyoshi

    2014-02-05

    The amplification of useful genes from metagenomes offers great biotechnological potential. We employed this approach to isolate alcohol dehydrogenase (adh) genes from Pseudomonas to aid in the synthesis of optically pure alcohols from various ketones. A PCR primer combination synthesized by reference to the adh sequences of known Pseudomonas genes was used to amplify full-length adh genes directly from 17 samples of DNA extracted from soil. Three such adh preparations were used to construct Escherichia coli plasmid libraries. Of the approximately 2800 colonies obtained, 240 putative adh-positive clones were identified by colony-PCR. Next, 23 functional adh genes named using the descriptors HBadh and HPadh were analyzed. The adh genes obtained via this metagenomic approach varied in their DNA and amino acid sequences. Expression of the gene products in E. coli indicated varying substrate specificity. Two representative genes, HBadh-1 and HPadh-24, expressed in E. coli and Pseudomonas putida, respectively, were purified and characterized in detail. The enzyme products of these genes were confirmed to be useful for producing anti-Prelog chiral alcohols. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. Metagenomic analysis revealed highly diverse microbial arsenic metabolism genes in paddy soils with low-arsenic contents.

    Science.gov (United States)

    Xiao, Ke-Qing; Li, Li-Guan; Ma, Li-Ping; Zhang, Si-Yu; Bao, Peng; Zhang, Tong; Zhu, Yong-Guan

    2016-04-01

    Microbe-mediated arsenic (As) metabolism plays a critical role in the global As cycle, and As metabolism involves different types of genes encoding proteins that facilitate its biotransformation and transportation processes. Here, we used metagenomic analysis based on high-throughput sequencing and constructed As metabolism protein databases to analyze As metabolism genes in five paddy soils with low As contents. The results showed that highly diverse As metabolism genes were present in these paddy soils, with varied abundances and distributions for different types and subtypes of these genes. Arsenate reduction genes (ars) dominated in all soil samples, and a significant correlation existed between the abundances of arr (arsenate respiration), aio (arsenite oxidation), and arsM (arsenite methylation) genes, indicating the co-existence and close relation of different As resistance systems of microbes in wetland environments similar to these paddy soils after long-term evolution. Among all soil parameters, pH was an important factor controlling the distribution of As metabolism genes in the five paddy soils (p = 0.018). To the best of our knowledge, this is the first study using high-throughput sequencing and a metagenomics approach to characterize As metabolism genes in the five paddy soils, showing their great potential in As biotransformation, and therefore in mitigating arsenic risk to humans. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Full Text Available Abstract. Background: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P  Conclusions: We have presented a simple and high throughput method of
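
The abstract above describes a "rumen metagenome profile" as the number of reads aligned to each contig of a reference metagenome. A minimal sketch of that counting step follows, assuming the alignments are already available as (read, contig) pairs; the function name, the input format, and the identifiers are hypothetical, and the aligner itself is out of scope here.

```python
from collections import Counter

def metagenome_profile(alignments, contigs):
    """Count aligned reads per reference contig.

    alignments: iterable of (read_id, contig_id) pairs, one per aligned read
                (a hypothetical input format chosen for this sketch).
    contigs:    ordered list of all contig ids in the reference, so that
                contigs with no aligned reads still appear with a count of 0.
    """
    counts = Counter(contig_id for _, contig_id in alignments)
    return {contig_id: counts[contig_id] for contig_id in contigs}

# Hypothetical alignments for one sample against a three-contig reference.
alignments = [("read1", "contig1"), ("read2", "contig1"), ("read3", "contig3")]
profile = metagenome_profile(alignments, ["contig1", "contig2", "contig3"])
```

Profiles built this way for several samples share the same contig order, so they can be compared directly, for example between cows or between sampling locations.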

  11. Metagenomic profiling of historic Colorado Front Range flood impact on distribution of riverine antibiotic resistance genes

    Science.gov (United States)

    Garner, Emily; Wallace, Joshua S.; Argoty, Gustavo Arango; Wilkinson, Caitlin; Fahrenfeld, Nicole; Heath, Lenwood S.; Zhang, Liqing; Arabi, Mazdak; Aga, Diana S.; Pruden, Amy

    2016-12-01

    Record-breaking floods in September 2013 caused massive damage to homes and infrastructure across the Colorado Front Range and heavily impacted the Cache La Poudre River watershed. Given the unique nature of this watershed as a test-bed for tracking environmental pathways of antibiotic resistance gene (ARG) dissemination, we sought to determine the impact of extreme flooding on ARG reservoirs in river water and sediment. We utilized high-throughput DNA sequencing to obtain metagenomic profiles of ARGs before and after flooding, and investigated 23 antibiotics and 14 metals as putative selective agents during post-flood recovery. With 277 ARG subtypes identified across samples, total bulk water ARGs decreased following the flood but recovered to near pre-flood abundances by ten months post-flood at both a pristine site and at a site historically heavily influenced by wastewater treatment plants and animal feeding operations. Network analysis of de novo assembled sequencing reads into 52,556 scaffolds identified ARGs likely located on mobile genetic elements, with up to 11 ARGs per plasmid-associated scaffold. Bulk water bacterial phylogeny correlated with ARG profiles while sediment phylogeny varied along the river’s anthropogenic gradient. This rare flood afforded the opportunity to gain deeper insight into factors influencing the spread of ARGs in watersheds.

  12. Identification and characterization of a novel trehalose synthase gene derived from saline-alkali soil metagenomes.

    Directory of Open Access Journals (Sweden)

    Ling Jiang

    Full Text Available A novel trehalose synthase (TreS) gene was identified from a metagenomic library of saline-alkali soil by a simple activity-based screening system. Sequence analysis revealed that TreS encodes a protein of 552 amino acids, with a deduced molecular weight of 63.3 kDa. After being overexpressed in Escherichia coli and purified, the enzymatic properties of TreS were investigated. The recombinant TreS displayed its optimal activity at pH 9.0 and 45 °C, and the addition of most common metal ions (1 or 30 mM) had no evident inhibitory effect on the enzymatic activity, except for the divalent metal ions Zn2+ and Hg2+. Kinetic analysis showed that the recombinant TreS had a 4.1-fold higher catalytic efficiency (kcat/Km) for maltose than for trehalose. The maximum conversion rate of maltose into trehalose by the TreS reached more than 78% at a relatively high maltose concentration (30%), making it a good candidate for the large-scale production of trehalose after further study. In addition, five amino acid residues, His172, Asp201, Glu251, His318 and Asp319, were shown to be conserved in the TreS, which were also important for glycosyl hydrolase family 13 enzyme catalysis.

  13. A Metagenomic Perspective on Changes to Nutrient-cycling Genes Following Forest-to-agriculture Conversion in the Amazon Basin

    Science.gov (United States)

    Meyer, K. M.; Womack, A. M.; Rodrigues, J.; Nüsslein, K.; Bohannan, B. J. M.

    2014-12-01

    Forest-to-agriculture conversion has been shown to alter nutrient cycling and the community composition of soil microorganisms. However, few studies have looked simultaneously at how the abundance, composition, and diversity of microbial genes involved in nutrient cycling change with conversion. We used shotgun metagenomic sequencing to analyze soil from primary rainforest and converted cattle pasture sampled at the Fazenda Nova Vida in Rondônia, Brazil. The diversity, richness, and evenness of nutrient cycling genes were significantly higher in the pasture, and the composition of nutrient cycling communities differed significantly between land use types. These results largely mirror taxonomic shifts following Amazon rainforest conversion, which tends to increase diversity, richness, and evenness of soil microbial communities. The abundance of genes related to N cycling and methane flux differed between land use types. Methanotrophy genes decreased in abundance in the pasture, whereas methanogenesis genes were not significantly different between land use types. These changes could underlie the commonly observed shift from methane sink to source following forest-to-agriculture conversion. Multiple genes in the nitrogen cycle also differed with land use, including genes related to N-fixation and ammonification. Metagenomics provides a unique perspective on the consequences of land use change on microbial community structure and function.

  14. Computational prediction of CRISPR cassettes in gut metagenome samples from Chinese type-2 diabetic patients and healthy controls.

    Science.gov (United States)

    Mangericao, Tatiana C; Peng, Zhanhao; Zhang, Xuegong

    2016-01-11

    CRISPR has become a hot topic as a powerful technique for genome editing in humans and other higher organisms. The original CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats coupled with CRISPR-associated proteins) is an important adaptive defence system for prokaryotes that provides resistance against invading elements such as viruses and plasmids. A CRISPR cassette contains short nucleotide sequences called spacers. These unique regions retain a history of the interactions between prokaryotes and their invaders in individual strains and ecosystems. One important ecosystem in the human body is the human gut, a rich habitat populated by a great diversity of microorganisms. Gut microbiomes are important for human physiology and health. Metagenome sequencing has been widely applied for studying the gut microbiomes. Most efforts in metagenome studies have been focused on profiling taxa compositions and gene catalogues and identifying their associations with human health. Less attention has been paid to the analysis of the ecosystems of microbiomes themselves, especially their CRISPR composition. We conducted a preliminary analysis of CRISPR sequences in a human gut metagenomic data set of Chinese type-2 diabetes patients and healthy controls. Applying an available CRISPR-identification algorithm, PILER-CR, we identified 3169 CRISPR cassettes in the data, from which we constructed a set of 1302 unique repeat sequences and 36,709 spacers. A more extensive analysis was made for the CRISPR repeats: these repeats were submitted to a more comprehensive clustering and classification using the web server tool CRISPRmap. All repeats were compared with known CRISPRs in the database CRISPRdb. A total of 784 repeats had matches in the database, and the remaining 518 repeats from our set are potentially novel ones. The computational analysis of CRISPR composition based on contigs of metagenome sequencing data is feasible. It provides an efficient

  15. Salt resistance genes revealed by functional metagenomics from brines and moderate-salinity rhizosphere within a hypersaline environment

    Directory of Open Access Journals (Sweden)

    Salvador Mirete

    2015-10-01

    Full Text Available Hypersaline environments are considered one of the most extreme habitats on earth and microorganisms have developed diverse molecular mechanisms of adaptation to withstand these conditions. The present study was aimed at identifying novel genes involved in salt resistance from the microbial communities of brines and the rhizosphere from the Es Trenc saltern (Mallorca, Spain). The microbial diversity assessed by pyrosequencing of 16S rRNA gene libraries revealed the presence of communities that are typical in such environments. Metagenomic libraries from brine and rhizosphere samples were transferred to the osmosensitive strain Escherichia coli MKH13 and screened for salt resistance. As a result, eleven genes that conferred salt resistance were identified, some encoding well known proteins previously related to osmoadaptation such as a glycerol and a proton pump, whereas others encoded proteins not previously related to this function in microorganisms such as DNA/RNA helicases, an endonuclease III (Nth) and hypothetical proteins of unknown function. Furthermore, four of the retrieved genes were cloned and expressed in Bacillus subtilis and they also exhibited salt resistance in this bacterium, broadening the spectrum of bacterial species where these genes can operate. This is the first report of salt resistance genes recovered from metagenomes of a hypersaline environment.

  16. Identification and molecular characterization of a metagenome-derived L-lysine decarboxylase gene from subtropical soil microorganisms.

    Science.gov (United States)

    Deng, Jie; Gao, Hua; Gao, Zhen; Zhao, Huaxian; Yang, Ying; Wu, Qiaofen; Wu, Bo; Jiang, Chengjian

    2017-01-01

    L-lysine decarboxylase (LDC, EC 4.1.1.18) is a key enzyme in the decarboxylation of L-lysine to 1,5-pentanediamine and contributes significantly to biosynthetic capability. Metagenomic technology is a shortcut approach used to obtain new genes from uncultured microorganisms. In this study, a subtropical soil metagenomic library was constructed, and a putative LDC gene named ldc1E was isolated by a function-based screening strategy that used the pH change caused by L-lysine decarboxylation as the indicator. Amino acid sequence comparison and homology modeling indicated the close relation between Ldc1E and other putative LDCs. Multiple sequence alignment analysis revealed that Ldc1E contained a highly conserved motif Ser-X-His-Lys (Pxl), and molecular docking results showed that this motif was located in the active site and could combine with the cofactor pyridoxal 5'-phosphate. The ldc1E gene was subcloned into the pET-30a(+) vector and highly expressed in Escherichia coli BL21 (DE3) pLysS. The recombinant protein was purified to homogeneity. The maximum activity of Ldc1E occurred at pH 6.5 and 40°C using L-lysine monohydrochloride as the substrate. Recombinant Ldc1E had apparent Km, kcat, and kcat/Km values of 1.08±0.16 mM, 5.09±0.63 s^-1, and 4.73×10^3 s^-1 M^-1, respectively. The specific activity of Ldc1E was 1.53±0.06 U mg^-1 protein. Identifying a metagenome-derived LDC gene provided a rational reference for further gene modifications in industrial applications.
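
The kinetic constants reported above can be cross-checked with a short calculation: catalytic efficiency is kcat divided by Km, with Km converted from mM to M. The variable names below are illustrative only. Using the rounded values printed in the abstract gives about 4.71×10^3 s^-1 M^-1, slightly below the reported 4.73×10^3 s^-1 M^-1; the small gap may simply reflect rounding of the published kcat and Km.

```python
# Values as printed in the abstract (central estimates only).
kcat_per_s = 5.09   # s^-1
km_mM = 1.08        # mM

# Convert Km from mM to M before dividing, so the result is in s^-1 M^-1.
km_M = km_mM * 1e-3
catalytic_efficiency = kcat_per_s / km_M

print(f"kcat/Km = {catalytic_efficiency:.3g} s^-1 M^-1")
```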

  17. Metagenomic Analysis of Antibiotic Resistance Genes in Dairy Cow Feces following Therapeutic Administration of Third Generation Cephalosporin.

    Directory of Open Access Journals (Sweden)

    Lindsey Chambers

    Full Text Available Although dairy manure is widely applied to land, it is relatively understudied compared to other livestock as a potential source of antibiotic resistance genes (ARGs) to the environment and ultimately to human pathogens. Ceftiofur, the most widely used antibiotic in U.S. dairy cows, is a 3rd generation cephalosporin, a critically important class of antibiotics to human health. The objective of this study was to evaluate the effect of typical ceftiofur antibiotic treatment on the prevalence of ARGs in the fecal microbiome of dairy cows using a metagenomics approach. β-lactam ARGs were found to be elevated in feces from Holstein cows administered ceftiofur (n = 3) relative to control cows (n = 3). However, total numbers of ARGs across all classes were not measurably affected by ceftiofur treatment, likely because of dominance of unaffected tetracycline ARGs in the metagenomics libraries. Functional analysis via MG-RAST further revealed that ceftiofur treatment resulted in increases in gene sequences associated with "phages, prophages, transposable elements, and plasmids", suggesting that this treatment also enriched the ability to horizontally transfer ARGs. Additional functional shifts were noted with ceftiofur treatment (e.g., increase in genes associated with stress, chemotaxis, and resistance to toxic compounds; decrease in genes associated with metabolism of aromatic compounds and cell division and cell cycle), along with measurable taxonomic shifts (increase in Bacteroidia and decrease in Actinobacteria). This study demonstrates that ceftiofur has a broad, measurable and immediate effect on the cow fecal metagenome. Given the importance of 3rd generation cephalosporins to human medicine, their continued use in dairy cattle should be carefully considered and waste treatment strategies to slow ARG dissemination from dairy cattle manure should be explored.

  18. Metagenomic Analysis of Antibiotic Resistance Genes in Dairy Cow Feces following Therapeutic Administration of Third Generation Cephalosporin.

    Science.gov (United States)

    Chambers, Lindsey; Yang, Ying; Littier, Heather; Ray, Partha; Zhang, Tong; Pruden, Amy; Strickland, Michael; Knowlton, Katharine

    2015-01-01

    Although dairy manure is widely applied to land, it is relatively understudied compared to other livestock as a potential source of antibiotic resistance genes (ARGs) to the environment and ultimately to human pathogens. Ceftiofur, the most widely used antibiotic in U.S. dairy cows, is a 3rd generation cephalosporin, a critically important class of antibiotics to human health. The objective of this study was to evaluate the effect of typical ceftiofur antibiotic treatment on the prevalence of ARGs in the fecal microbiome of dairy cows using a metagenomics approach. β-lactam ARGs were found to be elevated in feces from Holstein cows administered ceftiofur (n = 3) relative to control cows (n = 3). However, total numbers of ARGs across all classes were not measurably affected by ceftiofur treatment, likely because of dominance of unaffected tetracycline ARGs in the metagenomics libraries. Functional analysis via MG-RAST further revealed that ceftiofur treatment resulted in increases in gene sequences associated with "phages, prophages, transposable elements, and plasmids", suggesting that this treatment also enriched the ability to horizontally transfer ARGs. Additional functional shifts were noted with ceftiofur treatment (e.g., increase in genes associated with stress, chemotaxis, and resistance to toxic compounds; decrease in genes associated with metabolism of aromatic compounds and cell division and cell cycle), along with measurable taxonomic shifts (increase in Bacteroidia and decrease in Actinobacteria). This study demonstrates that ceftiofur has a broad, measurable and immediate effect on the cow fecal metagenome. Given the importance of 3rd generation cephalosporins to human medicine, their continued use in dairy cattle should be carefully considered and waste treatment strategies to slow ARG dissemination from dairy cattle manure should be explored.

  19. Metagenomic Analysis of Antibiotic Resistance Genes in Dairy Cow Feces following Therapeutic Administration of Third Generation Cephalosporin

    Science.gov (United States)

    Ray, Partha; Zhang, Tong; Pruden, Amy; Strickland, Michael; Knowlton, Katharine

    2015-01-01

    Although dairy manure is widely applied to land, it is relatively understudied compared to other livestock as a potential source of antibiotic resistance genes (ARGs) to the environment and ultimately to human pathogens. Ceftiofur, the most widely used antibiotic in U.S. dairy cows, is a 3rd generation cephalosporin, a critically important class of antibiotics to human health. The objective of this study was to evaluate the effect of typical ceftiofur antibiotic treatment on the prevalence of ARGs in the fecal microbiome of dairy cows using a metagenomics approach. β-lactam ARGs were found to be elevated in feces from Holstein cows administered ceftiofur (n = 3) relative to control cows (n = 3). However, total numbers of ARGs across all classes were not measurably affected by ceftiofur treatment, likely because of dominance of unaffected tetracycline ARGs in the metagenomics libraries. Functional analysis via MG-RAST further revealed that ceftiofur treatment resulted in increases in gene sequences associated with “phages, prophages, transposable elements, and plasmids”, suggesting that this treatment also enriched the ability to horizontally transfer ARGs. Additional functional shifts were noted with ceftiofur treatment (e.g., increase in genes associated with stress, chemotaxis, and resistance to toxic compounds; decrease in genes associated with metabolism of aromatic compounds and cell division and cell cycle), along with measurable taxonomic shifts (increase in Bacteroidia and decrease in Actinobacteria). This study demonstrates that ceftiofur has a broad, measurable and immediate effect on the cow fecal metagenome. Given the importance of 3rd generation cephalosporins to human medicine, their continued use in dairy cattle should be carefully considered and waste treatment strategies to slow ARG dissemination from dairy cattle manure should be explored. PMID:26258869

  20. Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes.

    Directory of Open Access Journals (Sweden)

    Stephen Nayfach

    2015-11-01

    Full Text Available Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP. ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

  1. Identification and characterization of novel cellulolytic and hemicellulolytic genes and enzymes derived from German grassland soil metagenomes.

    Science.gov (United States)

    Nacke, Heiko; Engelhaupt, Martin; Brady, Silja; Fischer, Christiane; Tautzt, Janine; Daniel, Rolf

    2012-04-01

    Soil metagenomes represent an unlimited resource for the discovery of novel biocatalysts from soil microorganisms. Three large-insert metagenomic DNA libraries were constructed from different grassland soil samples and screened for genes conferring cellulase or xylanase activity. Function-driven screening identified a novel cellulase-encoding gene (cel01) and two xylanase-encoding genes (xyn01 and xyn02). From sequence and protein domain analyses, Cel01 (831 amino acids) belongs to glycoside hydrolase family 9 whereas Xyn01 (170 amino acids) and Xyn02 (255 amino acids) are members of glycoside hydrolase family 11. Cel01 harbors a family 9 carbohydrate-binding module, previously found only in xylanases. Both Xyn01 and Xyn02 were most active at 60°C, with high activities from pH 4 to 10 and optima at pH 7 (Xyn01) and pH 6 (Xyn02). The cellulase gene, cel01, was expressed in E. coli BL21 and the recombinant enzyme (91.9 kDa) was purified. Cel01 exhibited high activity with soluble cellulose substrates containing β-1,4-linkages. Activity with microcrystalline cellulose was not detected. These data, together with the analysis of the degradation profiles of carboxymethyl cellulose and barley glucan, indicated that Cel01 is an endo-1,4-β-glucanase. Cel01 showed optimal activity at 50°C and pH 7, was highly active from pH 5 to 9, and possesses remarkable halotolerance.

  2. Cloning, expression and characteristics of a novel alkalistable and thermostable xylanase encoding gene (Mxyl) retrieved from compost-soil metagenome.

    Directory of Open Access Journals (Sweden)

    Digvijay Verma

    Full Text Available BACKGROUND: Alkalistable and thermostable xylanases are in high demand for pulp bleaching in the paper industry and for generating xylooligosaccharides by hydrolyzing the xylan component of agro-residues. Compost-soil samples, one of the hot environments, are expected to be a rich source of microbes with thermostable enzymes. METHODOLOGY/PRINCIPAL FINDINGS: Metagenomic DNA from hot environmental samples could be a rich source of novel biocatalysts. While screening a metagenomic library constructed from DNA extracted from the compost-soil in the p18GFP vector, a clone (TSDV-MX1) was detected that exhibited a clear zone of xylan hydrolysis on an RBB xylan plate. The sequencing of the 6.321 kb DNA insert and its BLAST analysis detected the presence of a xylanase gene that comprised 1077 bp. The deduced protein sequence (358 amino acids) displayed homology with glycosyl hydrolase (GH) family 11 xylanases. The gene was subcloned into the pET28a vector and expressed in E. coli BL21 (DE3). The recombinant xylanase (rMxyl) exhibited activity over a broad range of pH and temperature with optima at pH 9.0 and 80°C. The recombinant xylanase is highly thermostable, having a T1/2 of 2 h at 80°C and 15 min at 90°C. CONCLUSION/SIGNIFICANCE: This is the first report on the retrieval of a xylanase gene through a metagenomic approach that encodes an enzyme with alkalistability and thermostability. The recombinant xylanase has a potential application in the paper and pulp industry in pulp bleaching and generating xylooligosaccharides from the abundantly available agro-residues.

  3. MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs

    OpenAIRE

    Li, Dinghua; Huang, Yukun; Leung, Chi-Ming; Luo, Ruibang; Ting, Hing-Fung; Lam, Tak-Wah

    2017-01-01

    Background The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using the trained Hidden Markov Model (HMM) to guide the traversal of de Bruijn graph gives obvious advantage over other assembly methods. Xander, as a pilot study, indeed has a lot of room for improvement. Apart from its slow speed, Xander uses only 1 k-mer size for graph construction and whatever choice of k will compromise either sensitivity or accuracy. Xander uses a Bloom-filter represe...

  4. MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs.

    Science.gov (United States)

    Li, Dinghua; Huang, Yukun; Leung, Chi-Ming; Luo, Ruibang; Ting, Hing-Fung; Lam, Tak-Wah

    2017-10-16

    The recent release of the gene-targeted metagenomics assembler Xander has demonstrated that using a trained Hidden Markov Model (HMM) to guide the traversal of a de Bruijn graph gives an obvious advantage over other assembly methods. Xander, as a pilot study, has a lot of room for improvement. Apart from its slow speed, Xander uses only one k-mer size for graph construction, and any choice of k compromises either sensitivity or accuracy. Xander uses a Bloom-filter representation of the de Bruijn graph to achieve a lower memory footprint. Bloom filters bring in false positives, and it is not clear how this would impact the quality of assembly. Xander does not keep track of the multiplicity of k-mers, which would have been an effective way to differentiate between erroneous k-mers and correct k-mers. In this paper, we present a new gene-targeted assembler, MegaGTA, which attempts to improve on Xander in different aspects. Quality-wise, it utilizes iterative de Bruijn graphs to take full advantage of multiple k-mer sizes to make the best of both sensitivity and accuracy. Computation-wise, it employs succinct de Bruijn graphs (SdBG) to achieve a low memory footprint and high speed (the latter benefits from a highly efficient parallel algorithm for constructing the SdBG). Unlike Bloom filters, an SdBG is an exact representation of a de Bruijn graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of k-mers for building a better HMM model. We have compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset, and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327 Gbp), MegaGTA produced 9.7-19.3% more contigs than Xander, and these contigs were assigned to 10-25% more gene references. In our experiments, MegaGTA, depending on the number of k-mers used, is two to ten times faster than Xander. MegaGTA improves on the algorithm of Xander and achieves higher
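
    The role of k-mer multiplicity described in this abstract can be illustrated with a minimal Python sketch. It is not MegaGTA's implementation: it uses a plain in-memory counter instead of a succinct de Bruijn graph, and the function names, the toy reads, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads (its multiplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count=2):
    """Keep k-mers seen at least min_count times; rarer ones are treated as likely errors.

    min_count is a hypothetical threshold chosen for this toy example.
    """
    return {kmer for kmer, n in counts.items() if n >= min_count}


def debruijn_edges(kmers):
    """Represent each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    return sorted((kmer[:-1], kmer[1:]) for kmer in kmers)


# Toy reads: the third read differs from the first two by one base.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, 4)
solid = solid_kmers(counts)
edges = debruijn_edges(solid)
```

    In this toy input, the k-mers unique to the third read occur only once and are filtered out, which is the kind of distinction that an exact graph with multiplicities makes possible and a plain Bloom filter does not.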

  5. Effect of temperature on removal of antibiotic resistance genes by anaerobic digestion of activated sludge revealed by metagenomic approach.

    Science.gov (United States)

    Zhang, Tong; Yang, Ying; Pruden, Amy

    2015-09-01

    As antibiotic resistance continues to spread globally, there is growing interest in the potential to limit the spread of antibiotic resistance genes (ARGs) from wastewater sources. In particular, operational conditions during sludge digestion may serve to discourage selection of resistant bacteria, reduce horizontal transfer of ARGs, and aid in hydrolysis of DNA. This study applied metagenomic analysis to examine the removal efficiency of ARGs through thermophilic and mesophilic anaerobic digestion using bench-scale reactors. Although the relative abundance of various ARGs shifted from influent to effluent sludge, there was no measurable change in the abundance of total ARGs or their diversity in either the thermophilic or mesophilic treatment. Among the 35 major ARG subtypes detected in feed sludge, substantial reductions (removal efficiency >90%) of 8 and 13 ARGs were achieved by thermophilic and mesophilic digestion, respectively. However, the resistance genes aadA, macB, and sul1 were enriched during thermophilic anaerobic digestion, while the resistance genes erythromycin esterase type I, sul1, and tetM were enriched during mesophilic anaerobic digestion. Efflux pumps remained the major antibiotic resistance mechanism in sludge samples, but the proportion of ARGs encoding resistance via target modification increased in the anaerobically digested sludge relative to the feed. Metagenomic analysis provided insight into the potential for anaerobic digestion to mitigate a broad array of ARGs.
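
    As a rough illustration of the ">90% removal efficiency" criterion mentioned above, the following Python sketch computes a percentage reduction from feed to digested sludge. The abstract does not give the exact formula the authors used, so the definition below (relative decrease in abundance) is an assumption, and the gene names and abundance values are hypothetical.

```python
def removal_efficiency(influent, effluent):
    """Percent reduction in abundance from influent to effluent.

    Assumed definition: 100 * (influent - effluent) / influent.
    Negative values indicate enrichment rather than removal.
    """
    if influent <= 0:
        raise ValueError("influent abundance must be positive")
    return 100.0 * (influent - effluent) / influent


def substantially_reduced(abundances, threshold=90.0):
    """Return names whose removal efficiency exceeds the threshold (in percent)."""
    return sorted(
        name
        for name, (influent, effluent) in abundances.items()
        if removal_efficiency(influent, effluent) > threshold
    )


# Hypothetical (influent, effluent) abundances in arbitrary units.
example = {
    "geneA": (100.0, 5.0),   # 95% removed
    "geneB": (100.0, 50.0),  # 50% removed
    "geneC": (10.0, 20.0),   # enriched, so efficiency is negative
}
reduced = substantially_reduced(example)
```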

  6. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families

    Science.gov (United States)

    Popovic, Ana; Hai, Tran; Tchigvintsev, Anatoly; Hajighasemi, Mahbod; Nocek, Boguslaw; Khusnutdinova, Anna N.; Brown, Greg; Glinos, Julia; Flick, Robert; Skarina, Tatiana; Chernikova, Tatyana N.; Yim, Veronica; Brüls, Thomas; Paslier, Denis Le; Yakimov, Michail M.; Joachimiak, Andrzej; Ferrer, Manuel; Golyshina, Olga V.; Savchenko, Alexei; Golyshin, Peter N.; Yakunin, Alexander F.

    2017-01-01

    Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools. PMID:28272521

  7. Characterization of a novel amylolytic enzyme encoded by a gene from a soil-derived metagenomic library.

    Science.gov (United States)

    Yun, Jiae; Kang, Seowon; Park, Sulhee; Yoon, Hyunjin; Kim, Myo-Jeong; Heu, Sunggi; Ryu, Sangyeol

    2004-12-01

    It has been estimated that less than 1% of the microorganisms in nature can be cultivated by conventional techniques. Thus, the classical approach of isolating enzymes from pure cultures allows the analysis of only a subset of the total naturally occurring microbiota in environmental samples enriched in microorganisms. To isolate useful microbial enzymes from uncultured soil microorganisms, a metagenome was isolated from soil samples, and a metagenomic library was constructed by using the pUC19 vector. The library was screened for amylase activity, and one clone from among approximately 30,000 recombinant Escherichia coli clones showed amylase activity. Sequencing of the clone revealed a novel amylolytic enzyme expressed from a novel gene. The putative amylase gene (amyM) was overexpressed and purified for characterization. Optimal conditions for the enzyme activity of the AmyM protein were 42°C and pH 9.0; Ca2+ stabilized the activity. The amylase hydrolyzed soluble starch and cyclodextrins to produce high levels of maltose and hydrolyzed pullulan to panose. The enzyme showed a high transglycosylation activity, making α-(1,4) linkages exclusively. The hydrolysis and transglycosylation properties of AmyM suggest that it has novel characteristics and can be regarded as an intermediate type of maltogenic amylase, α-amylase, and 4-α-glucanotransferase.

  8. A Bioinformatician's Guide to Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets by contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

  9. Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes

    OpenAIRE

    Kurokawa, Ken; Itoh, Takehiko; Kuwahara, Tomomi; Oshima, Kenshiro; Toh, Hidehiro; Toyoda, Atsushi; Takami, Hideto; Morita, Hidetoshi; Vineet K. Sharma; Tulika P. Srivastava; Todd D. Taylor; Noguchi, Hideki; Mori, Hiroshi; Ogura, Yoshitoshi; Dusko S. Ehrlich

    2007-01-01

    Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants we...

  10. Comparative Metagenomics of Gut and Ocean: Identification of Microbial Marker Genes for Complex Environmental Properties (2011 JGI User Meeting)

    Energy Technology Data Exchange (ETDEWEB)

    Bork, Peer

    2011-03-23

    The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting featured presentations by leading scientists advancing these topics. Peer Bork of the European Molecular Biology Laboratory spoke on "Comparative Metagenomics of Gut and Ocean: Identification of Microbial Marker Genes for Complex Environmental Properties" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.

  11. A novel feruloyl esterase from rumen microbial metagenome: Gene cloning and enzyme characterization in the release of mono- and diferulic acids

    Science.gov (United States)

    A feruloyl esterase (FAE) gene was isolated from a rumen microbial metagenome, cloned into E. coli, and expressed in active form. The enzyme (RuFae4) was classified as a Type D feruloyl esterase based on its action on synthetic substrates and ability to release diferulates. The RuFae4 alone releas...

  12. Phylogeny-function analysis of (meta)genomic libraries: screening for expression of ribosomal RNA genes by large-insert library fluorescent in situ hybridization (LIL-FISH)

    NARCIS (Netherlands)

    Leveau, J.H.J.; Gerards, S.; De Boer, W.; Van Veen, J.A.

    2004-01-01

    We assessed the utility of fluorescent in situ hybridization (FISH) in the screening of clone libraries of (meta)genomic or environmental DNA for the presence and expression of bacterial ribosomal RNA (rRNA) genes. To establish proof-of-principle, we constructed a fosmid-based library in Escherichia

  13. Culture-Independent Identification of Manganese-Oxidizing Genes from Deep-Sea Hydrothermal Vent Chemoautotrophic Ferromanganese Microbial Communities Using a Metagenomic Approach

    Science.gov (United States)

    Davis, R.; Tebo, B. M.

    2013-12-01

    Microbial activity has long been recognized as being important to the fate of manganese (Mn) in hydrothermal systems, yet we know very little about the organisms that catalyze Mn oxidation, the mechanisms by which Mn is oxidized or the physiological function that Mn oxidation serves in these hydrothermal systems. Hydrothermal vents with thick ferromanganese microbial mats and Mn oxide-coated rocks observed throughout the Pacific Ring of Fire are ideal models to study the mechanisms of microbial Mn oxidation, as well as primary productivity in these metal-cycling ecosystems. We sampled ferromanganese microbial mats from Vai Lili Vent Field (Tmax=43°C) located on the Eastern Lau Spreading Center and Mn oxide-encrusted rhyolitic pumice (4°C) from Niua South Seamount on the Tonga Volcanic Arc. Metagenomic libraries were constructed and assembled from these samples, and key genes known to be involved in Mn oxidation and carbon fixation pathways were identified in the reconstructed genomes. The Vai Lili metagenome assembled to form 121,157 contiguous sequences (contigs) greater than 1,000 bp in length, with an N50 of 8,261 bp and a total metagenome size of 593 Mbp. Contigs were binned using an emergent self-organizing map of tetranucleotide frequencies. Putative homologs of the multicopper Mn-oxidase MnxG were found in the metagenome that were related to both the Pseudomonas-like and Bacillus-like forms of the enzyme. The bins containing the Pseudomonas-like mnxG genes are most closely related to uncultured Deltaproteobacteria and Chloroflexi. The Deltaproteobacteria bin appears to be an obligate anaerobe with possible chemoautotrophic metabolisms, while the Chloroflexi appears to be a heterotrophic organism. The metagenome from the Mn-stained pumice was assembled into 122,092 contigs greater than 1,000 bp in length with an N50 of 7,635 bp and a metagenome size of 385 Mbp. Both forms of mnxG genes are present in this metagenome as well as the genes encoding the putative Mn
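
    Two quantities named in this abstract, the N50 of an assembly and the tetranucleotide frequency profile used as input for binning, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and toy inputs are hypothetical, the frequencies are computed on one strand only, and the emergent self-organizing map itself is not implemented.

```python
from collections import Counter
from itertools import product


def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def tetranucleotide_freqs(seq):
    """Return a 256-element vector of 4-mer frequencies in lexicographic ACGT order.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return [counts[k] / total if total else 0.0 for k in kmers]


# Toy contig lengths: total is 20, and the two longest contigs (6 + 5) cover more than half.
example_n50 = n50([2, 3, 4, 5, 6])
# Toy sequence with five overlapping 4-mers, two of which are "ACGT".
freqs = tetranucleotide_freqs("ACGTACGT")
```

    Each contig's frequency vector could then be passed to a clustering method of choice; contigs from the same genome tend to have similar profiles, which is the premise behind composition-based binning.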

  14. Metagenome Analysis of Protein Domain Collocation within Cellulase Genes of Goat Rumen Microbes

    Directory of Open Access Journals (Sweden)

    SooYeon Lim

    2013-08-01

    Full Text Available In this study, protein domains with cellulase activity in goat rumen microbes were investigated using metagenomic and bioinformatic analyses. After the complete genome of goat rumen microbes was obtained using a shotgun sequencing method, 217,892,109 paired reads were filtered, including only those with 70% identity, 100-bp matches, and thresholds below E−10 using METAIDBA. These filtered contigs were assembled and annotated using blastN against the NCBI nucleotide database. As a result, a microbial community structure with 1431 species was analyzed, among which Prevotella ruminicola 23 bacteria and Butyrivibrio proteoclasticus B316 were the dominant groups. In parallel, 201 sequences related to cellulase activities (EC 3.2.1.4) were obtained through blast searches using the enzyme.dat file provided by the NCBI database. After translating the nucleotide sequence into a protein sequence using Interproscan, 28 protein domains with cellulase activity were identified using the HMMER package with threshold E values below 10⁻⁵. Cellulase activity protein domain profiling showed that the major protein domains such as lipase GDSL, cellulase, and Glyco hydro 10 were present in bacterial species with strong cellulase activities. Furthermore, correlation plots clearly displayed the strong positive correlation between some protein domain groups, which was indicative of microbial adaptation in the goat rumen based on feeding habits. This is the first metagenomic analysis of cellulase activity protein domains using bioinformatics from the goat rumen.

  15. The metagenomic telescope.

    Directory of Open Access Journals (Sweden)

    Balázs Szalkai

    Full Text Available Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we again applied hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately), subsequent to a cluster analysis; and we searched for similarities to those profiles in model organisms. We have found well known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms.

  16. Molecular cloning of a novel bioH gene from an environmental metagenome encoding a carboxylesterase with exceptional tolerance to organic solvents

    DEFF Research Database (Denmark)

    Shi, Yuping; Pan, Yingjie; Li, Bailin

    2013-01-01

    ABSTRACT: BACKGROUND: BioH is one of the key enzymes to produce the precursor pimeloyl-ACP to initiate biotin biosynthesis de novo in bacteria. To date, very few bioH genes have been characterized. In this study, we cloned and identified a novel bioH gene, bioHx, from an environmental metagenome...... of an unamplified metagenomic library with a tributyrin-containing medium led to the isolation of a clone exhibiting lipolytic activity. This clone carried a 4,570-bp DNA fragment encoding for six genes, designated bioF, bioHx, fabG, bioC, orf5 and sdh, four of which were implicated in the de novo biotin...... with a strong potential in industrial applications. CONCLUSIONS: This study constituted the first investigation of a novel bioHx gene in a biotin biosynthetic gene cluster cloned from an environmental metagenome. The bioHx gene was successfully cloned, expressed and characterized. The results demonstrated...

  17. Prevalence of antibiotic resistance genes and bacterial pathogens in long-term manured greenhouse soils as revealed by metagenomic survey.

    Science.gov (United States)

    Fang, Hua; Wang, Huifang; Cai, Lin; Yu, Yunlong

    2015-01-20

    Antibiotic resistance genes (ARGs), human pathogenic bacteria (HPB), and HPB carrying ARGs pose a high risk to soil ecology and public health. Here, we used a metagenomic approach to investigate their diversity and abundance in chicken manures and greenhouse soils collected from Guli, Pulangke, and Hushu vegetable bases with different greenhouse planting years in Nanjing, Eastern China. There was a positive correlation between the levels of antibiotics, ARGs, HPB, and HPB carrying ARGs in manures and greenhouse soils. In total, 156.2–5001.4 μg/kg of antibiotic residues, 22 classes of ARGs, 32 HPB species, and 46 species of HPB carrying ARGs were found. The highest relative abundance was tetracycline resistance genes (manures) and multidrug resistance genes (greenhouse soils). The dominant HPB and HPB carrying ARGs in the manures were Bacillus anthracis, Bordetella pertussis, and B. anthracis (sulfonamide resistance gene, sul1), respectively. The corresponding findings in greenhouse soils were Mycobacterium tuberculosis and M. ulcerans, M. tuberculosis (macrolide-lincosamide-streptogramin resistance protein, MLSRP), and B. anthracis (sul1), respectively. Our findings confirmed high levels of antibiotics, ARGs, HPB, and HPB carrying ARGs in the manured greenhouse soils compared with those in the field soils, and their relative abundance increased with the extension of greenhouse planting years.

  18. Metagenomic-based study of the phylogenetic and functional gene diversity in Galápagos land and marine iguanas.

    Science.gov (United States)

    Hong, Pei-Ying; Mao, Yuejian; Ortiz-Kofoed, Shannon; Shah, Rushabh; Cann, Isaac; Mackie, Roderick I

    2015-02-01

    In this study, a metagenome-based analysis of the fecal samples from the macrophytic algae-consuming marine iguana (MI; Amblyrhynchus cristatus) and terrestrial biomass-consuming land iguanas (LI; Conolophus spp.) was conducted. Phylogenetic affiliations of the fecal microbiome were more similar between both iguanas than to other mammalian herbivorous hosts. However, functional gene diversities in both MI and LI iguana hosts differed in relation to the diet, where the MI fecal microbiota had a functional diversity that clustered apart from the other terrestrial-biomass consuming reptilian and mammalian hosts. A further examination of the carbohydrate-degrading genes revealed that several of the prevalent glycosyl hydrolases (GH), glycosyl transferases (GT), carbohydrate binding modules (CBM), and carbohydrate esterases (CE) gene classes were conserved among all examined herbivorous hosts, reiterating the important roles these genes play in the breakdown and metabolism of herbivorous diets. Genes encoding some classes of carbohydrate-degrading families, including GH2, GH13, GT2, GT4, CBM50, CBM48, CE4, and CE11, as well as genes associated with sulfur metabolism and dehalogenation, were highly enriched or unique to the MI. In contrast, gene sequences that relate to archaeal methanogenesis were detected only in LI fecal microbiome, and genes coding for GH13, GH66, GT2, GT4, CBM50, CBM13, CE4, and CE8 carbohydrate active enzymes were highly abundant in the LI. Bacterial populations were enriched on various carbohydrates substrates (e.g., glucose, arabinose, xylose). The majority of the enriched bacterial populations belong to genera Clostridium spp. and Enterococcus spp. that likely accounted for the high prevalence of GH13 and GH2, as well as the GT families (e.g., GT2, GT4, GT28, GT35, and GT51) that were ubiquitously present in the fecal microbiota of all herbivorous hosts.

  19. Metagenomic-Based Study of the Phylogenetic and Functional Gene Diversity in Galápagos Land and Marine Iguanas

    KAUST Repository

    Hong, Pei-Ying

    2014-12-19

    In this study, a metagenome-based analysis of the fecal samples from the macrophytic algae-consuming marine iguana (MI; Amblyrhynchus cristatus) and terrestrial biomass-consuming land iguanas (LI; Conolophus spp.) was conducted. Phylogenetic affiliations of the fecal microbiome were more similar between both iguanas than to other mammalian herbivorous hosts. However, functional gene diversities in both MI and LI iguana hosts differed in relation to the diet, where the MI fecal microbiota had a functional diversity that clustered apart from the other terrestrial-biomass consuming reptilian and mammalian hosts. A further examination of the carbohydrate-degrading genes revealed that several of the prevalent glycosyl hydrolases (GH), glycosyl transferases (GT), carbohydrate binding modules (CBM), and carbohydrate esterases (CE) gene classes were conserved among all examined herbivorous hosts, reiterating the important roles these genes play in the breakdown and metabolism of herbivorous diets. Genes encoding some classes of carbohydrate-degrading families, including GH2, GH13, GT2, GT4, CBM50, CBM48, CE4, and CE11, as well as genes associated with sulfur metabolism and dehalogenation, were highly enriched or unique to the MI. In contrast, gene sequences that relate to archaeal methanogenesis were detected only in LI fecal microbiome, and genes coding for GH13, GH66, GT2, GT4, CBM50, CBM13, CE4, and CE8 carbohydrate active enzymes were highly abundant in the LI. Bacterial populations were enriched on various carbohydrates substrates (e.g., glucose, arabinose, xylose). The majority of the enriched bacterial populations belong to genera Clostridium spp. and Enterococcus spp. that likely accounted for the high prevalence of GH13 and GH2, as well as the GT families (e.g., GT2, GT4, GT28, GT35, and GT51) that were ubiquitously present in the fecal microbiota of all herbivorous hosts.

  20. Microbial Metagenomics: Beyond the Genome

    Science.gov (United States)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation, and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  1. The soil microbiome — from metagenomics to metaphenomics

    Energy Technology Data Exchange (ETDEWEB)

    Jansson, Janet K.; Hofmockel, Kirsten S.

    2018-06-01

    Soil microorganisms carry out important processes, including support of plant growth and cycling of carbon and other nutrients. However, the majority of soil microbes have not yet been isolated and their functions are largely unknown. Although metagenomic sequencing reveals microbial identities and functional gene information, it includes DNA from microbes with vastly varying physiological states. Therefore, metagenomics is only predictive of community functional potential. We posit that the next frontier lies in understanding the metaphenome, the product of the combined genetic potential of the microbiome and available resources. Here we describe examples of opportunities towards gaining understanding of the soil metaphenome.

  2. Fate of antibiotic and metal resistance genes during two-phase anaerobic digestion of residue sludge revealed by metagenomic approach.

    Science.gov (United States)

    Wu, Ying; Cui, Erping; Zuo, Yiru; Cheng, Weixiao; Chen, Hong

    2018-03-07

    The prevalence and persistence of antibiotic resistance genes in wastewater treatment plants (WWTPs) is of growing interest, and residual sludge is among the main sources for the release of antibiotic resistance genes (ARGs). Moreover, heavy metals concentrated in dense microbial communities of sludge could potentially favor co-selection of ARGs and metal resistance genes (MRGs). Residual sludge treatment is needed to limit the spread of resistance from WWTPs into the environment. This study aimed to explore the fate of ARGs and MRGs during thermophilic two-phase (acidogenic/methanogenic phase) anaerobic digestion by metagenomic analysis. The occurrence and abundance of mobile genetic elements were also determined based on the SEED database. Among the 27 major ARG subtypes detected in feed sludge, large reductions (> 50%) in 6 ARG subtypes were achieved by acidogenic phase (AP), while 63.0% of the ARG subtypes proliferated in the following methanogenic phase (MP). In contrast, a 2.8-fold increase in total MRG abundance was found in AP, while the total abundance during MP decreased to the same order of magnitude as in feed sludge. The distinct dynamics of ARGs and MRGs during the two-phase anaerobic digestion are noteworthy, and more specific treatments are required to limit their proliferation in the environment.

  3. Nematicidal protease genes screened from a soil metagenomic library to control Radopholus similis mediated by Pseudomonas fluorescens pf36.

    Science.gov (United States)

    Chen, Deqiang; Wang, Dongwei; Xu, Chunling; Chen, Chun; Li, Junyi; Wu, Wenjia; Huang, Xin; Xie, Hui

    2018-04-01

    Controlling Radopholus similis, an important phytopathogenic nematode, is a challenge worldwide. Herein, we constructed a metagenomic fosmid library from the rhizosphere soil of banana plants, and six clones with protease activity were obtained by functionally screening the library. Furthermore, subclones were constructed using the six clones, and three protease genes with nematicidal activity were identified: pase1, pase4, and pase6. The pase4 gene was successfully cloned and expressed, demonstrating that the protease PASE4 could effectively degrade R. similis tissues and result in nematode death. Additionally, we isolated a predominant R. similis-associated bacterium, Pseudomonas fluorescens (pf36), from 10 R. similis populations with different hosts. The pase4 gene was successfully introduced into the pf36 strain by vector transformation and conjugative transposition, and two genetically modified strains were obtained: p4MCS-pf36 and p4Tn5-pf36. p4MCS-pf36 had significantly higher protease expression and nematicidal activity (p < 0.05) than p4Tn5-pf36 in a microtiter plate assay, whereas p4Tn5-pf36 was superior to p4MCS-pf36 in terms of genetic stability and controlling R. similis in growth pot tests. This study confirmed that R. similis is inhibited by the associated bacterium pf36-mediated expression of nematicidal proteases. Herein, a novel approach is provided for the study and development of efficient, environmentally friendly, and sustainable biocontrol techniques against phytonematodes.

  4. Molecular cloning of a novel bioH gene from an environmental metagenome encoding a carboxylesterase with exceptional tolerance to organic solvents

    Directory of Open Access Journals (Sweden)

    Shi Yuping

    2013-02-01

    Full Text Available Background: BioH is one of the key enzymes to produce the precursor pimeloyl-ACP to initiate biotin biosynthesis de novo in bacteria. To date, very few bioH genes have been characterized. In this study, we cloned and identified a novel bioH gene, bioHx, from an environmental metagenome by a functional metagenomic approach. The bioHx gene, encoding an enzyme that is capable of hydrolysis of p-nitrophenyl esters of fatty acids, was expressed in Escherichia coli BL21 using the pET expression system. The biochemical properties of the purified BioHx protein were also investigated. Results: Screening of an unamplified metagenomic library with a tributyrin-containing medium led to the isolation of a clone exhibiting lipolytic activity. This clone carried a 4,570-bp DNA fragment encoding six genes, designated bioF, bioHx, fabG, bioC, orf5 and sdh, four of which were implicated in de novo biotin biosynthesis. The bioHx gene encodes a protein of 259 aa with a calculated molecular mass of 28.60 kDa, displaying 24-39% amino acid sequence identity to a few characterized bacterial BioH enzymes. It contains a pentapeptide motif (Gly76-Trp77-Ser78-Met79-Gly80) and a catalytic triad (Ser78-His230-Asp202), both of which are characteristic of lipolytic enzymes. BioHx was expressed as a recombinant protein and characterized. The purified BioHx protein displayed carboxylesterase activity, and it was most active on p-nitrophenyl esters of fatty acids with a short acyl chain (C4). Comparing BioHx with other known BioH proteins revealed interesting diversity in their sensitivity to ionic and nonionic detergents and organic solvents, and BioHx exhibited exceptional resistance to organic solvents, being the most tolerant among all known BioH enzymes. This establishes BioHx as a novel carboxylesterase with a strong potential in industrial applications. Conclusions: This study constituted the first investigation of a novel bioHx gene in a biotin

  5. Identification of genes and pathways related to phenol degradation in metagenomic libraries from petroleum refinery wastewater.

    Directory of Open Access Journals (Sweden)

    Cynthia C Silva

    Full Text Available Two fosmid libraries, totaling 13,200 clones, were obtained from bioreactor sludge of a petroleum refinery wastewater treatment system. Library screening based on PCR and biological activity assays revealed more than 400 positive clones for phenol degradation. From these, 100 clones were randomly selected for pyrosequencing in order to evaluate the genetic potential of the microorganisms present in the wastewater treatment plant for biodegradation, focusing mainly on novel genes and pathways of phenol and aromatic compound degradation. The sequence analysis of selected clones yielded 129,635 reads at an estimated 17-fold coverage. The phylogenetic analysis showed Burkholderiales and Rhodocyclales as the most abundant orders among the selected fosmid clones. The MG-RAST analysis revealed a broad metabolic profile with important functions for wastewater treatment, including metabolism of aromatic compounds, nitrogen, sulphur and phosphorus. The predicted 2,276 proteins included phenol hydroxylases and catechol 2,3-dioxygenases, involved in the catabolism of aromatic compounds, such as phenol, byphenol, benzoate and phenylpropanoid. The sequencing of one fosmid insert of 33 kb unraveled the gene that permitted the host, Escherichia coli EPI300, to grow in the presence of aromatic compounds. Additionally, the comparison of the whole fosmid sequence against bacterial genomes deposited in GenBank showed that about 90% of the sequence showed no identity to known sequences of Proteobacteria deposited in the NCBI database. This study surveyed the functional potential of fosmid clones for aromatic compound degradation and contributed to our knowledge of the biodegradative capacity and pathways of microbial assemblages present in the refinery wastewater treatment system.

  6. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    Functional metagenomic selections represent a powerful technique that is widely applied for identification of novel genes from complex metagenomic sources. However, whereas hundreds to thousands of clones can be easily generated and sequenced over a few days of experiments, analyzing the data...... is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non......-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence...
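
    The open reading frame prediction step named in the workflow above can be illustrated with a minimal sketch. It assumes the simplest possible definition of an ORF (an ATG start codon followed in-frame by a stop codon on the forward strand) and is not the method used by deFUME, which the abstract does not describe; the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
# Illustrative only: a naive forward-strand ORF scanner.
# Assumption: an ORF runs from the first in-frame ATG to the next in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs

example = "CCATGAAATTTTAGGG"
orfs = find_orfs(example)
```

    A production gene caller would also scan the reverse strand, handle alternative start codons and score candidates statistically; this sketch only shows the frame-wise scan.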

  7. Characterization of Metagenomes in Urban Aquatic Compartments Reveals High Prevalence of Clinically Relevant Antibiotic Resistance Genes in Wastewaters

    Directory of Open Access Journals (Sweden)

    Charmaine Ng

    2017-11-01

    Full Text Available The dissemination of antimicrobial resistance (AMR) is an escalating problem and a threat to public health. Comparative metagenomics was used to investigate the occurrence of antibiotic resistance genes (ARGs) in wastewater and urban surface water environments in Singapore. Hospital and municipal wastewater (n = 6) were found to have higher diversity and average abundance of ARGs (303 ARG subtypes, 197,816 x/Gb) compared to treated wastewater effluent (n = 2, 58 ARG subtypes, 2,692 x/Gb) and surface water (n = 5, 35 subtypes, 7,985 x/Gb). A cluster analysis showed that the taxonomic composition of wastewaters was highly similar and had a bacterial community composition enriched in gut bacteria (Bacteroides, Faecalibacterium, Bifidobacterium, Blautia, Roseburia, Ruminococcus), the Enterobacteriaceae group (Klebsiella, Aeromonas, Enterobacter) and opportunistic pathogens (Prevotella, Comamonas, Neisseria). Wastewater, treated effluents and surface waters had a shared resistome of 21 ARGs encoding multidrug resistant efflux pumps or resistance to aminoglycoside, macrolide-lincosamide-streptogramin (MLS), quinolone, sulfonamide, and tetracycline antibiotics, which suggests that these genes are widespread across different environments. Wastewater had a distinctively higher average abundance of clinically relevant class A beta-lactamase resistance genes (i.e., blaKPC, blaCTX-M, blaSHV, blaTEM). The wastewaters from clinical isolation wards, in particular, had exceedingly high levels of blaKPC-2 genes (142,200 x/Gb), encoding carbapenem resistance. Assembled scaffolds (16 and 30 kbp) from isolation ward wastewater samples indicated this gene was located on a Tn3-based transposon (Tn4401), a mobilization element found in Klebsiella pneumoniae plasmids. In the longer scaffold, transposable elements were flanked by a toxin–antitoxin (TA) system and other metal resistant genes that likely increase the persistence, fitness and propagation of the plasmid in the

  8. Identification and in silico characterization of two novel genes encoding peptidases S8 found by functional screening in a metagenomic library of Yucatán underground water.

    Science.gov (United States)

    Apolinar-Hernández, Max M; Peña-Ramírez, Yuri J; Pérez-Rueda, Ernesto; Canto-Canché, Blondy B; De Los Santos-Briones, César; O'Connor-Sánchez, Aileen

    2016-11-15

    Metagenomics is a culture-independent technology that allows access to novel and potentially useful genetic resources from a wide range of unknown microorganisms. In this study, a fosmid metagenomic library of tropical underground water was constructed, and clones were functionally screened for extracellular proteolytic activity. One of the positive clones, containing a 41,614-bp insert, had two genes with 60% and 68% identity respectively with a peptidase S8 of Chitinimonas koreensis. When these genes were individually sub-cloned, in both cases their sub-clones showed proteolytic phenotype, confirming that they both encode functional proteases. These genes -named PrAY5 and PrAY6- are next to each other. They are similar in size (1845bp and 1824bp respectively) and share 66.5% identity. An extensive in silico characterization showed that their ORFs encode complex zymogens having a signal peptide at their 5' end, followed by a pro-peptide, a catalytic region, and a PPC domain at their 3' end. Their translated sequences were classified as peptidases S8A by sequence comparisons against the non-redundant database and corroborated by Pfam and MEROPS. Phylogenetic analysis of the catalytic region showed that they encode novel proteases that clustered with the sub-family S8_13, which according to the CDD database at NCBI, is an uncharacterized subfamily. They clustered in a clade different from the other three proteases S8 found so far by functional metagenomics, and also different from proteases S8 found in sequenced environmental samples, thereby expanding the range of potentially useful proteases that have been identified by metagenomics. I-TASSER modeling corroborated that they may be subtilases, thus possibly they participate in the hydrolysis of proteins with broad specificity for peptide bonds, and have a preference for a large uncharged residue in P1. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Metagenomic Analysis of the Microbiota from the Crop of an Invasive Snail Reveals a Rich Reservoir of Novel Genes

    Science.gov (United States)

    Cardoso, Alexander M.; Cavalcante, Janaína J. V.; Cantão, Maurício E.; Thompson, Claudia E.; Flatschart, Roberto B.; Glogauer, Arnaldo; Scapin, Sandra M. N.; Sade, Youssef B.; Beltrão, Paulo J. M. S. I.; Gerber, Alexandra L.; Martins, Orlando B.; Garcia, Eloi S.; de Souza, Wanderley; Vasconcelos, Ana Tereza R.

    2012-01-01

    The shortage of petroleum reserves and the increase in CO2 emissions have raised global concerns and highlighted the importance of adopting sustainable energy sources. Second-generation ethanol made from lignocellulosic materials is considered to be one of the most promising fuels for vehicles. The giant snail Achatina fulica is an agricultural pest whose biotechnological potential has been largely untested. Here, the composition of the microbial population within the crop of this invasive land snail, as well as key genes involved in various biochemical pathways, have been explored for the first time. In a high-throughput approach, 318 Mbp of 454-Titanium shotgun metagenomic sequencing data were obtained. The predominant bacterial phylum found was Proteobacteria, followed by Bacteroidetes and Firmicutes. Viruses, Fungi, and Archaea were present to lesser extents. The functional analysis reveals a variety of microbial genes that could assist the host in the degradation of recalcitrant lignocellulose, detoxification of xenobiotics, and synthesis of essential amino acids and vitamins, contributing to the adaptability and wide-ranging diet of this snail. More than 2,700 genes encoding glycoside hydrolase (GH) domains and carbohydrate-binding modules were detected. When we compared GH profiles, we found an abundance of sequences coding for oligosaccharide-degrading enzymes (36%), very similar to those from wallabies and giant pandas, as well as many novel cellulase and hemicellulase coding sequences, which points to this model as a remarkable potential source of enzymes for the biofuel industry. Furthermore, this work is a major step toward the understanding of the unique genetic profile of the land snail holobiont. PMID:23133637

  10. Metagenomic analysis of the microbiota from the crop of an invasive snail reveals a rich reservoir of novel genes.

    Directory of Open Access Journals (Sweden)

    Alexander M Cardoso

    Full Text Available The shortage of petroleum reserves and the increase in CO2 emissions have raised global concerns and highlighted the importance of adopting sustainable energy sources. Second-generation ethanol made from lignocellulosic materials is considered to be one of the most promising fuels for vehicles. The giant snail Achatina fulica is an agricultural pest whose biotechnological potential has been largely untested. Here, the composition of the microbial population within the crop of this invasive land snail, as well as key genes involved in various biochemical pathways, have been explored for the first time. In a high-throughput approach, 318 Mbp of 454-Titanium shotgun metagenomic sequencing data were obtained. The predominant bacterial phylum found was Proteobacteria, followed by Bacteroidetes and Firmicutes. Viruses, Fungi, and Archaea were present to lesser extents. The functional analysis reveals a variety of microbial genes that could assist the host in the degradation of recalcitrant lignocellulose, detoxification of xenobiotics, and synthesis of essential amino acids and vitamins, contributing to the adaptability and wide-ranging diet of this snail. More than 2,700 genes encoding glycoside hydrolase (GH) domains and carbohydrate-binding modules were detected. When we compared GH profiles, we found an abundance of sequences coding for oligosaccharide-degrading enzymes (36%), very similar to those from wallabies and giant pandas, as well as many novel cellulase and hemicellulase coding sequences, which points to this model as a remarkable potential source of enzymes for the biofuel industry. Furthermore, this work is a major step toward the understanding of the unique genetic profile of the land snail holobiont.

  11. Microbial diversity and hydrocarbon degrading gene capacity of a crude oil field soil as determined by metagenomics analysis.

    Science.gov (United States)

    Abbasian, Firouz; Palanisami, Thavamani; Megharaj, Mallavarapu; Naidu, Ravi; Lockington, Robin; Ramadass, Kavitha

    2016-05-01

    Soils contaminated with crude oil are rich sources of enzymes suitable for both degradation of hydrocarbons through bioremediation processes and improvement of crude oil during its refining steps. Owing to long-term selection, crude oil fields are unique environments for the identification of microorganisms with the ability to produce these enzymes. In this metagenomic study, based on Illumina HiSeq sequencing of samples obtained from a crude oil field and analysis of data on MG-RAST, Actinomycetales (9.8%) were found to be the dominant microorganisms, followed by Rhizobiales (3.3%). Furthermore, several functional genes, mostly belonging to Actinobacteria (12.35%), were found that have a role in the metabolism of aliphatic and aromatic hydrocarbons (2.51%), desulfurization (0.03%), element shortage (5.6%), and resistance to heavy metals (1.1%). This information will be useful for assisting in the application of microorganisms in the removal of hydrocarbon contamination and/or for improving the quality of crude oil. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:638-648, 2016.

  12. Plasmid metagenomics reveals multiple antibiotic resistance gene classes among the gut microbiomes of hospitalised patients

    DEFF Research Database (Denmark)

    Jitwasinkul, Tossawan; Suriyaphol, Prapat; Tangphatsornruang, Sithichoke

    2016-01-01

    Antibiotic resistance genes are rapidly spread between pathogens and the normal flora, with plasmids playing an important role in their circulation. This study aimed to investigate antibiotic resistance plasmids in the gut microbiome of hospitalised patients. Stool samples were collected from seven...... sequences (using >80% alignment length as the cut-off), and ResFinder was used to classify the antibiotic resistance gene pools. Plasmid replicon modules were used for plasmid typing. Forty-six genes conferring resistance to several classes of antibiotics were identified in the stool samples. Several...... antibiotic resistance genes were shared by the patients; interestingly, most were reported previously in food animals and healthy humans. Four antibiotic resistance genes were found in the healthy subject. One gene (aph3-III) was identified in the patients and the healthy subject and was related...

  13. A sampling and metagenomic sequencing-based methodology for monitoring antimicrobial resistance in swine herds

    DEFF Research Database (Denmark)

    Munk, Patrick; Dalhoff Andersen, Vibe; de Knegt, Leonardo

    2016-01-01

    Objectives Reliable methods for monitoring antimicrobial resistance (AMR) in livestock and other reservoirs are essential to understand the trends, transmission and importance of agricultural resistance. Quantification of AMR is mostly done using culture-based techniques, but metagenomic read...... on known antimicrobial consumption in 10 Danish integrated slaughter pig herds. In addition, we evaluated whether fresh or manure floor samples constitute suitable proxies for intestinal sampling, using cfu counting, qPCR and metagenomic shotgun sequencing. Results Metagenomic read-mapping outperformed...... cultivation-based techniques in terms of predicting expected tetracycline resistance based on antimicrobial consumption. Our metagenomic approach had sufficient resolution to detect antimicrobial-induced changes to individual resistance gene abundances. Pen floor manure samples were found to represent rectal...

  14. Distribution and quantification of antibiotic resistance genes and bacteria across agricultural and non-agricultural metagenomes

    Science.gov (United States)

    There is concern that antibiotic resistance can potentially be transferred from animals to humans through the food chain. The relationship between specific antibiotic resistant bacteria and the genes they carry remains to be described and few details are known about how antibiotic resistance genes i...

  15. Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours

    DEFF Research Database (Denmark)

    Yamada, Takuji; Waller, Alison S.; Raes, Jeroen

    2012-01-01

    Despite the current wealth of sequencing data, one-third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently a...

  16. Assembling large, complex environmental metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    Howe, A. C. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences; Jansson, J. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Earth Sciences Division; Malfatti, S. A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Tringe, S. G. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Tiedje, J. M. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Plant Soil and Microbial Sciences; Brown, C. T. [Michigan State Univ., East Lansing, MI (United States). Microbiology and Molecular Genetics, Computer Science and Engineering

    2012-12-28

    The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.
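
    The idea behind digital normalization can be sketched in a few lines. The version below is a simplified illustration, not the authors' implementation: it assumes exact k-mer counts held in a dictionary (real tools typically use a memory-efficient probabilistic counter and canonical k-mers), and the names `digital_normalize`, `k` and `cutoff` are chosen here for illustration. A read is kept only if the median abundance of its k-mers, among reads kept so far, is still below the cutoff.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalize(reads, k=4, cutoff=2):
    """Keep a read only while its median k-mer count is below `cutoff`."""
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k: nothing to estimate coverage from
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)  # only kept reads contribute to coverage
    return kept

# Five copies of a high-coverage read plus one rare read (toy data).
reads = ["ACGTTGCAAT"] * 5 + ["GGGCCCTTTA"]
kept = digital_normalize(reads, k=4, cutoff=2)
```

    On this toy input the redundant copies beyond the cutoff are discarded while the rare read is retained, which is the data-reduction effect the abstract describes.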

  17. Marine metagenomics as a source for bioprospecting

    KAUST Repository

    Kodzius, Rimantas

    2015-08-12

    This review summarizes the use of genome-editing technologies for metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable or uncultivable. Metagenomics provides especially valuable information for uncultivable samples. Novel genes, pathways and genomes can be deduced. Therefore, metagenomics, particularly genome engineering and systems biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective way to efficiently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and fine or bulk chemicals.

  18. Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge

    OpenAIRE

    Zhang, T; Zhang, XX; Ye, L

    2011-01-01

    The overuse or misuse of antibiotics has accelerated antibiotic resistance, creating a major challenge for the public health in the world. Sewage treatment plants (STPs) are considered as important reservoirs for antibiotic resistance genes (ARGs) and activated sludge characterized with high microbial density and diversity facilitates ARG horizontal gene transfer (HGT) via mobile genetic elements (MGEs). However, little is known regarding the pool of ARGs and MGEs in sludge microbiome. In thi...

  19. Identification of biosynthetic gene clusters from metagenomic libraries using PPTase complementation in a Streptomyces host.

    Science.gov (United States)

    Bitok, J Kipchirchir; Lemetre, Christophe; Ternei, Melinda A; Brady, Sean F

    2017-09-01

    The majority of environmental bacteria are not readily cultured in the lab, leaving the natural products they make inaccessible using culture-dependent discovery methods. Cloning and heterologous expression of DNA extracted from environmental samples (environmental DNA, eDNA) provides a means of circumventing this discovery bottleneck. To facilitate the identification of clones containing biosynthetic gene clusters, we developed a model heterologous expression reporter strain Streptomyces albus::bpsA ΔPPTase. This strain carries a 4'-phosphopantetheinyl transferase (PPTase)-dependent blue pigment synthase A gene, bpsA, in a PPTase deletion background. eDNA clones that express a functional PPTase restore production of the blue pigment, indigoidine. As PPTase genes often occur in biosynthetic gene clusters (BGCs), indigoidine production can be used to identify eDNA clones containing BGCs. We screened a soil eDNA library hosted in S. albus::bpsA ΔPPTase and identified clones containing non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS) and mixed NRPS/PKS biosynthetic gene clusters. One NRPS gene cluster was shown to confer the production of myxochelin A to S. albus::bpsA ΔPPTase. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. The Dark Side of the Mushroom Spring Microbial Mat: Life in the Shadow of Chlorophototrophs. II. Metabolic Functions of Abundant Community Members Predicted from Metagenomic Analyses

    Directory of Open Access Journals (Sweden)

    Vera Thiel

    2017-06-01

    Full Text Available Microbial mat communities in the effluent channels of Octopus and Mushroom Springs within the Lower Geyser Basin of Yellowstone National Park have been extensively characterized. Previous studies have focused on the chlorophototrophic organisms of the phyla Cyanobacteria and Chloroflexi. However, the diversity and metabolic functions of the other portion of the community in the microoxic/anoxic region of the mat are poorly understood. We recently described the diverse but extremely uneven microbial assemblage in the undermat of Mushroom Spring based on 16S rRNA amplicon sequences, which was dominated by Roseiflexus members, filamentous anoxygenic chlorophototrophs. In this study, we analyzed the orange-colored undermat portion of the community of Mushroom Spring mats in a genome-centric approach and discuss the metabolic potentials of the major members. Metagenome binning recovered partial genomes of all abundant community members, ranging in completeness from ~28 to 96%, and allowed affiliation of function with taxonomic identity even for representatives of novel and Candidate phyla. Less complete metagenomic bins correlated with high microdiversity. The undermat portion of the community was found to be a mixture of phototrophic and chemotrophic organisms, which use bicarbonate as well as organic carbon sources derived from different cell components and fermentation products. The presence of rhodopsin genes in many taxa strengthens the hypothesis that light energy is of major importance. Evidence for the usage of all four bacterial carbon fixation pathways was found in the metagenome. Nitrogen fixation appears to be limited to Synechococcus spp. in the upper mat layer and Thermodesulfovibrio sp. in the undermat, and nitrate/nitrite metabolism was limited. A closed sulfur cycle is indicated by biological sulfate reduction combined with the presence of genes for sulfide oxidation mainly in phototrophs. Finally, a variety of undermat

  1. The Dark Side of the Mushroom Spring Microbial Mat: Life in the Shadow of Chlorophototrophs. II. Metabolic Functions of Abundant Community Members Predicted from Metagenomic Analyses.

    Science.gov (United States)

    Thiel, Vera; Hügler, Michael; Ward, David M; Bryant, Donald A

    2017-01-01

    Microbial mat communities in the effluent channels of Octopus and Mushroom Springs within the Lower Geyser Basin of Yellowstone National Park have been extensively characterized. Previous studies have focused on the chlorophototrophic organisms of the phyla Cyanobacteria and Chloroflexi. However, the diversity and metabolic functions of the other portion of the community in the microoxic/anoxic region of the mat are poorly understood. We recently described the diverse but extremely uneven microbial assemblage in the undermat of Mushroom Spring based on 16S rRNA amplicon sequences, which was dominated by Roseiflexus members, filamentous anoxygenic chlorophototrophs. In this study, we analyzed the orange-colored undermat portion of the community of Mushroom Spring mats in a genome-centric approach and discuss the metabolic potentials of the major members. Metagenome binning recovered partial genomes of all abundant community members, ranging in completeness from ~28 to 96%, and allowed affiliation of function with taxonomic identity even for representatives of novel and Candidate phyla. Less complete metagenomic bins correlated with high microdiversity. The undermat portion of the community was found to be a mixture of phototrophic and chemotrophic organisms, which use bicarbonate as well as organic carbon sources derived from different cell components and fermentation products. The presence of rhodopsin genes in many taxa strengthens the hypothesis that light energy is of major importance. Evidence for the usage of all four bacterial carbon fixation pathways was found in the metagenome. Nitrogen fixation appears to be limited to Synechococcus spp. in the upper mat layer and Thermodesulfovibrio sp. in the undermat, and nitrate/nitrite metabolism was limited. A closed sulfur cycle is indicated by biological sulfate reduction combined with the presence of genes for sulfide oxidation mainly in phototrophs. Finally, a variety of undermat microorganisms have genes for

  2. The Dark Side of the Mushroom Spring Microbial Mat: Life in the Shadow of Chlorophototrophs. II. Metabolic Functions of Abundant Community Members Predicted from Metagenomic Analyses

    Science.gov (United States)

    Thiel, Vera; Hügler, Michael; Ward, David M.; Bryant, Donald A.

    2017-01-01

    Microbial mat communities in the effluent channels of Octopus and Mushroom Springs within the Lower Geyser Basin of Yellowstone National Park have been extensively characterized. Previous studies have focused on the chlorophototrophic organisms of the phyla Cyanobacteria and Chloroflexi. However, the diversity and metabolic functions of the other portion of the community in the microoxic/anoxic region of the mat are poorly understood. We recently described the diverse but extremely uneven microbial assemblage in the undermat of Mushroom Spring based on 16S rRNA amplicon sequences, which was dominated by Roseiflexus members, filamentous anoxygenic chlorophototrophs. In this study, we analyzed the orange-colored undermat portion of the community of Mushroom Spring mats in a genome-centric approach and discuss the metabolic potentials of the major members. Metagenome binning recovered partial genomes of all abundant community members, ranging in completeness from ~28 to 96%, and allowed affiliation of function with taxonomic identity even for representatives of novel and Candidate phyla. Less complete metagenomic bins correlated with high microdiversity. The undermat portion of the community was found to be a mixture of phototrophic and chemotrophic organisms, which use bicarbonate as well as organic carbon sources derived from different cell components and fermentation products. The presence of rhodopsin genes in many taxa strengthens the hypothesis that light energy is of major importance. Evidence for the usage of all four bacterial carbon fixation pathways was found in the metagenome. Nitrogen fixation appears to be limited to Synechococcus spp. in the upper mat layer and Thermodesulfovibrio sp. in the undermat, and nitrate/nitrite metabolism was limited. A closed sulfur cycle is indicated by biological sulfate reduction combined with the presence of genes for sulfide oxidation mainly in phototrophs. Finally, a variety of undermat microorganisms have genes for

  3. Metagenomics shows that low-energy anaerobic-aerobic treatment reactors reduce antibiotic resistance gene levels from domestic wastewater.

    Science.gov (United States)

    Christgen, Beate; Yang, Ying; Ahammad, S Z; Li, Bing; Rodriquez, D Catalina; Zhang, Tong; Graham, David W

    2015-02-17

    Effective domestic wastewater treatment is among our primary defenses against the dissemination of infectious waterborne disease. However, reducing the amount of energy used in treatment processes has become essential for the future. One low-energy treatment option is anaerobic-aerobic sequence (AAS) bioreactors, which use an anaerobic pretreatment step (e.g., anaerobic hybrid reactors) to reduce carbon levels, followed by some form of aerobic treatment. Although AAS is common in warm climates, it is not known how it compares to other treatment options relative to disease transmission, including its influence on antibiotic resistance (AR) in treated effluents. Here, we used metagenomic approaches to contrast the fate of antibiotic-resistant genes (ARG) in anaerobic, aerobic, and AAS bioreactors treating domestic wastewater. Five reactor configurations were monitored for 6 months, and treatment performance, energy use, and ARG abundance and diversity were compared in influents and effluents. AAS and aerobic reactors were superior to anaerobic units in reducing ARG-like sequence abundances, with effluent ARG levels of 29, 34, and 74 ppm (198 ppm influent), respectively. AAS and aerobic systems especially reduced aminoglycoside, tetracycline, and β-lactam ARG levels relative to anaerobic units, although 63 persistent ARG subtypes were detected in effluents from all systems (of 234 assessed). Sulfonamide and chloramphenicol ARG levels were largely unaffected by treatment, whereas a broad shift from target-specific ARGs to ARGs associated with multi-drug resistance was seen across influents and effluents. AAS reactors show promise for future applications because they can reduce more ARGs for less energy (32% less energy here), but all three treatment options have limitations and need further study.
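
    The reported influent and effluent levels can be turned into percentage removals with simple arithmetic. The sketch below uses only the four ppm values quoted in the abstract; the helper name `percent_removal` is hypothetical, and the derived percentages are a reader's calculation, not figures reported by the authors.

```python
def percent_removal(influent, effluents):
    """Percentage reduction relative to influent for each reactor type."""
    return {name: round(100 * (influent - level) / influent, 1)
            for name, level in effluents.items()}

# ARG-like sequence abundances in ppm, as quoted in the abstract.
influent_ppm = 198
effluent_ppm = {"AAS": 29, "aerobic": 34, "anaerobic": 74}

removal = percent_removal(influent_ppm, effluent_ppm)
for name, pct in removal.items():
    print(f"{name}: {pct}% removal")
```

    Under this calculation the AAS and aerobic reactors remove roughly 85% and 83% of ARG-like sequences, against about 63% for the anaerobic units.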

  4. Insights into novel antimicrobial compounds and antibiotic resistance genes from soil metagenomes

    OpenAIRE

    de Castro, Alinne P.; Fernandes, Gabriel da R.; Franco, Octávio L.

    2014-01-01

    In recent years a major worldwide problem has arisen with regard to infectious diseases caused by resistant bacteria. Resistant pathogens are related to high mortality and also to enormous healthcare costs. In this field, cultured microorganisms have commonly been the focus of attempts to isolate antibiotic resistance genes or to identify antimicrobial compounds. Although this strategy has been successful in many cases, most of the microbial diversity and related antimicrobial molecules have be...

  5. Current and future resources for functional metagenomics

    Directory of Open Access Journals (Sweden)

    Kathy Nguyen Lam

    2015-10-01

    Full Text Available Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries – physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.

  6. Antibiotic Resistance Genes and Correlations with Microbial Community and Metal Resistance Genes in Full-Scale Biogas Reactors As Revealed by Metagenomic Analysis.

    Science.gov (United States)

    Luo, Gang; Li, Bing; Li, Li-Guan; Zhang, Tong; Angelidaki, Irini

    2017-04-04

    Digested residues from biogas plants are often used as biofertilizers for agricultural crop cultivation. The antibiotic resistance genes (ARGs) in digested residues pose a high risk to public health because they can spread to disease-causing microorganisms and thus reduce the susceptibility of those microorganisms to antibiotics in medical treatment. A high-throughput sequencing (HTS)-based metagenomic approach was used in the present study to investigate the variations of ARGs in full-scale biogas reactors and the correlations of ARGs with microbial communities and metal resistance genes (MRGs). The total abundance of ARGs in all the samples varied from 7 × 10^-3 to 1.08 × 10^-1 copy of ARG/copy of 16S-rRNA gene, and the samples obtained from thermophilic biogas reactors had a lower total abundance of ARGs, indicating the superiority of thermophilic anaerobic digestion for ARGs removal. ARGs in all the samples were composed of 175 ARG subtypes; however, only 7 ARG subtypes were shared by all the samples. Principal component analysis and canonical correspondence analysis clustered the samples into three groups (samples from manure-based mesophilic reactors, manure-based thermophilic reactors, and sludge-based mesophilic reactors), and substrate, temperature, and hydraulic retention time (HRT) as well as volatile fatty acids (VFAs) were identified as crucial environmental variables affecting the ARGs compositions. Procrustes analysis revealed microbial community composition was the determinant of ARGs composition in biogas reactors, and there was also a significant correlation between ARGs composition and MRGs composition. Network analysis further revealed the co-occurrence of ARGs with specific microorganisms and MRGs.
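    The abundance unit quoted in this record (copies of ARG per copy of 16S rRNA gene) can be illustrated with a minimal sketch. The length-normalization formula, the function name `arg_per_16s`, the read length, and the reference lengths below are assumptions chosen for illustration; the abstract does not state how the authors computed the ratio.

```python
def arg_per_16s(arg_hits, n_16s_reads, read_len=100, len_16s=1432):
    """Return ARG copies per 16S rRNA gene copy (hypothetical normalization).

    arg_hits: list of (n_reads, reference_length_bp), one entry per ARG reference.
    n_16s_reads: number of reads assigned to 16S rRNA genes.
    read_len, len_16s: assumed read length and 16S reference length in bp.
    """
    if n_16s_reads <= 0:
        raise ValueError("need at least one 16S read")
    # Reads are scaled by reference length so that long genes are not over-counted.
    arg_copies = sum(n * read_len / ref_len for n, ref_len in arg_hits)
    copies_16s = n_16s_reads * read_len / len_16s
    return arg_copies / copies_16s


# Toy counts, invented for the example only.
hits = [(120, 1200), (30, 900)]
abundance = arg_per_16s(hits, n_16s_reads=5000)
```

    With these toy counts the ratio falls in the 10^-2 range, the same order of magnitude as the values reported in the abstract, but the numbers themselves are not data from the study.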

  7. Metagenomic profiles and antibiotic resistance genes in gut microbiota of mice exposed to arsenic and iron.

    Science.gov (United States)

    Guo, Xuechao; Liu, Su; Wang, Zhu; Zhang, Xu-xiang; Li, Mei; Wu, Bing

    2014-10-01

    Iron (Fe) has been widely applied to treat arsenic (As)-contaminated water, and Fe could influence bioavailability and toxicity of As. However, little is known about the impact of As and/or Fe on gut microbiota, which plays important roles in host health. In this study, high-throughput sequencing and quantitative real time PCR were applied to analyze the impact of As and Fe on mouse gut microbiota. Co-exposure of As and Fe mitigated effects on microbial community to a certain extent. Correlation analysis showed the shifts in gut microbiota caused by As and/or Fe exposure might be an important cause of changes in the metabolic profiles of the mice. For antibiotic resistance genes (ARGs), co-exposure of As and Fe increased types and abundance of ARGs. But for high abundance ARGs, such as tetQ, tetO and tetM, co-exposure of As and Fe mitigated effects on their abundances compared to exposure to As and Fe alone. No obvious relationship between ARGs and mobile genetic elements was found. The changes in ARGs caused by metal exposure might be due to the alteration of gut microbial diversity. Our results show that changes of gut microbial community caused by As and/or Fe can influence host metabolisms and abundances of ARGs in gut, indicating that changes of gut microbiota should be considered during the risk assessment of As and/or Fe. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Metagenomic Survey of Potential Symbiotic Bacteria and Polyketide Synthase Genes in an Indonesian Marine Sponge

    Directory of Open Access Journals (Sweden)

    Nia M. Kurnia

    2017-01-01

    Full Text Available There has been emerging evidence that the bacteria associated with marine sponges are the key producers of many complex bioactive compounds. The as-yet uncultured candidate bacterial genus “Candidatus Entotheonella” of the marine sponge Theonella swinhoei from Japan has recently been recognized as the source of numerous pharmacologically relevant polyketides and modified peptides, as previously reported by the Piel group (Wilson et al. 2014). This work reported the presence of “Candidatus Entotheonella sp.” in the highly complex microbiome of an Indonesian marine sponge from Kapoposang Island, South Sulawesi. We further identified the Kapoposang sponge specimen used in this work as Rhabdastrella sp. based on the integrated morphological, histological, and cytochrome oxidase subunit I (COI) gene analyses. To detect the polyketide biosynthetic machinery called type I polyketide synthase (PKS) in this Indonesian Rhabdastrella sp., we amplified and cloned the ketosynthase-encoding DNA regions of approximately 700 bp from the uncultured sponge's microbiome. Further sequencing and analysis of several randomly chosen clones indicated that all of them are most likely involved in the biosynthesis of methyl-branched fatty acids. However, employing a PKS-targeting primer designed in this work led to the isolation of four positive clones. BlastX search and subsequent phylogenetic analysis showed that one of the positive clones, designated as RGK32, displayed high homology with ketosynthase domains of many type I PKS systems and may belong to the subclass cis-AT PKS group.

  9. Predictive microbiology combined with metagenomic analysis targeted on the 16S rDNA : A new approach for food quality

    OpenAIRE

    Delhalle, Laurent; Taminiau, Bernard; Ellouze, Mariem; Nezer, Carine; Daube, Georges

    2013-01-01

    OBJECTIVES The food spoilage process is mainly caused by alteration micro-organisms and classical culture-based methods have therefore been used to assess the microbiological quality of food. These techniques are simple to implement but may not be relevant to understand the modifications of the microbial ecology which occur in the food product in response to different changes in the environmental conditions. Metagenomic analysis targeted on 16S ribosomal DNA can bring about a solution to t...

  10. Seasonal changes in the abundance of bacterial genes related to dimethylsulfoniopropionate catabolism in seawater from Ofunato Bay revealed by metagenomic analysis

    KAUST Repository

    Kudo, Toshiaki

    2018-04-26

    Ofunato Bay is located in the northeastern Pacific Ocean area of Japan, and it has the highest biodiversity of marine organisms in the world, primarily due to tidal influences from the cold Oyashio and warm Kuroshio currents. Our previous results from performing shotgun metagenomics indicated that Candidatus Pelagibacter ubique and Planktomarina temperata were the dominant bacteria (Reza et al., 2018a, 2018b). These bacteria are reportedly able to catabolize dimethylsulfoniopropionate (DMSP) produced from phytoplankton into dimethyl sulfide (DMS) or methanethiol (MeSH). This study was focused on seasonal changes in the abundances of bacterial genes (dddP, dmdA) related to DMSP catabolism in the seawater of Ofunato Bay by BLAST+ analysis using shotgun metagenomic datasets. We found seasonal changes among the Candidatus Pelagibacter ubique strains, including those of the HTCC1062 type and the Red Sea type. A good correlation was observed between the chlorophyll a concentrations and the abundances of the catabolic genes, suggesting that the bacteria directly interact with phytoplankton in the marine material cycle system and play important roles in producing DMS and MeSH from DMSP as signaling molecules for the possible formation of the scent of the tidewater or as fish attractants.

  11. Metagenomic analysis of microbial communities and beyond

    DEFF Research Database (Denmark)

    Schreiber, Lars

    2014-01-01

    From small clone libraries to large next-generation sequencing datasets – the field of community genomics or metagenomics has developed tremendously within the last years. This chapter will summarize some of these developments and will also highlight pitfalls of current metagenomic analyses. It w... heterologous expression of metagenomic DNA fragments to discover novel metabolic functions. Lastly, the chapter will shortly discuss the meta-analysis of gene expression of microbial communities, more precisely metatranscriptomics and metaproteomics.

  12. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Full Text Available Abstract Background Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12, a cluster composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion We have studied five clusters of genes specifically expressed in female, some of them being also expressed in male germ-cells. Moreover, contrary to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage.
Understanding the biological

  13. Metagenomic applications in environmental monitoring and bioremediation.

    Science.gov (United States)

    Techtmann, Stephen M; Hazen, Terry C

    2016-10-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  14. Predicting gene expression from sequence: a reexamination.

    Directory of Open Access Journals (Sweden)

    Yuan Yuan

    2007-11-01

    Full Text Available Although much of the information regarding genes' expressions is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
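    The idea of a naïve Bayes classifier evaluated by cross-validation, as described in this record, can be sketched in a few lines. This is a minimal illustration only: the Gaussian likelihood, the interleaved fold assignment, the function names, and the toy scores are all assumptions, and it does not reproduce the procedure or data of either study.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6  # floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature row."""
    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


def cross_val_accuracy(X, y, k=5):
    """k-fold CV: every sample is predicted by a model that never saw it."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = fit_gaussian_nb(train_X, train_y)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy "motif matching scores" for two expression clusters (invented values).
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05], [0.7, 0.3],
     [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.05, 0.95], [0.3, 0.7]]
y = ["A"] * 5 + ["B"] * 5
accuracy = cross_val_accuracy(X, y, k=5)
```

    The point of the sketch is the separation of training and test samples inside `cross_val_accuracy`: any step that uses the held-out samples before prediction (for example, feature selection on the full data set) would inflate the estimated accuracy.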

  15. Isolation of a gene encoding a cellulolytic enzyme from swamp buffalo rumen metagenomes and its cloning and expression in Escherichia coli.

    Science.gov (United States)

    Cheema, Tanzeem Akbar; Jirajaroenrat, Kanya; Sirinarumitr, Theerapol; Rakshit, Sudip K

    2012-01-01

    Ruminants are capable of hydrolyzing lignocellulosic residues to absorbable sugars by virtue of the microbial communities residing in their rumen. However, large sections of such microbial communities are not yet culturable using conventional laboratory techniques. Therefore in the present study, the metagenomic DNA of swamp buffalo (Bubalus bubalis) rumen contents was explored using culture-independent techniques. The consensus regions of glycosyl hydrolase 5 (GH5) family of cellulases were used as primers for PCR amplification. A full-length metagenomic cellulase gene, Umcel5B29, with a complete open reading frame (ORF) of 1611 bp was identified. The similarity search analysis revealed that Umcel5B29 is closely related to the cellulases (73% to 98% similarity) of ruminal unculturable microorganisms, indicating its phylogenetic origin. Further analysis indicated that Umcel5B29 does not contain a carbohydrate binding module (CBM). Subsequently, Umcel5B29 was overexpressed in Escherichia coli. The recombinant enzyme worked optimally at pH 5.5 and 45°C, a condition similar to the buffalo's rumen. However, the enzyme retained more than 70% of its maximal activity after incubation at pH 4-7 and more than 50% maximal activity after incubation at 30-60°C for 30 min. These characteristics render Umcel5B29 as a potential candidate for the bio-stoning process of denim.
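    The notion of a complete open reading frame, as used in this record, can be illustrated with a small forward-strand ORF scan. This is a sketch under stated assumptions: it uses the standard ATG start and TAA/TAG/TGA stop codons, ignores the reverse strand, and runs on an invented toy sequence rather than on Umcel5B29.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means no ORF found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
start, end = longest_orf(toy)
orf_length_bp = end - start
# If a 1611 bp ORF includes its stop codon, it encodes 1611 / 3 - 1 residues.
residues_for_1611_bp = 1611 // 3 - 1
```

    Whether the 1611 bp reported in the abstract counts the stop codon is not stated, so the residue count computed here is an assumption-dependent illustration.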

  16. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that, with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  17. Computational algorithms to predict Gene Ontology annotations.

    Science.gov (United States)

    Pinoli, Pietro; Chicco, Davide; Masseroli, Marco

    2015-01-01

    Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from being definitive and it rapidly evolves, and some erroneous annotations may be present. Since the curation process of novel annotations is a costly procedure, in terms of both money and time, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis) and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose the improvement of these algorithms by weighting the annotations in the input set. We tested our methods and their weighted variants on the Gene Ontology annotation sets of three model organism genes (Bos taurus, Danio rerio and Drosophila melanogaster). The methods showed their ability in predicting novel gene annotations and the weighting procedures were shown to lead to a valuable improvement, although the obtained results vary according to the dimension of the input annotation set and the considered algorithm. Out of the three considered methods, the Semantic IMproved Latent Semantic Analysis is the one that provides better results. In particular, when coupled with a proper weighting policy, it is able to predict a

  18. Comparative metagenomics of the Red Sea

    KAUST Repository

    Mineta, Katsuhiko

    2016-01-26

    Metagenomics produces a tremendous amount of data that comes from the organisms living in the environments. This big data enables us to examine not only microbial genes but also the community structure, interaction and adaptation mechanisms at the specific location and condition. The Red Sea has several unique characteristics such as high salinity, high temperature and low nutrition. These features must contribute to form the unique microbial community during the evolutionary process. In 2014, we started monthly samplings of the metagenomes in the Red Sea under KAUST-CCF project. In collaboration with Kitasato University, we also collected the metagenome data from the ocean in Japan, which shows contrasting features to the Red Sea. Therefore, the comparative metagenomics of those data provides a comprehensive view of the Red Sea microbes, leading to the identification of key microbes, genes and networks related to those environmental differences.

  19. A metagenomics portal for a democratized sequencing world.

    Science.gov (United States)

    Wilke, Andreas; Glass, Elizabeth M; Bartels, Daniela; Bischof, Jared; Braithwaite, Daniel; D'Souza, Mark; Gerlach, Wolfgang; Harrison, Travis; Keegan, Kevin; Matthews, Hunter; Kottmann, Renzo; Paczian, Tobias; Tang, Wei; Trimble, William L; Yilmaz, Pelin; Wilkening, Jared; Desai, Narayan; Meyer, Folker

    2013-01-01

    The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST. © 2013 Elsevier Inc. All rights reserved.
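    Clustering predicted proteins at a 90% identity threshold, as mentioned in this record, is often done greedily: each sequence joins the first representative it matches, otherwise it founds a new cluster. The sketch below shows that general idea only. The abstract does not name the clustering tool, so the greedy scheme, the `identity` stand-in built on Python's `difflib` (not a real sequence alignment), and the toy proteins are all assumptions.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude identity: matched characters divided by the shorter length.

    Stand-in for an alignment-based identity; for illustration only.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return matches / min(len(a), len(b))


def greedy_cluster(proteins, threshold=0.9):
    """Map each representative name to the names of its cluster members."""
    clusters = {}
    # Longest sequences first, so representatives are the longest members.
    for name, seq in sorted(proteins.items(), key=lambda kv: -len(kv[1])):
        for rep in clusters:
            if identity(proteins[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no representative was close enough
    return clusters


# Hypothetical predicted proteins (invented sequences).
proteins = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRH",  # one substitution relative to p1
    "p3": "MSLLNPWWGGDDEEFFTTYYCC",
}
clusters = greedy_cluster(proteins)
```

    Only the representatives then need to be compared against reference databases, which is one plausible reason such a step reduces the cost of the similarity computation.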

  20. [Mini review] metagenomic studies of the Red Sea

    KAUST Repository

    Behzad, Hayedeh

    2015-10-23

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied amongst marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physiochemical studies describe the Red Sea as an oligotrophic environment that contains one of the warmest and saltiest waters in the world with year-round high UV radiations. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmolyte C1 oxidation were found in the Red Sea metagenomic databases suggesting acquisition of specific environmental adaptation by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based application. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios where global ocean temperatures are expected to rise by 1–3 °C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the

  1. Screening a novel Na+/H+ antiporter gene from a metagenomic library of halophiles colonizing in the Dagong Ancient Brine Well in China.

    Science.gov (United States)

    Xiang, Wenliang; Zhang, Jie; Li, Lin; Liang, Huazhong; Luo, Hai; Zhao, Jian; Yang, Zhirong; Sun, Qun

    2010-05-01

    Metagenomic DNA libraries constructed from the Dagong Ancient Brine Well were screened for genes with Na(+)/H(+) antiporter activity on the antiporter-deficient Escherichia coli KNabc strain. One clone with a stable Na(+)-resistant phenotype was obtained and its Na(+)/H(+) antiporter gene was sequenced and designated as m-nha. The deduced amino acid sequence of M-Nha protein consists of 523 residues with a calculated molecular weight of 58 147 Da and a pI of 5.50, which is homologous with NhaH from Halobacillus dabanensis D-8(T) (92%) and Halobacillus aidingensis AD-6(T) (86%), and with Nhe2 from Bacillus sp. NRRL B-14911 (64%). It had a hydropathy profile with 10 putative transmembrane domains and a long carboxyl terminal hydrophilic tail of 140 amino acid residues, similar to Nhap from Synechocystis sp. and Aphanothece halophytica, as well as NhaG from Bacillus subtilis. The m-nha gene in the antiporter-negative mutant E. coli KNabc conferred resistance to Na(+) and the ability to grow under alkaline conditions. The difference in amino acid sequence and the putative secondary structure suggested that the m-nha isolated from the Dagong Ancient Brine Well in this study was a novel Na(+)/H(+) antiporter gene.

  2. Distribution of triclosan-resistant genes in major pathogenic microorganisms revealed by metagenome and genome-wide analysis.

    Directory of Open Access Journals (Sweden)

    Raees Khan

    Full Text Available The substantial use of triclosan (TCS) has been aimed at killing pathogenic bacteria, but TCS resistance seems to be prevalent in microbial species and limited knowledge exists about TCS resistance determinants in a majority of pathogenic bacteria. We aimed to evaluate the distribution of TCS resistance determinants in major pathogenic bacteria (N = 231) and to assess the enrichment of potentially pathogenic genera in TCS contaminated environments. A TCS-resistant gene (TRG) database was constructed and experimentally validated to predict TCS resistance in major pathogenic bacteria. Genome-wide in silico analysis was performed to define the distribution of TCS-resistant determinants in major pathogens. Microbiome analysis of TCS contaminated soil samples was also performed to investigate the abundance of TCS-resistant pathogens. We experimentally confirmed that TCS resistance could be accurately predicted using genome-wide in silico analysis against the TRG database. Predicted TCS resistant phenotypes were observed in all of the tested bacterial strains (N = 17), and heterologous expression of selected TCS resistant genes from those strains conferred expected levels of TCS resistance in an alternative host Escherichia coli. Moreover, genome-wide analysis revealed that potential TCS resistance determinants were abundant among the majority of human-associated pathogens (79%) and soil-borne plant pathogenic bacteria (98%). These included a variety of enoyl-acyl carrier protein reductase (ENR) homologues, AcrB efflux pumps, and ENR substitutions. FabI ENR, which is the only known effective target for TCS, was either co-localized with other TCS resistance determinants or had TCS resistance-associated substitutions. Furthermore, microbiome analysis revealed that pathogenic genera with intrinsic TCS-resistant determinants exist in TCS contaminated environments. We conclude that TCS may not be as effective against the majority of bacterial pathogens as previously

  3. Functional metagenomics of extreme environments.

    Science.gov (United States)

    Mirete, Salvador; Morgante, Verónica; González-Pastor, José Eduardo

    2016-04-01

    The bioprospecting of enzymes that operate under extreme conditions is of particular interest for many biotechnological and industrial processes. Nevertheless, there is a considerable limitation to retrieve novel enzymes as only a small fraction of microorganisms derived from extreme environments can be cultured under standard laboratory conditions. Functional metagenomics has the advantage of not requiring the cultivation of microorganisms or previous sequence information to known genes, thus representing a valuable approach for mining enzymes with new features. In this review, we summarize studies showing how functional metagenomics was employed to retrieve genes encoding for proteins involved not only in molecular adaptation and resistance to extreme environmental conditions but also in other enzymatic activities of biotechnological interest. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Microbial diversity of a full-scale UASB reactor applied to poultry slaughterhouse wastewater treatment: integration of 16S rRNA gene amplicon and shotgun metagenomic sequencing.

    Science.gov (United States)

    Delforno, Tiago Palladino; Lacerda Júnior, Gileno Vieira; Noronha, Melline F; Sakamoto, Isabel K; Varesche, Maria Bernadete A; Oliveira, Valéria M

    2017-06-01

    The 16S rRNA gene amplicon and whole-genome shotgun metagenomic (WGSM) sequencing approaches were used to investigate wide-spectrum profiles of microbial composition and metabolic diversity from a full-scale UASB reactor applied to poultry slaughterhouse wastewater treatment. The data were generated by using MiSeq 2 × 250 bp and HiSeq 2 × 150 bp Illumina sequencing platforms for 16S amplicon and WGSM sequencing, respectively. Each approach revealed a distinct microbial community profile, with Pseudomonas and Psychrobacter as the predominant genera for the WGSM dataset and Clostridium and Methanosaeta for the 16S rRNA gene amplicon dataset. The virome characterization revealed the presence of two viral families with Bacteria and Archaea as host, Myoviridae, and Siphoviridae. A wide functional diversity was found with predominance of genes involved in the metabolism of acetone, butanol, and ethanol synthesis; and one-carbon metabolism (e.g., methanogenesis). Genes related to the acetotrophic methanogenesis pathways were more abundant than methylotrophic and hydrogenotrophic, corroborating the taxonomic results that showed the prevalence of the acetotrophic genus Methanosaeta. Moreover, the dataset indicated a variety of metabolic genes involved in sulfur, nitrogen, iron, and phosphorus cycles, with many genera able to act in all cycles. BLAST analysis against the Antibiotic Resistance Genes Database (ARDB) revealed that the microbial community contained 43 different types of antibiotic resistance genes, some of which were associated with chicken growth promotion (e.g., bacitracin, tetracycline, and polymyxin). © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  5. Metagenomic of Actinomycetes Based on 16S rRNA and nifH Genes in Soil and Roots of Four Indonesian Rice Cultivars Using PCR-DGGE

    Directory of Open Access Journals (Sweden)

    Mahyarudin

    2015-07-01

    Full Text Available The research was conducted to study the metagenomic of actinomycetes based on 16S ribosomal RNA (rRNA) and bacterial nifH genes in soil and roots of four rice cultivars. The denaturing gradient gel electrophoresis profile based on 16S rRNA gene showed that the diversity of actinomycetes in roots was higher than in soil samples. The profile also showed that the diversity of actinomycetes was similar in four varieties of rice plant and three types of agroecosystem. The profile was partially sequenced and compared to GenBank database indicating their identity with closely related microbes. The blast results showed that 17 bands were closely related ranging from 93% to 100% of maximum identity with five genera of actinomycetes, namely Geodermatophilus, Actinokineospora, Actinoplanes, Streptomyces and Kocuria. Our study found that Streptomyces species in soil and roots of rice plants were more varied than other genera, with a dominance of Streptomyces alboniger and Streptomyces acidiscabies in almost all the samples. Bacterial community analyses based on nifH gene denaturing gradient gel electrophoresis showed that diversity of bacteria in soils which have nifH gene was higher than that in rice plant roots. The profile also showed that the diversity of those bacteria was similar in four varieties of rice plant and three types of agroecosystem. Five bands were closely related with nifH gene from uncultured bacterium clone J50, uncultured bacterium clone clod-38, and uncultured bacterium clone BG2.37 with maximum identity 99%, 98%, and 92%, respectively. The diversity analysis based on 16S rRNA gene differed from nifH gene and may not correlate with each other. The findings indicated the diversity of actinomycetes and several bacterial genomes analyzed here have an ability to fix nitrogen in soil and roots of rice plant.

  6. A novel esterase gene cloned from a metagenomic library from neritic sediments of the South China Sea

    Directory of Open Access Journals (Sweden)

    Peng Qing

    2011-11-01

    Full Text Available Abstract Background Marine microbes are a large and diverse group, which are exposed to a wide variety of pressure, temperature, salinity, nutrient availability and other environmental conditions. They provide a huge potential source of novel enzymes with unique properties that may be useful in industry and biotechnology. To explore the lipolytic genetic resources in the South China Sea, 23 sediment samples were collected in the depth Results A metagenomic library of South China Sea sediment assemblages in a plasmid vector, containing about 194 Mb of community DNA, was prepared. Screening of a part of the unamplified library resulted in isolation of 15 unique lipolytic clones with the ability to hydrolyze tributyrin. A positive recombinant clone (pNLE1), containing a novel esterase (Est_p1), was successfully expressed in E. coli and purified. In a series of assays, Est_p1 displayed maximal activity at pH 8.57 and 40°C, with p-nitrophenyl butyrate (C4) as substrate. Compared to other metagenomic esterases, Est_p1 showed notable specificity for substrate C4 (kcat/Km value 11,500 s⁻¹ mM⁻¹) and was not inhibited by phenylmethylsulfonyl fluoride, suggesting that the substrate binding pocket was suitable for substrate C4 and that the serine active-site residue was buried at the bottom of the substrate binding pocket, sheltered by a lid structure. Conclusions Esterases with specificity towards short chain fatty acids, especially butanoic acid, are commercially available as potent flavoring tools. Given its outstanding activity and specificity for substrate C4, Est_p1 has potential application in flavor industries requiring hydrolysis of short chain esters.

  7. A catalog of the mouse gut metagenome.

    Science.gov (United States)

    Xiao, Liang; Feng, Qiang; Liang, Suisha; Sonne, Si Brask; Xia, Zhongkui; Qiu, Xinmin; Li, Xiaoping; Long, Hua; Zhang, Jianfeng; Zhang, Dongya; Liu, Chuan; Fang, Zhiwei; Chou, Joyce; Glanville, Jacob; Hao, Qin; Kotowska, Dorota; Colding, Camilla; Licht, Tine Rask; Wu, Donghai; Yu, Jun; Sung, Joseph Jao Yiu; Liang, Qiaoyi; Li, Junhua; Jia, Huijue; Lan, Zhou; Tremaroli, Valentina; Dworzynski, Piotr; Nielsen, H Bjørn; Bäckhed, Fredrik; Doré, Joël; Le Chatelier, Emmanuelle; Ehrlich, S Dusko; Lin, John C; Arumugam, Manimozhiyan; Wang, Jun; Madsen, Lise; Kristiansen, Karsten

    2015-10-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.
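
    The sharing criterion quoted above (95% identity, 90% coverage) can be illustrated with a small filter over alignment hits. This is a minimal sketch, not the authors' pipeline: the `Hit` record, its field names, and the choice to measure coverage as aligned length over query length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record for one mouse gene against a human catalog."""
    query_gene: str
    identity: float       # percent identity of the alignment
    aligned_length: int   # number of aligned query positions
    query_length: int     # full length of the query gene


def shared_fraction(hits, total_genes, min_identity=95.0, min_coverage=90.0):
    """Fraction of genes with at least one hit passing both thresholds."""
    shared = set()
    for h in hits:
        # Assumed definition: coverage is the aligned share of the query gene.
        coverage = 100.0 * h.aligned_length / h.query_length
        if h.identity >= min_identity and coverage >= min_coverage:
            shared.add(h.query_gene)
    return len(shared) / total_genes


toy_hits = [
    Hit("m1", 97.0, 950, 1000),  # passes both thresholds
    Hit("m2", 99.0, 500, 1000),  # fails coverage (50%)
    Hit("m3", 90.0, 990, 1000),  # fails identity
    Hit("m1", 96.0, 920, 1000),  # second hit for m1, counted once
]
print(shared_fraction(toy_hits, total_genes=4))
```

    Collecting passing genes in a set means a gene with several qualifying hits is counted only once.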

  8. A retrospective metagenomics approach to studying Blastocystis

    DEFF Research Database (Denmark)

    Andersen, Lee O'Brien; Bonde, Ida; Nielsen, Henrik Bjørn

    2015-01-01

    Blastocystis is a common single-celled intestinal parasitic genus, comprising several subtypes. Here, we screened data obtained by metagenomic analysis of faecal DNA for Blastocystis by searching for subtype-specific genes in coabundance gene groups, which are groups of genes that covary across......- and Prevotella-driven enterotypes. This is the first study to investigate the relationship between Blastocystis and communities of gut bacteria using a metagenomics approach. The study serves as an example of how it is possible to retrospectively investigate microbial eukaryotic communities in the gut using...

  9. Productivity and salinity structuring of the microplankton revealed by comparative freshwater metagenomics.

    Science.gov (United States)

    Eiler, Alexander; Zaremba-Niedzwiedzka, Katarzyna; Martínez-García, Manuel; McMahon, Katherine D; Stepanauskas, Ramunas; Andersson, Siv G E; Bertilsson, Stefan

    2014-09-01

    Little is known about the diversity and structuring of freshwater microbial communities beyond the patterns revealed by tracing their distribution in the landscape with common taxonomic markers such as the ribosomal RNA. To address this gap in knowledge, metagenomes from temperate lakes were compared to selected marine metagenomes. Taxonomic analyses of rRNA genes in these freshwater metagenomes confirm the previously reported dominance of a limited subset of uncultured lineages of freshwater bacteria, whereas Archaea were rare. Diversification into marine and freshwater microbial lineages was also reflected in phylogenies of functional genes, and there were also significant differences in functional beta-diversity. The pathways and functions that accounted for these differences are involved in osmoregulation, active transport, carbohydrate and amino acid metabolism. Moreover, predicted genes orthologous to active transporters and recalcitrant organic matter degradation were more common in microbial genomes from oligotrophic versus eutrophic lakes. This comparative metagenomic analysis allowed us to formulate a general hypothesis that oceanic- compared with freshwater-dwelling microorganisms, invest more in metabolism of amino acids and that strategies of carbohydrate metabolism differ significantly between marine and freshwater microbial communities. © 2013 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

  10. An Experimental Metagenome Data Management and Analysis System

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  11. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  12. Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data

    Directory of Open Access Journals (Sweden)

    Wilhelm Larry J

    2007-11-01

    Full Text Available Abstract Background One objective of metagenomics is to reconstruct information about specific uncultured organisms from fragmentary environmental DNA sequences. We used the genome of an isolate of the marine alphaproteobacterium SAR11 ('Candidatus Pelagibacter ubique'; strain HTCC1062), obtained from the cold, productive Oregon coast, as a query sequence to study variation in SAR11 metagenome sequence data from the Sargasso Sea, a warm, oligotrophic ocean gyre. Results The average amino acid identity of SAR11 genes encoded by the metagenomic data to the query genome was only 71%, indicating significant evolutionary divergence between the coastal isolates and Sargasso Sea populations. However, an analysis of gene neighbors indicated that SAR11 genes in the Sargasso Sea metagenomic data match the gene order of the HTCC1062 genome in 96% of cases (> 85,000 observations), and that rearrangements are most frequent at predicted operon boundaries. There were no conserved examples of genes with known functions being found in the coastal isolates, but not the Sargasso Sea metagenomic data, or vice versa, suggesting that core regions of these diverse SAR11 genomes are relatively conserved in gene content. However, four hypervariable regions were observed, which may encode properties associated with variation in SAR11 ecotypes. The largest of these, HVR2, is a 48 kb region flanked by the sole 5S and 23S genes in the HTCC1062 genome, and mainly encodes genes that determine cell surface properties. A comparison of two closely related 'Candidatus Pelagibacter' genomes (HTCC1062 and HTCC1002) revealed a number of "gene indels" in core regions. Most of these were found to be polymorphic in the metagenomic data and showed evidence of purifying selection, suggesting that the same "polymorphic gene indels" are maintained in physically isolated SAR11 populations. Conclusion These findings suggest that natural selection has conserved many core features of SAR11
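
    The gene-neighbor analysis described above can be pictured as counting how often two genes seen side by side on a metagenomic fragment are also adjacent in the reference genome. The following is a minimal sketch under stated assumptions, not the authors' procedure: gene names, the `synteny_fraction` helper, and the treatment of the reference as a circular chromosome are hypothetical.

```python
def synteny_fraction(reference_order, observed_pairs):
    """Fraction of observed neighbor pairs that are adjacent in the reference.

    reference_order: gene identifiers in reference genome order (assumed circular).
    observed_pairs: (gene_a, gene_b) tuples seen next to each other on fragments.
    Pairs involving genes absent from the reference are ignored.
    """
    position = {gene: i for i, gene in enumerate(reference_order)}
    n = len(reference_order)
    conserved = 0
    total = 0
    for a, b in observed_pairs:
        if a not in position or b not in position:
            continue
        total += 1
        gap = abs(position[a] - position[b])
        # Adjacent either directly or across the origin of a circular genome.
        if gap == 1 or gap == n - 1:
            conserved += 1
    return conserved / total if total else 0.0


reference = ["g1", "g2", "g3", "g4", "g5"]
pairs = [("g1", "g2"), ("g2", "g3"), ("g5", "g1"), ("g2", "g4"), ("gX", "g1")]
print(synteny_fraction(reference, pairs))
```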

  13. Blood Gene Expression Predicts Bronchiolitis Obliterans Syndrome

    Directory of Open Access Journals (Sweden)

    Richard Danger

    2018-01-01

    Full Text Available Bronchiolitis obliterans syndrome (BOS), the main manifestation of chronic lung allograft dysfunction, leads to poor long-term survival after lung transplantation. Identifying predictors of BOS is essential to prevent the progression of dysfunction before irreversible damage occurs. By using a large set of 107 samples from lung recipients, we performed microarray gene expression profiling of whole blood to identify early biomarkers of BOS, including samples from 49 patients with stable function for at least 3 years, 32 samples collected at least 6 months before BOS diagnosis (prediction group), and 26 samples at or after BOS diagnosis (diagnosis group). An independent set from 25 lung recipients was used for validation by quantitative PCR (13 stables, 11 in the prediction group, and 8 in the diagnosis group). We identified 50 transcripts differentially expressed between stable and BOS recipients. Three genes, namely POU class 2 associating factor 1 (POU2AF1), T-cell leukemia/lymphoma protein 1A (TCL1A), and B cell lymphocyte kinase, were validated as predictive biomarkers of BOS more than 6 months before diagnosis, with areas under the curve of 0.83, 0.77, and 0.78 respectively. These genes allow stratification based on BOS risk (log-rank test p < 0.01) and are not associated with time posttransplantation. This is the first published large-scale gene expression analysis of blood after lung transplantation. The three-gene blood signature could provide clinicians with new tools to improve follow-up and adapt treatment of patients likely to develop BOS.
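
    The areas under the curve reported above summarize how well a single marker separates two groups. As a hedged illustration only (the scores below are invented, and the abstract does not say how its values were computed), the area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half:

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Invented marker scores for illustration; not data from the study.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
print(auc(cases, controls))
```

    A value of 0.5 corresponds to no separation and 1.0 to perfect separation.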

  14. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs

    Directory of Open Access Journals (Sweden)

    Erica C. Pehrsson

    2013-06-01

    Full Text Available Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens.

  15. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs.

    Science.gov (United States)

    Pehrsson, Erica C; Forsberg, Kevin J; Gibson, Molly K; Ahmadi, Sara; Dantas, Gautam

    2013-01-01

    Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens.

  16. Challenges and Opportunities of Airborne Metagenomics

    KAUST Repository

    Behzad, H.

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles.

  17. Comparative Metagenomics of Freshwater Microbial Communities

    International Nuclear Information System (INIS)

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-01-01

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (∼160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete set of B12 biosynthesis pathway at high abundance suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  18. Comparative Metagenomics of Freshwater Microbial Communities

    Energy Technology Data Exchange (ETDEWEB)

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-05-17

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete set of B12 biosynthesis pathway at high abundance suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  19. Ecological patterns of nifH genes in four terrestrial climatic zones explored with targeted metagenomics using FrameBot, a new informatics tool.

    Science.gov (United States)

    Wang, Qiong; Quensen, John F; Fish, Jordan A; Lee, Tae Kwon; Sun, Yanni; Tiedje, James M; Cole, James R

    2013-09-17

    Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any. High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. 
    To study the gene pool for these processes, it is more

  20. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
    While bacterial genes responsible for

  1. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies.

    Directory of Open Access Journals (Sweden)

    Yong Wang

    Full Text Available Bacterial 16S ribosomal DNA (rDNA) amplicons have been widely used in the classification of uncultured bacteria inhabiting environmental niches. Primers targeting conservative regions of the rDNAs are used to generate amplicons of variant regions that are informative in taxonomic assignment. One problem is that the percentage coverage and application scope of the primers used in previous studies are largely unknown. In this study, conservative fragments of available rDNA sequences were first mined and then used to search for candidate primers within the fragments by measuring the coverage rate defined as the percentage of bacterial sequences containing the target. Thirty predicted primers with a high coverage rate (>90%) were identified, which were basically located in the same conservative regions as known primers in previous reports, whereas 30% of the known primers were associated with a coverage rate of <90%. The application scope of the primers was also examined by calculating the percentages of failed detections in bacterial phyla. Primers A519-539, E969-983, E1063-1081, U515 and E517, are highly recommended because of their high coverage in almost all phyla. As expected, the three predominant phyla, Firmicutes, Gemmatimonadetes and Proteobacteria, are best covered by the predicted primers. The primers recommended in this report shall facilitate a comprehensive and reliable survey of bacterial diversity in metagenomic studies.
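
    The coverage rate defined above, the percentage of sequences containing the primer target, can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the toy primer and sequences are invented, mismatches are not tolerated, only the forward strand is searched, and degenerate bases are expanded with the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, sequence):
    """True if the (possibly degenerate) primer occurs anywhere in the sequence."""
    n = len(primer)
    for start in range(len(sequence) - n + 1):
        if all(sequence[start + i] in IUPAC[primer[i]] for i in range(n)):
            return True
    return False


def coverage_rate(primer, sequences):
    """Percentage of sequences that contain the primer target."""
    if not sequences:
        return 0.0
    hits = sum(primer_matches(primer, s) for s in sequences)
    return 100.0 * hits / len(sequences)


# Toy data for illustration; M stands for A or C.
toy_sequences = ["TTACGATT", "TTACGCTT", "TTACGGTT", "GGGG"]
print(coverage_rate("ACGM", toy_sequences))
```

    Applying the same function per phylum, and reporting the phyla where the rate is low, would correspond to the "failed detections" check mentioned in the abstract.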

  2. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies

    KAUST Repository

    Wang, Yong

    2009-10-09

    Bacterial 16S ribosomal DNA (rDNA) amplicons have been widely used in the classification of uncultured bacteria inhabiting environmental niches. Primers targeting conservative regions of the rDNAs are used to generate amplicons of variant regions that are informative in taxonomic assignment. One problem is that the percentage coverage and application scope of the primers used in previous studies are largely unknown. In this study, conservative fragments of available rDNA sequences were first mined and then used to search for candidate primers within the fragments by measuring the coverage rate defined as the percentage of bacterial sequences containing the target. Thirty predicted primers with a high coverage rate (>90%) were identified, which were basically located in the same conservative regions as known primers in previous reports, whereas 30% of the known primers were associated with a coverage rate of <90%. The application scope of the primers was also examined by calculating the percentages of failed detections in bacterial phyla. Primers A519-539, E969-983, E1063-1081, U515 and E517, are highly recommended because of their high coverage in almost all phyla. As expected, the three predominant phyla, Firmicutes, Gemmatimonadetes and Proteobacteria, are best covered by the predicted primers. The primers recommended in this report shall facilitate a comprehensive and reliable survey of bacterial diversity in metagenomic studies. © 2009 Wang, Qian.

  3. Mining of unexplored habitats for novel chitinases - chiA as a helper gene proxy in metagenomics

    DEFF Research Database (Denmark)

    Cretoiu, Mariana Silvia; Kielak, Anna Maria; Abu Al-Soud, Waleed

    2012-01-01

    encompassed (1) classical overall enzymatic assays, (2) chiA gene abundance measurement by qPCR, (3) chiA gene pyrosequencing, and (4) chiA gene-based PCR-DGGE was used. The chiA gene pyrosequencing is unprecedented, as it is the first massive parallel sequencing of this gene. The data obtained showed...... the existence across habitats of core bacterial communities responsible for chitin assimilation irrespective of ecosystem origin. Conversely, there were habitat-specific differences. In addition, a suite of sequences were obtained that are as yet unregistered in the chitinase database. In terms of chiA gene...

  4. Metagenomic Systems Biology of the Human Microbiome

    DEFF Research Database (Denmark)

    Bonde, Ida

    , nose and oral cavity has been analyzed. The central method has been a co-abundance clustering method, which separates genes from metagenomics data under the assumption that genes originating from the same DNA (e.g. a bacterial genome, a phage or a plasmid) will co-vary across samples. Thus, co...... to previous Blastocystis prevalence studies. Moreover, it was found that individuals with a Bacteroides-driven enterotype were less prone to harbor the Blastocystis parasite. Finally, the CAG clustering method was applied to metagenomics data from the human nose- and oral-cavity. It was concluded...
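
    The co-abundance idea stated above (genes from the same DNA molecule co-vary across samples) can be sketched as follows. The Pearson correlation, the 0.9 threshold, the function names and the toy profiles are all assumptions for illustration; the record does not say which similarity measure or cut-off the thesis uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def co_abundant_pairs(profiles, threshold=0.9):
    """Return gene pairs whose abundance profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if pearson(profiles[g], profiles[h]) >= threshold
    ]


# Hypothetical abundances of three genes across four samples.
toy_profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
}
pairs = co_abundant_pairs(toy_profiles)
```

    Here geneA and geneB rise and fall together and would be grouped, while geneC varies in the opposite direction and would not.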

  5. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Resonance – Journal of Science Education; Volume 22; Issue 3. Metagenomics at Grass Roots. Sudeshna ... benefit human health, agriculture, and ecosystem functions. This article provides a brief history of technical advances in metagenomics, including DNA sequencing methods, and some case studies.

  6. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
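
    The TRN20-TST80 strategy described in this abstract, a random split into 20% training and 80% testing accessions, can be sketched roughly as below. The function name, the seed, the rounding rule and the placeholder accession labels are assumptions for illustration; only the 20/80 proportions and the collection size of 8416 come from the abstract, and the genomic prediction model itself is not shown.

```python
import random


def trn_tst_split(accessions, train_fraction=0.2, seed=0):
    """Randomly partition accessions into a training (TRN) and a testing (TST) set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(accessions)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder labels standing in for the 8416 Mexican landrace accessions.
mexican_accessions = [f"MEX{i}" for i in range(8416)]
trn, tst = trn_tst_split(mexican_accessions)
```

    With these proportions the model would be fitted on about 1,683 accessions and evaluated on the remaining 6,733; repeating the split with different seeds would give the random cross-validation replicates.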

  7. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    Science.gov (United States)

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of culturable microorganisms in the environment, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  8. Mining biomass-degrading genes through Illumina-based de novo sequencing and metagenomic analysis of free-living bacteria in the gut of the lower termite Coptotermes gestroi harvested in Vietnam.

    Science.gov (United States)

    Do, Thi Huyen; Nguyen, Thi Thao; Nguyen, Thanh Ngoc; Le, Quynh Giang; Nguyen, Cuong; Kimura, Keitarou; Truong, Nam Hai

    2014-12-01

    The 5.6 Gb metagenome of free-living microbial flora in the gut of the lower termite Coptotermes gestroi, harvested in Vietnam, was sequenced using Illumina technology. Genes related to biomass degradation were mined for a better understanding of biomass digestion in the termite gut and to identify lignocellulolytic enzymes applicable to biofuel production. The sequencing generated 5.4 Gb of useful reads, containing 125,431 ORFs spanning 78,271,365 bp, 80% of which was derived from bacteria. The 12 most abundant bacterial orders were Spirochaetales, Lactobacillales, Bacteroidales, Clostridiales, Enterobacteriales, Pseudomonades, Synergistales, Desulfovibrionales, Xanthomonadales, Burkholderiales, Bacillales, and Actinomycetales, and 1460 species were estimated. Of more than 12,000 ORFs with predicted functions related to carbohydrate metabolism, 587 encoding hydrolytic enzymes for cellulose, hemicellulose, and pectin were identified. Among them, 316 ORFs were related to cellulose degradation, and included β-glucosidases, 6-phospho-β-glucosidases, licheninases, glucan endo-1,3-β-D-glucosidases, endoglucanases, cellulose 1,4-β-cellobiosidases, glucan 1,3-β-glucosidases, and cellobiose phosphorylases. In addition, 259 ORFs were related to hemicellulose degradation, encoding endo-1,4-β-xylanases, α-galactosidases, α-N-arabinofuranosidases, xylan 1,4-β-xylosidases, arabinan endo-1,5-α-L-arabinosidases, endo-1,4-β-mannanases, and α-glucuronidases. Twelve ORFs encoding pectinesterases and pectate lyases were also obtained. To our knowledge, this is the first successful application of Illumina-based de novo sequencing for the analysis of a free-living bacterial community in the gut of a lower termite C. gestroi and for mining genes related to lignocellulose degradation from the gut bacteria. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  9. A catalog of the mouse gut metagenome

    DEFF Research Database (Denmark)

    Xiao, Liang; Feng, Qiang; Liang, Suisha

    2015-01-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing...... laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human...... counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies....

  10. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari Nyyssönen

    2013-09-01

    Full Text Available Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain as hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26%) cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes, multiple Proteobacteria sub-phyla in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically-confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.
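
    The tetranucleotide frequency binning mentioned in this abstract starts from a composition profile of each fosmid sequence. A minimal sketch of that first step is given below; the function name and the toy sequence are assumptions, reverse complements are not merged, and the subsequent clustering of profiles into bins is not shown, so this is not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each of the 256 tetranucleotides in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


# Toy sequence for illustration only.
freqs = tetranucleotide_frequencies("ACGTACGTAC")
acgt_freq = freqs[KMERS.index("ACGT")]
```

    Sequences with similar 256-dimensional profiles can then be grouped, which is the sense in which activities found on distinct fosmids can be linked to the same group of organisms.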

  11. Exploring the optimal strategy to predict essential genes in microbes.

    Science.gov (United States)

    Deng, Jingyuan; Tan, Lirong; Lin, Xiaodong; Lu, Yao; Lu, Long J

    2011-12-27

    Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we have developed a machine learning based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has its potential limitations. The first is constricted by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the required number of known essential genes is surprisingly small to make accurate predictions. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches resulted in an increased performance when the known essential genes are few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.

  12. PCR-Based Analysis of ColE1 Plasmids in Clinical Isolates and Metagenomic Samples Reveals Their Importance as Gene Capture Platforms

    Directory of Open Access Journals (Sweden)

    Manuel Ares-Arroyo

    2018-03-01

    Full Text Available ColE1 plasmids are important vehicles for the spread of antibiotic resistance in the Enterobacteriaceae and Pasteurellaceae families of bacteria. Their monitoring is essential, as they harbor important resistant determinants in humans, animals and the environment. In this work, we have analyzed ColE1 replicons using bioinformatic and experimental approaches. First, we carried out a computational study examining the structure of different ColE1 plasmids deposited in databases. Bioinformatic analysis of these ColE1 replicons revealed a mosaic genetic structure consisting of a host-adapted conserved region responsible for the housekeeping functions of the plasmid, and a variable region encoding a wide variety of genes, including multiple antibiotic resistance determinants. From this exhaustive computational analysis we developed a new PCR-based technique, targeting a specific sequence in the conserved region, for the screening, capture and sequencing of these small plasmids, either specific for Enterobacteriaceae or specific for Pasteurellaceae. To validate this PCR-based system, we tested various collections of isolates from both bacterial families, finding that ColE1 replicons were not only highly prevalent in antibiotic-resistant isolates, but also present in susceptible bacteria. In Pasteurellaceae, ColE1 plasmids carried almost exclusively antibiotic resistance genes. In Enterobacteriaceae, these plasmids encoded a large range of traits, including not only antibiotic resistance determinants, but also a wide variety of genes, showing the huge genetic plasticity of these small replicons. Finally, we also used a metagenomic approach in order to validate this technique, performing this PCR system using total DNA extractions from fecal samples from poultry, turkeys, pigs and humans. Using Illumina sequencing of the PCR products we identified a great diversity of genes encoded by ColE1 replicons, including different antibiotic resistance

  13. Interactive metagenomic visualization in a Web browser.

    Science.gov (United States)

    Ondov, Brian D; Bergman, Nicholas H; Phillippy, Adam M

    2011-09-30

    A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  14. Interactive metagenomic visualization in a Web browser

    Directory of Open Access Journals (Sweden)

    Phillippy Adam M

    2011-09-01

    Full Text Available Abstract. Background: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  15. Metagenomic scaffolds enable combinatorial lignin transformation.

    Science.gov (United States)

    Strachan, Cameron R; Singh, Rahul; VanInsberghe, David; Ievdokymenko, Kateryna; Budwill, Karen; Mohn, William W; Eltis, Lindsay D; Hallam, Steven J

    2014-07-15

    Engineering the microbial transformation of lignocellulosic biomass is essential to developing modern biorefining processes that alleviate reliance on petroleum-derived energy and chemicals. Many current bioprocess streams depend on the genetic tractability of Escherichia coli with a primary emphasis on engineering cellulose/hemicellulose catabolism, small molecule production, and resistance to product inhibition. Conversely, bioprocess streams for lignin transformation remain embryonic, with relatively few environmental strains or enzymes implicated. Here we develop a biosensor responsive to monoaromatic lignin transformation products compatible with functional screening in E. coli. We use this biosensor to retrieve metagenomic scaffolds sourced from coal bed bacterial communities conferring an array of lignin transformation phenotypes that synergize in combination. Transposon mutagenesis and comparative sequence analysis of active clones identified genes encoding six functional classes mediating lignin transformation phenotypes that appear to be rearrayed in nature via horizontal gene transfer. Lignin transformation activity was then demonstrated for one of the predicted gene products encoding a multicopper oxidase to validate the screen. These results illuminate cellular and community-wide networks acting on aromatic polymers and expand the toolkit for engineering recombinant lignin transformation based on ecological design principles.

  16. Comprehensive Diagnosis of Bacterial Infection Associated with Acute Cholecystitis Using Metagenomic Approach

    Directory of Open Access Journals (Sweden)

    Manabu Kujiraoka

    2017-04-01

    Full Text Available Acute cholecystitis (AC), which is strongly associated with retrograde bacterial infection, is an inflammatory disease that can be fatal if inappropriately treated. Currently, bacterial culture testing, which is basically recommended to detect the etiological agent, is a time-consuming (4–6 days), non-comprehensive approach. To rapidly detect a potential pathogen and predict its antimicrobial susceptibility, we undertook a metagenomic approach to characterize the bacterial infection associated with AC. Six patients (P1–P6) who underwent cholecystectomy for AC were enrolled in this study. Metagenome analysis demonstrated possible single or multiple bacterial infections in four patients (P1, P2, P3, and P4) with 24-h experimental procedures; in addition, the CTX-M extended-spectrum β-lactamase (ESBL) gene was identified in two bile samples (P1 and P4). Further whole genome sequencing of Escherichia coli isolates suggested that CTX-M-27-producing ST131 and CTX-M-14-producing novel-ST were identified in P1 and P4, respectively. Metagenome analysis of feces and saliva also suggested some imbalance in the microbiota for more comprehensive assessment of patients with AC. In conclusion, metagenome analysis was useful for rapid bacterial diagnostics, including assessing potential antimicrobial susceptibility, in patients with AC.

  17. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Metagenomics is a robust, interdisciplinary approach for studying microbial community composition, function, and dynamics. It typically involves a core of molecular biology, microbiology, ecology, statistics, and computational biology. Exciting outcomes anticipated from these studies include unraveling of complex interactions ...

  18. Metagenomic islands of hyperhalophiles: the case of Salinibacter ruber

    Directory of Open Access Journals (Sweden)

    Rohwer Forest

    2009-12-01

    Full Text Available Abstract. Background: Saturated brines are extreme environments of low diversity. Salinibacter ruber is the only bacterium that inhabits this environment in significant numbers. In order to establish the extent of genetic diversity in natural populations of this microbe, the genomic sequence of reference strain DSM 13855 was compared to metagenomic fragments recovered from climax saltern crystallizers and obtained with 454 sequencing technology. This kind of analysis reveals the presence of metagenomic islands, i.e. highly variable regions among the different lineages in the population. Results: Three regions of the sequenced isolate were scarcely represented in the metagenome thus appearing to vary among co-occurring S. ruber cells. These metagenomic islands showed evidence of extensive genomic corruption with atypically low GC content, low coding density, high numbers of pseudogenes and short hypothetical proteins. A detailed analysis of island gene content showed that the genes in metagenomic island 1 code for cell surface polysaccharides. The strain-specific genes of metagenomic island 2 were found to be involved in biosynthesis of cell wall polysaccharide components. Finally, metagenomic island 3 was rich in DNA related enzymes. Conclusion: The genomic organisation of S. ruber variable genomic regions showed a number of convergences with genomic islands of marine microbes studied, being largely involved in variable cell surface traits. This variation at the level of cell envelopes in an environment devoid of grazing pressure probably reflects a global strategy of bacteria to escape phage predation.

  19. Ocean microbial metagenomics

    Science.gov (United States)

    Kerkhof, Lee J.; Goodman, Robert M.

    2009-09-01

    Technology for accessing the genomic DNA of microorganisms, directly from environmental samples without prior cultivation, has opened new vistas to understanding microbial diversity and functions. Especially as applied to soils and the oceans, environments on Earth where microbial diversity is vast, metagenomics and its emergent approaches have the power to transform rapidly our understanding of environmental microbiology. Here we explore select recent applications of the metagenomic suite to ocean microbiology.

  20. Prediction of autism susceptibility genes based on association rules.

    Science.gov (United States)

    Gong, Lejun; Yan, Yunyang; Xie, Jianming; Liu, Hongde; Sun, Xiao

    2012-06-01

    Autism is a complex neuropsychiatric disorder with high heritability and an unclear etiology. The identification of key genes related to autism may elucidate its etiology. The current study provides an approach to predicting autism susceptibility genes. Genes are first extracted from the biomedical literature, and some autism susceptibility genes are then recognized as seeds by the prior knowledge. As candidates, the remaining genes are predicted by creating association rules between the seeds and candidates. In an evaluated data set, 27 autism susceptibility genes (type "Y") are extracted and 43 possible autism susceptibility genes (type "P") are predicted. The sum of "Y" and "P" genes accounts for 93.3% of the data set that are not contained in the typical database of autism susceptibility genes. Our approach can effectively extract and predict autism susceptibility genes from the biomedical literature. These predicted results complement the typical database of autism susceptibility genes. The web portal for the predicted results, which is freely available at http://biolab.hyit.edu.cn/ar, can be a valuable resource in studies of diseases related to genes. Copyright © 2012 Wiley Periodicals, Inc.

  1. Challenges and opportunities of airborne metagenomics.

    Science.gov (United States)

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Exploring the Optimal Strategy to Predict Essential Genes in Microbes

    Directory of Open Access Journals (Sweden)

    Yao Lu

    2011-12-01

    Full Text Available Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we have developed a machine learning based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has its potential limitations. The first is constricted by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the required number of known essential genes is surprisingly small to make accurate predictions. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches resulted in an increased performance when the known essential genes are few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.

  3. Metagenomics and Applications

    Directory of Open Access Journals (Sweden)

    L Rafati

    2016-11-01

    Full Text Available Introduction: Bacteria are highly diverse in nature, yet only a very small fraction of them can be grown and isolated in current standard laboratories. Over the last decade, metagenomics has emerged as a new field of research that aims to characterize the genomes of uncultured microbes, and researchers around the world are studying this group of bacteria in search of new compounds such as antibiotics, anti-cancer agents, enzymes and biomolecules. Methods: This article is a review study. Texts were collected through literature study and manual keyword searches of reliable scientific resources and sites, including Google Scholar, PubMed, ScienceDirect, SID and Scopus, covering the years 2000 to 2013. Results: The data collection instrument in the study included all printed metagenomics-related texts. Metagenomics is currently used to screen samples, and as a complement to culture media and other traditional techniques it is expected to gain a stronger position. Metagenomics is used most in clinical cases in which conventional techniques cannot identify the microbial cause. Performing the tests and analyzing the resulting information requires skilled scientists. Conclusion: This paper focuses on some of the latest achievements of metagenomics and its applications in new drugs, enzyme detection, biotechnology and the environment.

  4. Quantitative metagenomic analyses based on average genome size normalization

    DEFF Research Database (Denmark)

    Frank, Jeremy Alexander; Sørensen, Søren Johannes

    2011-01-01

    Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can...... provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data...... by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different...

  5. A case study for large-scale human microbiome analysis using JCVI's metagenomics reports (METAREP).

    Directory of Open Access Journals (Sweden)

    Johannes Goll

    Full Text Available As metagenomic studies continue to increase in their number, sequence volume and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor to meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software including the implementation of a dynamic weighting of taxonomic and functional annotation, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations including biological pathway and individual enzyme abundances from hundreds of community samples is demonstrated by providing scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals. Over one thousand HMP WGS datasets and the latest

  6. The binning of metagenomic contigs for microbial physiology of mixed cultures.

    Science.gov (United States)

    Strous, Marc; Kraft, Beate; Bisdorf, Regina; Tegetmeyer, Halina E

    2012-01-01

    So far, microbial physiology has dedicated itself mainly to pure cultures. In nature, cross feeding and competition are important aspects of microbial physiology and these can only be addressed by studying complete communities such as enrichment cultures. Metagenomic sequencing is a powerful tool to characterize such mixed cultures. In the analysis of metagenomic data, well established algorithms exist for the assembly of short reads into contigs and for the annotation of predicted genes. However, the binning of the assembled contigs or unassembled reads is still a major bottleneck and required to understand how the overall metabolism is partitioned over different community members. Binning consists of the clustering of contigs or reads that apparently originate from the same source population. In the present study eight metagenomic samples from the same habitat, a laboratory enrichment culture, were sequenced. Each sample contained 13-23 Mb of assembled contigs and up to eight abundant populations. Binning was attempted with existing methods but they were found to produce poor results, were slow, dependent on non-standard platforms or produced errors. A new binning procedure was developed based on multivariate statistics of tetranucleotide frequencies combined with the use of interpolated Markov models. Its performance was evaluated by comparison of the results between samples with BLAST and in comparison to existing algorithms for four publicly available metagenomes and one previously published artificial metagenome. The accuracy of the new approach was comparable or higher than existing methods. Further, it was up to a 100 times faster. It was implemented in Java Swing as a complete open source graphical binning application available for download and further development (http://sourceforge.net/projects/metawatt).
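    The tetranucleotide frequencies mentioned in this abstract can be illustrated with a minimal Python sketch. It only computes the 256-dimensional composition vector of a single contig; it is not the Metawatt implementation, and the function name `tetranucleotide_frequencies` and the choice to skip windows containing ambiguous bases are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of all 256 tetranucleotides in a DNA sequence.

    Hypothetical helper for illustration; not taken from the published tool.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows with ambiguous bases (e.g. N) are not among the 256 keys
    # and are therefore excluded from the total.
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGT")
    print(len(freqs), freqs["ACGT"])
```

    Contigs with similar frequency vectors are candidates for the same bin; the clustering step itself is outside the scope of this sketch.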

  7. The binning of metagenomic contigs for microbial physiology of mixed cultures

    Directory of Open Access Journals (Sweden)

    Marc Strous

    2012-12-01

    Full Text Available So far, microbial physiology has dedicated itself mainly to pure cultures. In nature, cross feeding and competition are important aspects of microbial physiology and these can only be addressed by studying complete communities such as enrichment cultures. Metagenomic sequencing is a powerful tool to characterize such mixed cultures. In the analysis of metagenomic data, well established algorithms exist for the assembly of short reads into contigs and for the annotation of predicted genes. However, the binning of the assembled contigs or unassembled reads is still a major bottleneck and required to understand how the overall metabolism is partitioned over different community members. Binning consists of the clustering of contigs or reads that apparently originate from the same source population. In the present study eight metagenomic samples originating from the same habitat, a laboratory enrichment culture, were sequenced. Each sample contained 13-23 Mb of assembled contigs and up to eight abundant populations. Binning was attempted with existing methods but they were found to produce poor results, were slow, dependent on non-standard platforms or produced errors. A new binning procedure was developed based on multivariate statistics of tetranucleotide frequencies combined with the use of interpolated Markov models. Its performance was evaluated by comparison of the results between samples with BLAST and in comparison to existing algorithms for four publicly available metagenomes and one previously published artificial metagenome. The accuracy of the new approach was comparable to or higher than existing methods. Further, it was up to a hundred times faster. It was implemented in Java Swing as a complete open source graphical binning application available for download and further development (http://sourceforge.net/projects/metawatt).

  8. Metagenomic analysis of bacterial community composition and antibiotic resistance genes in a wastewater treatment plant and its receiving surface water.

    Science.gov (United States)

    Tang, Junying; Bu, Yuanqing; Zhang, Xu-Xiang; Huang, Kailong; He, Xiwei; Ye, Lin; Shan, Zhengjun; Ren, Hongqiang

    2016-10-01

    The presence of pathogenic bacteria and the dissemination of antibiotic resistance genes (ARGs) may pose big risks to the rivers that receive the effluent from municipal wastewater treatment plants (WWTPs). In this study, we investigated the changes of bacterial community and ARGs along treatment processes of one WWTP, and examined the effects of the effluent discharge on the bacterial community and ARGs in the receiving river. Pyrosequencing was applied to reveal bacterial community composition including potential bacterial pathogens, and Illumina high-throughput sequencing was used for profiling ARGs. The results showed that the WWTP had good removal efficiency on potential pathogenic bacteria (especially Arcobacter butzleri) and ARGs. Moreover, the bacterial communities downstream and upstream of the river showed no significant difference. However, an increase in the abundance of potential pathogens and ARGs at the effluent outfall was observed, indicating that WWTP effluent might contribute to the dissemination of potential pathogenic bacteria and ARGs in the receiving river. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. Comparison of metagenomic samples using sequence signatures

    Directory of Open Access Journals (Sweden)

    Jiang Bai

    2012-12-01

    Full Text Available Abstract Background Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of different environments have been generated. The assembly of these reads can be difficult and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need the complete genomes or existing databases and thus, can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the applications of sequence signature methods for the comparison of metagenomic samples have not been well studied. Results We studied several dissimilarity measures, including d2, d2* and d2S recently developed by our group, a measure (hereinafter noted as Hao) used in CVTree developed by Hao’s group (Qi et al., 2004), measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009), as well as standard lp measures between the frequency vectors, for the comparison of metagenomic samples using sequence signatures. We compared their performance using a series of extensive simulations and three real next-generation sequencing (NGS) metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples across the world, and 13 fecal samples from human individuals. Results showed that the dissimilarity measure d2S can achieve superior performance when comparing metagenomic samples by clustering them into different groups as well as recovering environmental gradients affecting microbial samples. New insights into the environmental factors affecting microbial compositions in metagenomic samples
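    As a rough illustration of the simplest family of measures named in this abstract, the standard lp distance between k-mer frequency vectors, the following Python sketch builds a relative k-mer frequency vector from a set of reads and compares two samples. The function names `kmer_frequency_vector` and `lp_distance` are hypothetical, and the d2, d2* and d2S measures from the paper are deliberately not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(reads, k):
    """Relative frequencies of all 4**k k-mers over a collection of reads.

    Hypothetical helper; k-mers containing non-ACGT characters are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for read in reads:
        read = read.upper()
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def lp_distance(u, v, p=2):
    """Standard l_p distance between two equal-length frequency vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


if __name__ == "__main__":
    sample_a = kmer_frequency_vector(["AAAA"], k=2)
    sample_b = kmer_frequency_vector(["CCCC"], k=2)
    print(lp_distance(sample_a, sample_b, p=1))
```

    A pairwise matrix of such distances could then be fed to any clustering routine to group samples, which is the use case the abstract evaluates.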

  10. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we ...

  11. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data.

  12. MGkit: Metagenomic Framework For The Study Of Microbial Communities

    OpenAIRE

    Rubino, Francesco; Creevey, C. J.

    2014-01-01

    Introduction While metagenomics has been used extensively to study microbial communities from a taxonomic and functional perspective, little has been done to address how the species in a microbiome are adapted to and maintain specific roles in dynamic environments like the rumen. Rationale To address this issue we have developed a framework for the robust analysis of metagenomic data that includes fully automated analysis from next-generation sequencing (NGS) reads to assembly, gene ...

  13. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... [Patel N and Wang JTL 2015 Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci. 40 731–740]. DOI 10.1007/s12038-015-9558-9. 1. Introduction. 1.1 Background. Using gene expression data to infer gene regulatory networks (GRNs) is a key approach to ...

  14. Metagenomics of extreme environments.

    Science.gov (United States)

    Cowan, D A; Ramond, J-B; Makhalanyane, T P; De Maayer, P

    2015-06-01

    Whether they are exposed to extremes of heat or cold, or buried deep beneath the Earth's surface, microorganisms have an uncanny ability to survive under these conditions. This ability to survive has fascinated scientists for nearly a century, but the recent development of metagenomics and 'omics' tools has allowed us to make huge leaps in understanding the remarkable complexity and versatility of extremophile communities. Here, in the context of the recently developed metagenomic tools, we discuss recent research on the community composition, adaptive strategies and biological functions of extremophiles. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Contribution of exogenous genetic elements to the group A Streptococcus metagenome.

    Directory of Open Access Journals (Sweden)

    Stephen B Beres

    2007-08-01

    Full Text Available Variation in gene content among strains of a bacterial species contributes to biomedically relevant differences in phenotypes such as virulence and antimicrobial resistance. Group A Streptococcus (GAS causes a diverse array of human infections and sequelae, and exhibits a complex pathogenic behavior. To enhance our understanding of genotype-phenotype relationships in this important pathogen, we determined the complete genome sequences of four GAS strains expressing M protein serotypes (M2, M4, and 2 M12 that commonly cause noninvasive and invasive infections. These sequences were compared with eight previously determined GAS genomes and regions of variably present gene content were assessed. Consistent with the previously determined genomes, each of the new genomes is approximately 1.9 Mb in size, with approximately 10% of the gene content of each encoded on variably present exogenous genetic elements. Like the other GAS genomes, these four genomes are polylysogenic and prophage encode the majority of the variably present gene content of each. In contrast to most of the previously determined genomes, multiple exogenous integrated conjugative elements (ICEs with characteristics of conjugative transposons and plasmids are present in these new genomes. Cumulatively, 242 new GAS metagenome genes were identified that were not present in the previously sequenced genomes. Importantly, ICEs accounted for 41% of the new GAS metagenome gene content identified in these four genomes. Two large ICEs, designated 2096-RD.2 (63 kb and 10750-RD.2 (49 kb, have multiple genes encoding resistance to antimicrobial agents, including tetracycline and erythromycin, respectively. Also resident on these ICEs are three genes encoding inferred extracellular proteins of unknown function, including a predicted cell surface protein that is only present in the genome of the serotype M12 strain cultured from a patient with acute poststreptococcal glomerulonephritis. The data

  16. High throughput comparisons and profiling of metagenomes for industrially relevant enzymes

    KAUST Repository

    Alam, Intikhab

    2016-01-26

    More and more genomes and metagenomes are being sequenced since the advent of Next Generation Sequencing Technologies (NGS). Many metagenomic samples are collected from a variety of environments, each exhibiting a different environmental profile, e.g. temperature, environmental chemistry, etc. These metagenomes can be profiled to unearth enzymes relevant to several industries based on specific enzyme properties, such as the ability to work under extreme conditions (extreme temperatures, salinity, anaerobic conditions, etc.). In this work, we present the DMAP platform, comprising a high-throughput metagenomic annotation pipeline and a data warehouse for comparisons and profiling across a large number of metagenomes. We developed two reference databases for profiling of important genes, one containing enzymes related to different industries and the other containing genes with potential bioactivity roles. In this presentation we describe an example analysis of a large number of publicly available metagenomic samples from the TARA Oceans study (Science 2015), which covers a significant part of the world's oceans.

  17. Bracken: estimating species abundance in metagenomics data

    Directory of Open Access Journals (Sweden)

    Jennifer Lu

    2017-01-01

    Full Text Available Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.

  18. Antibiotic Resistance Genes and Correlations with Microbial Community and Metal Resistance Genes in Full-Scale Biogas Reactors As Revealed by Metagenomic Analysis

    DEFF Research Database (Denmark)

    Luo, Gang; Li, Bing; Li, Li-Guan

    2017-01-01

    Digested residues from biogas plants are often used as biofertilizers for agricultural crop cultivation. The antibiotic resistance genes (ARGs) in digested residues pose a high risk to public health due to their potential spread to the disease-causing microorganisms and thus reduce... resistance genes (MRGs). The total abundance of ARGs in all the samples varied from 7 × 10^-3 to 1.08 × 10^-1 copy of ARG/copy of 16S-rRNA gene, and the samples obtained from thermophilic biogas reactors had a lower total abundance of ARGs, indicating the superiority of thermophilic anaerobic digestion...

  19. Metagenomics at Grass Roots

    Indian Academy of Sciences (India)

    Metagenomics is a robust, interdisciplinary approach for studying microbial community composition, function, and dynamics. It typically involves a core of molecular biology, microbiology, ecology, statistics, and computational biology. Exciting outcomes anticipated from these studies include unraveling of complex ...

  20. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    Science.gov (United States)

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  1. MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes.

    Science.gov (United States)

    Petrenko, Pavel; Lobb, Briallen; Kurtz, Daniel A; Neufeld, Josh D; Doxey, Andrew C

    2015-11-05

    Metagenomes provide access to the taxonomic composition and functional capabilities of microbial communities. Although metagenomic analysis methods exist for estimating overall community composition or metabolic potential, identifying specific taxa that encode specific functions or pathways of interest can be more challenging. Here we present MetAnnotate, which addresses the common question: "which organisms perform my function of interest within my metagenome(s) of interest?" MetAnnotate uses profile hidden Markov models to analyze shotgun metagenomes for genes and pathways of interest, classifies retrieved sequences either through a phylogenetic placement or best hit approach, and enables comparison of these profiles between metagenomes. Based on a simulated metagenome dataset, the tool achieves high taxonomic classification accuracy for a broad range of genes, including both markers of community abundance and specific biological pathways. Lastly, we demonstrate MetAnnotate by analyzing cobalamin (vitamin B12) synthesis genes across hundreds of aquatic metagenomes in a fraction of the time required by the commonly used Basic Local Alignment Search Tool top hit approach. MetAnnotate is multi-threaded and installable as a local web application or command-line tool on Linux systems. MetAnnotate is a useful framework for general and/or function-specific taxonomic profiling and comparison of metagenomes.

  2. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  3. Functional association prediction by community profiling.

    Science.gov (United States)

    Jiao, Dazhi; Han, Wontack; Ye, Yuzhen

    2017-10-01

    Recent years have witnessed unprecedented accumulation of DNA sequences and therefore protein sequences (predicted from DNA sequences), due to the advances of sequencing technology. One of the major sources of the hypothetical proteins is the metagenomics research. Current annotation of metagenomes (collections of short metagenomic sequences or assemblies) relies on similarity searches against known gene/protein families, based on which functional profiles of microbial communities can be built. This practice, however, leaves out the hypothetical proteins, which may outnumber the known proteins for many microbial communities. On the other hand, we may ask: what can we gain from the large number of metagenomes made available by the metagenomic studies, for the annotation of metagenomic sequences as well as functional annotation of hypothetical proteins in general? Here we propose a community profiling approach for predicting functional associations between proteins: two proteins are predicted to be associated if they share similar presence and absence profiles (called community profiles) across microbial communities. Community profiling is conceptually similar to the phylogenetic profiling approach to functional prediction, however with fundamental differences. We tested different profile construction methods, the selection of reference metagenomes, and correlation metrics, among others, to optimize the performance of this new approach. We demonstrated that the community profiling approach alone slightly outperforms the phylogenetic profiling approach for associating proteins in species that are well represented by sequenced genomes, and combining phylogenetic and community profiling further improves (though only marginally) the prediction of functional association. Further we showed that community profiling method significantly outperforms phylogenetic profiling, revealing more functional associations, when applied to a more recently sequenced bacterial genome
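    The core idea in this abstract, that two proteins are predicted to be associated when their presence and absence profiles across communities are similar, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the abstract says several correlation metrics were tested without naming the one finally used, so Pearson correlation on binary vectors is an illustrative choice, and the function name `profile_correlation` is hypothetical.

```python
import math


def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two presence/absence (0/1) profiles.

    Each position corresponds to one metagenome (community). Returns 0.0
    when either profile is constant, since correlation is then undefined.
    """
    n = len(profile_a)
    if n == 0 or n != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Two hypothetical proteins observed in the same two of four communities.
    print(profile_correlation([1, 0, 1, 0], [1, 0, 1, 0]))
```

    Protein pairs whose score exceeds a chosen threshold would be reported as candidate functional associations; how that threshold is set is not described in the abstract.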

  4. Metagenomic Analysis of the Gut Microbiome of the Common Black Slug Arion ater in Search of Novel Lignocellulose Degrading Enzymes

    Directory of Open Access Journals (Sweden)

    Ryan Joynson

    2017-11-01

    Full Text Available Some eukaryotes are able to gain access to well-protected carbon sources in plant biomass by exploiting microorganisms in the environment or harbored in their digestive system. One is the land pulmonate Arion ater, which takes advantage of a gut microbial consortium that can break down the widely available, but difficult to digest, carbohydrate polymers in lignocellulose, enabling them to digest a broad range of fresh and partially degraded plant material efficiently. This ability is considered one of the major factors that have enabled A. ater to become one of the most widespread plant pest species in Western Europe and North America. Using metagenomic techniques we have characterized the bacterial diversity and functional capability of the gut microbiome of this notorious agricultural pest. Analysis of gut metagenomic community sequences identified abundant populations of known lignocellulose-degrading bacteria, along with well-characterized bacterial plant pathogens. This also revealed a repertoire of more than 3,383 carbohydrate active enzymes (CAZymes including multiple enzymes associated with lignin degradation, demonstrating a microbial consortium capable of degradation of all components of lignocellulose. This would allow A. ater to make extensive use of plant biomass as a source of nutrients through exploitation of the enzymatic capabilities of the gut microbial consortia. From this metagenome assembly we also demonstrate the successful amplification of multiple predicted gene sequences from metagenomic DNA subjected to whole genome amplification and expression of functional proteins, facilitating the low cost acquisition and biochemical testing of the many thousands of novel genes identified in metagenomics studies. These findings demonstrate the importance of studying Gastropod microbial communities. Firstly, with respect to understanding links between feeding and evolutionary success and, secondly, as sources of novel enzymes with

  5. Soil metagenomics and tropical soil productivity

    OpenAIRE

    Garrett, Karen A.

    2009-01-01

    This presentation summarizes research in the soil metagenomics cross cutting research activity. Soil metagenomics studies soil microbial communities as contributors to soil health. CCRA-4 (Soil Metagenomics)

  6. Bioinformatics tools for predicting GPCR gene functions.

    Science.gov (United States)

    Suwa, Makiko

    2014-01-01

    The automatic classification of GPCRs by bioinformatics methodology can provide functional information for new GPCRs in the whole 'GPCR proteome' and this information is important for the development of novel drugs. Since the GPCR proteome is classified hierarchically, general ways for GPCR function prediction are based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; those tools use not simple sequence searches but more powerful methods, such as alignment-free methods, statistical model methods, and machine learning methods used in protein sequence analysis, based on learning datasets. The first stage of hierarchical function prediction involves the discrimination of GPCRs from non-GPCRs and the second stage involves the classification of the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Then, further classification is performed according to their protein-protein interaction type: binding G-protein type, oligomerized partner type, etc. Those methods have achieved predictive accuracies of around 90%. Finally, I describe future research directions for bioinformatics techniques for GPCR function prediction.

  7. Water metagenomic analysis reveals low bacterial diversity and the presence of antimicrobial residues and resistance genes in a river containing wastewater from backyard aquacultures in the Mekong Delta, Vietnam.

    Science.gov (United States)

    Nakayama, Tatsuya; Tuyet Hoa, Tran Thi; Harada, Kazuo; Warisaya, Minae; Asayama, Megumi; Hinenoya, Atsushi; Lee, Joon Won; Phu, Tran Minh; Ueda, Shuhei; Sumimura, Yoshinori; Hirata, Kazumasa; Phuong, Nguyen Thanh; Yamamoto, Yoshimasa

    2017-03-01

    The environmental pathways for the dissemination of antibiotic resistance have recently received increased attention. Aquatic environments act as reservoirs or sources of antimicrobial-resistant bacteria, antimicrobial residues, and antimicrobial resistance genes (ARGs). Therefore, it is imperative to identify the role of polluted water in the dissemination of antimicrobial resistance. The aim of this study was to evaluate the antimicrobial residues, ARGs, and microbiota in the freshwater systems of the Mekong Delta. We selected 12 freshwater sites from aquacultures and rivers in Can Tho, Vietnam and analyzed them for 45 antimicrobial residues and 8 ARGs by LC/MS/MS and real-time PCR, respectively. A 16S rDNA-based metagenomic analysis was conducted to characterize the water microbiota. Residues of sulfamethoxazole (10/12) and sulfadimidine (7/12) were widely detected, together with the sulfa-resistance genes sul1 (11/12) and sul2 (9/12). Additionally, sulfamethoxazole residues and the β-lactamase-resistance gene bla CTX-M-1 were detected in eight freshwater systems (8/12), suggesting that these freshwater systems may have been polluted by human activity. The metagenomic analysis showed that all the tested freshwater systems contained the phyla Proteobacteria, Actinobacteria, and Bacteroidetes, representing 64% of the total microbiota. Moreover, the Cai Rang River site (Ri-E), which is located at the merge point of wastewaters from backyard-based aquacultures, contained the genera Polynucleobacter, Variovorax, and Limnohabitans, representing more than 78.4% of the total microbiota. Bacterial diversity analysis showed that the Ri-E exhibited the lowest diversity compared with other regions. Principal coordinate analysis showed that the differences among water microbiotas in backyard-based aquacultures could be explained by the farmers' aquaculture techniques. In conclusion, this study demonstrated a collapse of bacterial diversity at the merge point of wastewaters

  8. Exploiting ontology graph for predicting sparsely annotated gene function.

    Science.gov (United States)

    Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian

    2015-06-15

    Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, for functional labels with only a few annotations, an algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this 'overfitting' issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. https://github.com/wangshenguiuc/clusDCA. © The Author 2015. Published by Oxford University Press.

  9. Reranking candidate gene models with cross-species comparison for improved gene prediction

    Directory of Open Access Journals (Sweden)

    Pereira Fernando CN

    2008-10-01

    Full Text Available Abstract Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
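
    The reranking step described in this abstract can be pictured with a small sketch. It is not the authors' scoring rule: the `GeneModel` fields, the additive combination, and the `weight` parameter are hypothetical, and the scores are made-up toy values. It only shows how a candidate with a lower gene-finder score can be promoted once cross-species evidence is added.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    finder_score: float         # score from the original gene finder (higher is better)
    ortholog_similarity: float  # hypothetical 0..1 similarity to a probable ortholog


def rerank(models, weight=1.0):
    """Sort candidate models by finder score plus weighted ortholog similarity, best first."""
    return sorted(
        models,
        key=lambda m: m.finder_score + weight * m.ortholog_similarity,
        reverse=True,
    )


# Toy K-best list for one locus (illustrative values only).
candidates = [
    GeneModel("model_a", 0.90, 0.20),
    GeneModel("model_b", 0.85, 0.95),
    GeneModel("model_c", 0.60, 0.50),
]

best = rerank(candidates)[0]
print(best.name)
```

    With `weight=0.0` the ranking falls back to the gene finder's own order, which makes the contribution of the comparative evidence easy to inspect.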

  10. FY11 Report on Metagenome Analysis using Pathogen Marker Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, Shea N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Allen, Jonathan E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); McLoughlin, Kevin S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Slezak, Tom [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2011-06-02

    detection probability appears to be a function of both coverages. Multiple species could be detected simultaneously in a simulated low-coverage, complex metagenome, and the largest PML gave no false negative species and no false positive genera. The presence of multiple species was predicted in a complex metagenome from a human gut microbiome with 1.9 GB of short reads (75 nt); the species predicted were reasonable gut flora and no biothreat agents were detected, showing the feasibility of PML analysis of empirical complex metagenomes.

  11. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Full Text Available Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, the indicators for high quality embryos were investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for two prediction models were calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.
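
    For readers unfamiliar with the AUC figures quoted here, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It uses the pairwise (Mann-Whitney) formulation rather than whatever software the authors used, and the scores and labels are invented toy values, not data from the study.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = high quality embryo, 0 = low quality embryo (hypothetical values).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.4]
toy_labels = [1, 1, 1, 0, 0, 0]
print(round(auc(toy_scores, toy_labels), 3))
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.72 and 0.73 should be read.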

  12. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

    Science.gov (United States)

    He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

    2018-02-20

    Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as

  13. The YNP metagenome project

    DEFF Research Database (Denmark)

    Inskeep, William P.; Jay, Zackary J.; Tringe, Susannah G.

    2013-01-01

    The Yellowstone geothermal complex contains over 10,000 diverse geothermal features that host numerous phylogenetically deeply rooted and poorly understood archaea, bacteria, and viruses. Microbial communities in high-temperature environments are generally less diverse than soil, marine, sediment......, and environmental variables. Twenty geochemically distinct geothermal ecosystems representing a broad spectrum of Yellowstone hot-spring environments were used for metagenomic and geochemical analysis and included approximately equal numbers of: (1) phototrophic mats, (2) “filamentous streamer” communities, and (3...

  14. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

    Directory of Open Access Journals (Sweden)

    Pride David T

    2008-09-01

    Full Text Available Abstract Background: Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results: From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of

  15. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

    Science.gov (United States)

    Pride, David T; Schoenfeld, Thomas

    2008-09-17

    Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs
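
    The idea of comparing a fragment against a database of oligonucleotide signatures can be sketched as follows. This is not the GSPC implementation: the choice of tetranucleotides, the Manhattan distance, the `classify` helper, and the synthetic reference sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def signature(seq, k=4):
    """Relative frequency of every k-mer over ACGT, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with ambiguous bases are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous k-mers")
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Manhattan distance between two signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest to the fragment's."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag_sig, signature(references[name], k)),
    )


# Synthetic stand-ins for reference genomes (not real sequences).
references = {
    "at_rich": "ATATATTTAAATATTA" * 20,
    "gc_rich": "GCGCGGCCGCGCCGGC" * 20,
}
fragment = "GCGGCCGCGCGGCGCC" * 5
print(classify(fragment, references))
```

    Because the comparison relies only on composition, it needs no alignment to a homologous sequence, which is the property the abstract highlights as the alternative to BLAST-based identification.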

  16. An Agile Functional Analysis of Metagenomic Data Using SUPER-FOCUS.

    Science.gov (United States)

    Silva, Genivaldo Gueiros Z; Lopes, Fabyano A C; Edwards, Robert A

    2017-01-01

    One of the main goals in metagenomics is to identify the functional profile of a microbial community from unannotated shotgun sequencing reads. Functional annotation is important in biological research because it enables researchers to identify the abundance of functional genes of the organisms present in the sample, answering the question, "What can the organisms in the sample do?" Most currently available approaches do not scale with increasing data volumes, which is important because both the number and lengths of the reads provided by sequencing platforms keep increasing. Here, we present SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with real metagenomes, and the results show that it accurately predicts the subsystems present in the profiled microbial communities, is computationally efficient, and up to 1000 times faster than other tools. SUPER-FOCUS is freely available at http://edwards.sdsu.edu/SUPERFOCUS .

  17. SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data.

    Science.gov (United States)

    Silva, Genivaldo Gueiros Z; Green, Kevin T; Dutilh, Bas E; Edwards, Robert A

    2016-02-01

    Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question of what they can do. Currently available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  18. Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes.

    Science.gov (United States)

    Hingamp, Pascal; Grimsley, Nigel; Acinas, Silvia G; Clerissi, Camille; Subirana, Lucie; Poulain, Julie; Ferrera, Isabel; Sarmento, Hugo; Villar, Emilie; Lima-Mendez, Gipsi; Faust, Karoline; Sunagawa, Shinichi; Claverie, Jean-Michel; Moreau, Hervé; Desdevises, Yves; Bork, Peer; Raes, Jeroen; de Vargas, Colomban; Karsenti, Eric; Kandels-Lewis, Stefanie; Jaillon, Olivier; Not, Fabrice; Pesant, Stéphane; Wincker, Patrick; Ogata, Hiroyuki

    2013-09-01

    Nucleo-cytoplasmic large DNA viruses (NCLDVs) constitute a group of eukaryotic viruses that can have crucial ecological roles in the sea by accelerating the turnover of their unicellular hosts or by causing diseases in animals. To better characterize the diversity, abundance and biogeography of marine NCLDVs, we analyzed 17 metagenomes derived from microbial samples (0.2-1.6 μm size range) collected during the Tara Oceans Expedition. The sample set includes ecosystems under-represented in previous studies, such as the Arabian Sea oxygen minimum zone (OMZ) and Indian Ocean lagoons. By combining computationally derived relative abundance and direct prokaryote cell counts, the abundance of NCLDVs was found to be in the order of 10^4-10^5 genomes ml^-1 for the samples from the photic zone and 10^2-10^3 genomes ml^-1 for the OMZ. The Megaviridae and Phycodnaviridae dominated the NCLDV populations in the metagenomes, although most of the reads classified in these families showed large divergence from known viral genomes. Our taxon co-occurrence analysis revealed a potential association between viruses of the Megaviridae family and eukaryotes related to oomycetes. In support of this predicted association, we identified six cases of lateral gene transfer between Megaviridae and oomycetes. Our results suggest that marine NCLDVs probably outnumber eukaryotic organisms in the photic layer (per given water mass) and that metagenomic sequence analyses promise to shed new light on the biodiversity of marine viruses and their interactions with potential hosts.

  19. Gene prediction validation and functional analysis of redundant pathways

    DEFF Research Database (Denmark)

    Sønderkær, Mads

    2011-01-01

    have employed a large mRNA-seq data set to improve and validate ab initio predicted gene models. This direct experimental evidence also provides reliable determinations of UTR regions and polyadenylation sites, which are not easily predicted in plants. Furthermore, once an annotated genome sequence...... pathway is transcriptionally active in DM, this is virtually non-existing in RH, possible reflecting the selection for high yield in European breeding programs.......Gene expression by mRNA-Seq In silico gene prediction in eukaryotic genomes is a complicated and error prone process. Nonetheless, a high-quality gene annotation is very important for the usefulness of a genome sequence to the scientific community. In the potato genome sequencing consortium, we...

  20. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Full Text Available Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  1. Prediction of the gene expression in normal lung tissue by the gene expression in blood.

    Science.gov (United States)

    Halloran, Justin W; Zhu, Dakai; Qian, David C; Byun, Jinyoung; Gorlova, Olga Y; Amos, Christopher I; Gorlov, Ivan P

    2015-11-17

    Comparative analysis of gene expression in human tissues is important for understanding the molecular mechanisms underlying tissue-specific control of gene expression. It can also open an avenue for using gene expression in blood (which is the most easily accessible human tissue) to predict gene expression in other (less accessible) tissues, which would facilitate the development of novel gene expression based models for assessing disease risk and progression. Until recently, direct comparative analysis across different tissues was not possible due to the scarcity of paired tissue samples from the same individuals. In this study we used paired whole blood/lung gene expression data from the Genotype-Tissue Expression (GTEx) project. We built a generalized linear regression model for each gene using gene expression in lung as the outcome and gene expression in blood, age and gender as predictors. For ~18 % of the genes, gene expression in blood was a significant predictor of gene expression in lung. We found that the number of single nucleotide polymorphisms (SNPs) influencing expression of a given gene in either blood or lung, also known as the number of quantitative trait loci (eQTLs), was positively associated with efficacy of blood-based prediction of that gene's expression in lung. This association was strongest for shared eQTLs: those influencing gene expression in both blood and lung. In conclusion, for a considerable number of human genes, their expression levels in lung can be predicted using observable gene expression in blood. An abundance of shared eQTLs may explain the strong blood/lung correlations in the gene expression.
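
    The per-gene model described above (lung expression as the outcome; blood expression, age and gender as predictors) can be sketched with an ordinary least-squares fit. This is a simplified stand-in, not the authors' pipeline: it omits the significance testing, the 0/1 coding of `sex` is an assumption, and the samples are synthetic values generated from known coefficients so the fit can be checked.

```python
def fit_ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y; an intercept is prepended."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Synthetic samples for one gene: (blood expression, age, sex coded 0/1).
samples = [
    (1.0, 30, 0), (2.0, 45, 1), (3.0, 50, 0),
    (4.0, 62, 1), (5.0, 38, 0), (2.5, 70, 1),
]
# Noise-free lung expression built from known coefficients, for checking the fit.
lung = [2.0 + 0.5 * b + 0.01 * a - 0.3 * s for b, a, s in samples]

intercept, beta_blood, beta_age, beta_sex = fit_ols(samples, lung)
print(round(beta_blood, 3))
```

    In a real analysis this fit would be repeated for every gene, and the coefficient on blood expression would be tested for significance before the gene is counted as predictable.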

  2. Databases of the marine metagenomics

    KAUST Repository

    Mineta, Katsuhiko

    2015-10-28

    Metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach to metagenomics, the more recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of capturing the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data and increases data complexity. Moreover, when a metagenomic approach is used to monitor changes over time in marine environments at multiple locations, metagenomic data will accumulate in enormous volumes and at great speed. Because this situation is becoming a reality at many marine research institutions and stations all over the world, data management and analysis will clearly be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge should be extracted from a vast amount of data. In this review, we outline all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that the number of metagenome databases that include marine metagenome data is six, which is unexpectedly small. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.

  3. Metagenomic analysis of permafrost microbial community response to thaw

    Energy Technology Data Exchange (ETDEWEB)

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related this data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in cycling of C and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  4. Protein structure determination using metagenome sequence data.

    Science.gov (United States)

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A; Kim, David E; Kamisetty, Hetunandan; Kyrpides, Nikos C; Baker, David

    2017-01-20

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost. Copyright © 2017, American Association for the Advancement of Science.

  5. The relative abundance of predicted genes associated with ammonia-oxidation, nitrate reduction, and biomass decomposition in mineral soil are altered by intensive timber harvest.

    Science.gov (United States)

    Mushinski, R. M.; Zhou, Y.; Gentry, T. J.; Boutton, T. W.

    2017-12-01

    Forest ecosystems in the southern United States are substantially altered by anthropogenic disturbances such as timber harvest and land conversion, with effects being observed in carbon and nutrient pools as well as biogeochemical processes. Furthermore, the desire to develop renewable energy sources in the form of biomass extraction from logging residues may result in alterations in soil community structure and function. While the impact of forest management on soil physicochemical properties of the region has been studied, its long-term effect on soil bacterial community composition and metagenomic potential is relatively unknown, especially at deeper soil depths. This study investigates how the organic matter removal intensities associated with timber harvest influence decadal-scale alterations in bacterial community structure and functional potential in the upper 1 m of the soil profile, 18 years post-harvest in a Pinus taeda L. forest of eastern Texas. Amplicon sequencing of the 16S rRNA gene was used in conjunction with soil chemical analyses to evaluate treatment-induced differences in community composition and potential environmental drivers of associated change. Furthermore, functional potential was assessed by using amplicon data to make metagenomic predictions. Results indicate that increasing organic matter removal intensity leads to altered community composition and altered relative abundance of dominant OTUs annotated to Burkholderia and Aciditerrimonas. The relative abundance of predicted genes associated with dissimilatory nitrate reduction and denitrification was highest in the most intensively harvested treatment, while genes involved in nitrification were significantly lower in the most intensively harvested treatment. Furthermore, genes associated with glycosyltransferases were significantly reduced with increasing harvest intensity while polysaccharide lyases increased. These results imply that intensive organic matter removal may create

  6. Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

    Energy Technology Data Exchange (ETDEWEB)

    Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.; Löffler, F. E.

    2015-10-23

    To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.

  7. Global discriminative learning for higher-accuracy computational gene prediction.

    Directory of Open Access Journals (Sweden)

    Axel Bernal

    2007-03-01

    Full Text Available Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.

  8. Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Liying Yang

    2016-01-01

    Full Text Available Background. Precisely predicting cancer is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on the genome-wide scale. Gene expression data analysis, however, is confronted with enormous challenges owing to its characteristics, such as high dimensionality, small sample size, and low Signal-to-Noise Ratio. Results. This paper proposes a method, termed RS_SVM, to predict gene expression profiles by aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains SVM classifiers accordingly, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets are performed to validate the RS_SVM method. Experimental results show that RS_SVM achieved better classification accuracy and generalization performance in contrast with single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and the state-of-the-art methods. Experiments also explored the effect of subspace size on prediction performance. Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM provides a good channel for such biological data.
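    The random-subspace aggregation described in this record can be illustrated with a minimal sketch. It is not the authors' RS_SVM: to stay within the Python standard library, a nearest-centroid classifier is swapped in for the SVM base learner, the initial statistical feature-selection step is skipped, and majority voting is assumed as the aggregation rule. All function names, parameter values, and the toy data are hypothetical.

```python
import random
from collections import Counter


def centroid_fit(X, y, features):
    """Fit a nearest-centroid base learner on one feature subset.

    Stand-in for the SVM base classifier named in the abstract (assumption).
    """
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def centroid_predict(centroids, row, features):
    """Return the label whose centroid is closest in the chosen subspace."""
    def sq_dist(centroid):
        return sum((row[f] - centroid[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_random_subspaces(X, y, n_models=15, subspace_size=3, seed=0):
    """Train one base learner per randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, centroid_fit(X, y, features)))
    return ensemble


def predict_majority(ensemble, row):
    """Aggregate the base learners by majority vote (assumed aggregation rule)."""
    votes = Counter(centroid_predict(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 20 samples, 6 "genes", two well-separated classes.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) + (5.0 if i % 2 else 0.0) for _ in range(6)]
     for i in range(20)]
y = ["tumor" if i % 2 else "normal" for i in range(20)]

ensemble = fit_random_subspaces(X, y)
print(predict_majority(ensemble, [5.0] * 6))
```

    The subspace size is the parameter the abstract reports exploring; in this sketch it is the `subspace_size` argument, and each base learner sees only that many features.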

  9. Comparative Analysis of Predicted Gene Expression among Crenarchaeal Genomes

    Directory of Open Access Journals (Sweden)

    Shibsankar Das

    2017-03-01

    Full Text Available Research into new methods for identifying highly expressed genes in anonymous genome sequences has been going on for more than 15 years. We present here an alternative approach based on a modified score of relative codon usage bias to identify highly expressed genes in crenarchaeal genomes. The proposed algorithm relies exclusively on sequence features for identifying the highly expressed genes. In this study, a comparative analysis of predicted highly expressed genes in five crenarchaeal genomes was performed using the score of Modified Relative Codon Bias Strength (MRCBS) as a numerical estimator of gene expression level. We found a systematic strong correlation between Codon Adaptation Index and MRCBS. Additionally, MRCBS correlated well with other expression measures. Our study indicates that MRCBS can consistently capture the highly expressed genes.

  10. A deep auto-encoder model for gene expression prediction.

    Science.gov (United States)

    Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

    2017-11-17

    Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants will contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.

  11. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    Science.gov (United States)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-01-01

    Novel gene finding is one of the emerging fields in environmental research. In the past decades the research was focused mainly on the discovery of microorganisms which were capable of degrading a particular compound. Many methods are available in the literature for the cultivation and screening of these novel microorganisms. All of these methods are efficient for screening of microbes which can be cultivated in the laboratory. Microorganisms which live in extreme conditions like hot springs, frozen glaciers, acid mine drainage, etc. cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature, nutrients and their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1% of the total microbes present on Earth. The remaining 99%, the uncultivated majority, is inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is directly extracted from environmental samples such as soil, seawater, acid mine drainage, etc., followed by construction and screening of a metagenomic library. With the ongoing research, a huge amount of metagenomic data is accumulating. Understanding this data is an essential step to extract novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from the metagenome. The bioinformatic requirements of metagenomics data analysis are different in theory and practice. This paper reviews the tools that are available for metagenomic data analysis and the capabilities of such tools: what they can do and their web availability.

  12. Metagenomic analysis of kimchi, a traditional Korean fermented food.

    Science.gov (United States)

    Jung, Ji Young; Lee, Se Hee; Kim, Jeong Myeong; Park, Moon Su; Bae, Jin-Woo; Hahn, Yoonsoo; Madsen, Eugene L; Jeon, Che Ok

    2011-04-01

    Kimchi, a traditional food in the Korean culture, is made from vegetables by fermentation. In this study, metagenomic approaches were used to monitor changes in bacterial populations, metabolic potential, and overall genetic features of the microbial community during the 29-day fermentation process. Metagenomic DNA was extracted from kimchi samples obtained periodically and was sequenced using a 454 GS FLX Titanium system, which yielded a total of 701,556 reads, with an average read length of 438 bp. Phylogenetic analysis based on 16S rRNA genes from the metagenome indicated that the kimchi microbiome was dominated by members of three genera: Leuconostoc, Lactobacillus, and Weissella. Assignment of metagenomic sequences to SEED categories of the Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) server revealed a genetic profile characteristic of heterotrophic lactic acid fermentation of carbohydrates, which was supported by the detection of mannitol, lactate, acetate, and ethanol as fermentation products. When the metagenomic reads were mapped onto the database of completed genomes, the Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 and Lactobacillus sakei subsp. sakei 23K genomes were highly represented. These same two genera were confirmed to be important in kimchi fermentation when the majority of kimchi metagenomic sequences showed very high identity to Leuconostoc mesenteroides and Lactobacillus genes. Besides microbial genome sequences, a surprisingly large number of phage DNA sequences were identified from the cellular fractions, possibly indicating that a high proportion of cells were infected by bacteriophages during fermentation. Overall, these results provide insights into the kimchi microbial community and also shed light on fermentation processes carried out broadly by complex microbial communities.

  13. The MAOA gene predicts happiness in women.

    Science.gov (United States)

    Chen, Henian; Pine, Daniel S; Ernst, Monique; Gorodetsky, Elena; Kasen, Stephanie; Gordon, Kathy; Goldman, David; Cohen, Patricia

    2013-01-10

    Psychologists, quality of life and well-being researchers have grown increasingly interested in understanding the factors that are associated with human happiness. Although twin studies estimate that genetic factors account for 35-50% of the variance in human happiness, knowledge of specific genes is limited. However, recent advances in molecular genetics can now provide a window into neurobiological markers of human happiness. This investigation examines association between happiness and monoamine oxidase A (MAOA) genotype. Data were drawn from a longitudinal study of a population-based cohort, followed for three decades. In women, low expression of MAOA (MAOA-L) was related significantly to greater happiness (0.261 SD increase with one L-allele, 0.522 SD with two L-alleles, P=0.002) after adjusting for the potential effects of age, education, household income, marital status, employment status, mental disorder, physical health, relationship quality, religiosity, abuse history, recent negative life events and self-esteem use in linear regression models. In contrast, no such association was found in men. This new finding may help explain the gender difference on happiness and provide a link between MAOA and human happiness. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Accessing carboxylesterase diversity from termite hindgut symbionts through metagenomics.

    Science.gov (United States)

    Rashamuse, Konanani; Mabizela-Mokoena, Nobalanda; Sanyika, Tendai Walter; Mabvakure, Batsirai; Brady, Dean

    2012-01-01

    A shotgun metagenomic library was constructed from termite hindgut symbionts and subsequently screened for esterase activities. A total of 68 recombinant clones conferring esterolytic phenotypes were identified, of which the 14 most active were subcloned and sequenced. The nucleotide lengths of the esterase-encoding open reading frames (ORFs) ranged from 783 to 2,592 bp and encoded proteins with predicted molecular masses of between 28.8 and 97.5 kDa. The highest identity scores in the GenBank database, from a global amino acid alignment ranged from 39 to 83%. The identified ORFs revealed the presence of the G-X-S-X-D, G-D-S-X, and S-X-X-K sequence motifs that have been reported to harbour a catalytic serine residue in other previously reported esterase primary structures. Five of the ORFs (EstT5, EstT7, EstT9, EstT10, and EstT12) could not be classified into any of the original eight esterase families. One of the ORFs (EstT9) showed a unique primary structure consisting of an amidohydrolase-esterase fusion. Six of the 14 esterase-encoding genes were recombinantly expressed in Escherichia coli and the purified enzymes exhibited temperature optima of between 40-50°C. Substrate-profiling studies revealed that the characterised enzymes were 'true' carboxylesterases based on their preferences for short to medium chain length p-nitrophenyl ester substrates. This study has demonstrated a successful application of a metagenomic approach in accessing novel esterase-encoding genes from the gut of termites that could otherwise have been missed by classical culture enrichment approaches. Copyright © 2012 S. Karger AG, Basel.
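    The reported ORF lengths and predicted protein masses in this record can be cross-checked with a back-of-the-envelope calculation. The sketch below assumes a rule-of-thumb average of about 110 Da per amino acid residue and assumes that the reported ORF lengths include the stop codon; neither assumption comes from the abstract, and the function name is hypothetical.

```python
# Assumption: rule-of-thumb average mass per amino acid residue, in daltons.
AVG_RESIDUE_MASS_DA = 110.0


def orf_to_protein_estimate(orf_length_bp, includes_stop_codon=True):
    """Estimate residue count and approximate mass (kDa) from an ORF length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # The stop codon does not encode a residue (assumed to be counted in the ORF).
    residues = codons - 1 if includes_stop_codon else codons
    mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000.0
    return residues, mass_kda


# Shortest and longest ORF lengths quoted in the abstract.
for length_bp in (783, 2592):
    residues, mass_kda = orf_to_protein_estimate(length_bp)
    print(f"{length_bp} bp -> {residues} aa, ~{mass_kda:.1f} kDa")
```

    Under these assumptions the estimates come out at roughly 28.6 and 94.9 kDa, within a few percent of the 28.8 and 97.5 kDa quoted in the abstract; exact masses depend on the actual amino acid composition.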

  15. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    Science.gov (United States)

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast

  16. Combining gene signatures improves prediction of breast cancer survival.

    Directory of Open Access Journals (Sweden)

    Xi Zhao

    Full Text Available BACKGROUND: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. PRINCIPAL FINDINGS: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. CONCLUSION: Combining the predictive strength of multiple gene signatures improves

  17. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    DEFF Research Database (Denmark)

    Moore, Aimee M.; Munck, Christian; Sommer, Morten Otto Alexander

    2011-01-01

    The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity...... microorganisms, but relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community, independent of identity to known genes, by subjecting the metagenome to functional assays in a genetically tractable host....... Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex...

  18. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA) gene. But the functional characterization of human KIAA0100 gene has remained unknown to date. Here, firstly, bioinformatic prediction of human KIAA0100 gene was carried out using online software; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that human KIAA0100 gene was located on 17q11.2, and human KIAA0100 protein was located in the secretory pathway. Besides, human KIAA0100 protein contained a signal peptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). The observation on functional characterization of human KIAA0100 gene revealed that its downregulation inhibited cell proliferation, and promoted cell apoptosis in U937 cells. To summarize, these results suggest human KIAA0100 gene possibly comes within mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  19. Generating viral metagenomes from the coral holobiont

    Directory of Open Access Journals (Sweden)

    Karen Dawn Weynberg

    2014-05-01

    Full Text Available Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which is inherently difficult in corals because of the complex holobiont community, an extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis.

  20. Application of metagenomics in the human gut microbiome.

    Science.gov (United States)

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-21

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of the next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as its relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, functional dysbiosis of the intestinal microbiome, and determine interactions and co-evolution between microbiota and host, though there are still some limitations. Metatranscriptomics, metaproteomics and metabolomics represent enormous complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome with encouraging prospects. The limitations of metagenomics to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed.

  1. Novel Florfenicol and Chloramphenicol Resistance Gene Discovered in Alaskan Soil by Using Functional Metagenomics

    Science.gov (United States)

    Lang, Kevin S.; Anderson, Janet M.; Schwarz, Stefan; Williamson, Lynn; Handelsman, Jo; Singer, Randall S.

    2010-01-01

    Functional metagenomics was used to search for florfenicol resistance genes in libraries of cloned DNA isolated from Alaskan soil. A gene that mediated reduced susceptibility to florfenicol was identified and designated pexA. The predicted PexA protein showed a structure similar to that of efflux pumps of the major facilitator superfamily. PMID:20543056

  2. Gene expression profiling predicts survival in conventional renal cell carcinoma.

    Directory of Open Access Journals (Sweden)

    Hongjuan Zhao

    2006-01-01

    Full Text Available BACKGROUND: Conventional renal cell carcinoma (cRCC) accounts for most of the deaths due to kidney cancer. Tumor stage, grade, and patient performance status are used currently to predict survival after surgery. Our goal was to identify gene expression features, using comprehensive gene expression profiling, that correlate with survival. METHODS AND FINDINGS: Gene expression profiles were determined in 177 primary cRCCs using DNA microarrays. Unsupervised hierarchical clustering analysis segregated cRCC into five gene expression subgroups. Expression subgroup was correlated with survival in long-term follow-up and was independent of grade, stage, and performance status. The tumors were then divided evenly into training and test sets that were balanced for grade, stage, performance status, and length of follow-up. A semisupervised learning algorithm (supervised principal components analysis) was applied to identify transcripts whose expression was associated with survival in the training set, and the performance of this gene expression-based survival predictor was assessed using the test set. With this method, we identified 259 genes that accurately predicted disease-specific survival among patients in the independent validation group (p < 0.001). In multivariate analysis, the gene expression predictor was a strong predictor of survival independent of tumor stage, grade, and performance status (p < 0.001). CONCLUSIONS: cRCC displays molecular heterogeneity and can be separated into gene expression subgroups that correlate with survival after surgery. We have identified a set of 259 genes that predict survival after surgery independent of clinical prognostic factors.

  3. Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Full Text Available BACKGROUND: Conventional renal cell carcinoma (cRCC) accounts for most of the deaths due to kidney cancer. Tumor stage, grade, and patient performance status are used currently to predict survival after surgery. Our goal was to identify gene expression features, using comprehensive gene expression profiling, that correlate with survival. METHODS AND FINDINGS: Gene expression profiles were determined in 177 primary cRCCs using DNA microarrays. Unsupervised hierarchical clustering analysis segregated cRCC into five gene expression subgroups. Expression subgroup was correlated with survival in long-term follow-up and was independent of grade, stage, and performance status. The tumors were then divided evenly into training and test sets that were balanced for grade, stage, performance status, and length of follow-up. A semisupervised learning algorithm (supervised principal components analysis) was applied to identify transcripts whose expression was associated with survival in the training set, and the performance of this gene expression-based survival predictor was assessed using the test set. With this method, we identified 259 genes that accurately predicted disease-specific survival among patients in the independent validation group (p < 0.001). In multivariate analysis, the gene expression predictor was a strong predictor of survival independent of tumor stage, grade, and performance status (p < 0.001). CONCLUSIONS: cRCC displays molecular heterogeneity and can be separated into gene expression subgroups that correlate with survival after surgery. We have identified a set of 259 genes that predict survival after surgery independent of clinical prognostic factors.

  4. Metagenomic evidence for reciprocal particle exchange between the mainstem estuary and lateral bay sediments of the lower Columbia River

    Directory of Open Access Journals (Sweden)

    Mariya W Smith

    2015-10-01

    Full Text Available Lateral bays of the lower Columbia River estuary are areas of enhanced water retention that influence net ecosystem metabolism through activities of their diverse microbial communities. Metagenomic characterization of sediment microbiota from three disparate sites in two brackish lateral bays (Baker and Youngs) produced approximately 100 Gbp of DNA sequence data analyzed subsequently for predicted SSU rRNA and peptide-coding genes. The metagenomes were dominated by Bacteria. A large component of Eukaryota was present in Youngs Bay samples, i.e. the inner bay sediment was enriched with the invasive New Zealand mudsnail, Potamopyrgus antipodarum, known for high ammonia production. The metagenome was also highly enriched with an archaeal ammonia oxidizer closely related to Nitrosoarchaeum limnia. Combined analysis of sequences and continuous, high-resolution time series of biogeochemical data from fixed and mobile platforms revealed the importance of large-scale reciprocal particle exchanges between the mainstem estuarine water column and lateral bay sediments. Deposition of marine diatom particles in sediments near Youngs Bay mouth was associated with a dramatic enrichment of Bacteroidetes (58% of total Bacteria) and corresponding genes involved in phytoplankton polysaccharide degradation. The Baker Bay sediment metagenome contained abundant Archaea, including diverse methanogens, as well as functional genes for methylotrophy and taxonomic markers for syntrophic bacteria, suggesting that active methane cycling occurs at this location. Our previous work showed enrichments of similar anaerobic taxa in particulate matter of the mainstem estuarine water column. In total, our results identify the lateral bays as both sources and sinks of biogenic particles significantly impacting microbial community composition and biogeochemical activities in the estuary.

  5. A metagenomic snapshot of taxonomic and functional diversity in an alpine glacier cryoconite ecosystem

    International Nuclear Information System (INIS)

    Edwards, Arwyn; Pachebat, Justin A; Swain, Martin; Hegarty, Matt; Rassner, Sara M E; Hodson, Andrew J; Irvine-Fynn, Tristram D L; Sattler, Birgit

    2013-01-01

    Cryoconite is a microbe–mineral aggregate which darkens the ice surface of glaciers. Microbial process and marker gene PCR-dependent measurements reveal active and diverse cryoconite microbial communities on polar glaciers. Here, we provide the first report of a cryoconite metagenome and culture-independent study of alpine cryoconite microbial diversity. We assembled 1.2 Gbp of metagenomic DNA sequenced using an Illumina HiScanSQ from cryoconite holes across the ablation zone of Rotmoosferner in the Austrian Alps. The metagenome revealed a bacterially-dominated community, with Proteobacteria (62% of bacterial-assigned contigs) and Bacteroidetes (14%) considerably more abundant than Cyanobacteria (2.5%). Streptophyte DNA dominated the eukaryotic metagenome. Functional genes linked to N, Fe, S and P cycling illustrated an acquisitive trend and a nitrogen cycle based upon efficient ammonia recycling. A comparison of 32 metagenome datasets revealed a similarity in functional profiles between the cryoconite and metagenomes characterized from other cold microbe–mineral aggregates. Overall, the metagenomic snapshot reveals the cryoconite ecosystem of this alpine glacier as dependent on scavenging carbon and nutrients from allochthonous sources, in particular mosses transported by wind from ice-marginal habitats, consistent with net heterotrophy indicated by productivity measurements. A transition from singular snapshots of cryoconite metagenomes to comparative analyses is advocated. (letter)

  6. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need of maturity-onset diabetes of the young (MODY) molecular diagnosis. However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores of the diabetes mutations by different methods were highly correlated, but the methods were complementary to, rather than replacements for, each other. The available in silico methods for the prediction of diabetes mutations had varied performances across different genes. Applying gene-specific thresholds defined by this study may be able to increase the performance of in silico prediction of disease-causing mutations.

  7. Discriminative local subspaces in gene expression data for effective gene function prediction.

    Science.gov (United States)

    Puelma, Tomas; Gutiérrez, Rodrigo A; Soto, Alvaro

    2012-09-01

    Massive amounts of genome-wide gene expression data have become available, motivating the development of computational approaches that leverage this information to predict gene function. Among successful approaches, supervised machine learning methods, such as Support Vector Machines (SVMs), have shown superior prediction accuracy. However, these methods lack the simple biological intuition provided by co-expression networks (CNs), limiting their practical usefulness. In this work, we present Discriminative Local Subspaces (DLS), a novel method that combines supervised machine learning and co-expression techniques with the goal of systematically predicting genes involved in specific biological processes of interest. Unlike traditional CNs, DLS uses the knowledge available in Gene Ontology (GO) to generate informative training sets that guide the discovery of expression signatures: expression patterns that are discriminative for genes involved in the biological process of interest. By linking genes co-expressed with these signatures, DLS is able to construct a discriminative CN that links both known and previously uncharacterized genes for the selected biological process. This article focuses on the algorithm behind DLS and shows its predictive power using an Arabidopsis thaliana dataset and a representative set of 101 GO terms from the Biological Process Ontology. Our results show that DLS has a higher average accuracy than both SVMs and CNs. Thus, DLS is able to provide the prediction accuracy of supervised learning methods while maintaining the intuitive understanding of CNs. A MATLAB® implementation of DLS is available at http://virtualplant.bio.puc.cl/cgi-bin/Lab/tools.cgi.

  8. Inductive matrix completion for predicting gene-disease associations.

    Science.gov (United States)

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on specific types of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies; for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies; for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has [...] bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.

  9. Dynamic changes of yak gut microbiota during growth revealed by polymerase chain reaction-denaturing gradient gel electrophoresis and metagenomics

    Directory of Open Access Journals (Sweden)

    Yuanyang Nie

    2017-07-01

    Full Text Available Objective: To understand the dynamic structure, function, and influence on nutrient metabolism in hosts, it was crucial to assess the genetic potential of the gut microbial community in yaks of different ages. Methods: The denaturing gradient gel electrophoresis (DGGE) profiles and Illumina-based metagenomic sequencing on colon contents of 15 semi-domestic yaks were investigated. Unweighted pairwise grouping method with mathematical averages (UPGMA) clustering and principal component analysis (PCA) were used to analyze the DGGE fingerprint. The Illumina sequences were assembled, subjected to gene prediction and functionally annotated, and then classified by querying protein sequences of the genes against the Kyoto encyclopedia of genes and genomes (KEGG) database. Results: Metagenomic sequencing showed that more than 85% of ribosomal RNA (rRNA) gene sequences belonged to the phyla Firmicutes and Bacteroidetes, indicating that the families Ruminococcaceae (46.5%), Rikenellaceae (11.3%), Lachnospiraceae (10.0%), and Bacteroidaceae (6.3%) were dominant gut microbes. Over 50% of non-rRNA gene sequences represented the metabolic pathways of amino acids (14.4%), proteins (12.3%), sugars (11.9%), nucleotides (6.8%), lipids (1.7%), xenobiotics (1.4%), and coenzymes and vitamins (3.6%). Gene functional classification showed that most of the enzyme-coding genes were related to cellulose digestion and amino acid metabolic pathways. Conclusion: Yaks’ age had a substantial effect on gut microbial composition. Comparative metagenomics of gut microbiota in 0.5-, 1.5-, and 2.5-year-old yaks revealed that the abundance of the classes Clostridia, Bacteroidia, and Lentisphaeria, as well as the phyla Firmicutes, Bacteroidetes, Lentisphaerae, Tenericutes, and Cyanobacteria, varied more greatly during yaks’ growth, especially in young animals (0.5 and 1.5 years old). Gut microbes, including Bacteroides, Clostridium, and Lentisphaeria, make a contribution to the energy metabolism and synthesis of amino

  10. Recovering Genomics Clusters of Secondary Metabolites from Lakes Using Genome-Resolved Metagenomics

    Directory of Open Access Journals (Sweden)

    Rafael R. C. Cuadrat

    2018-02-01

    Full Text Available Metagenomic approaches became increasingly popular in the past decades due to decreasing costs of DNA sequencing and bioinformatics development. So far, however, the recovery of long genes coding for secondary metabolites still represents a big challenge. Often, the quality of metagenome assemblies is poor, especially in environments with a high microbial diversity where sequence coverage is low and complexity of natural communities high. Recently, new and improved algorithms for binning environmental reads and contigs have been developed to overcome such limitations. Some of these algorithms use a similarity detection approach to classify the obtained reads into taxonomical units and to assemble draft genomes. This approach, however, is quite limited since it can classify exclusively sequences similar to those available (and well classified) in the databases. In this work, we used draft genomes from Lake Stechlin, north-eastern Germany, recovered by MetaBat, an efficient binning tool that integrates empirical probabilistic distances of genome abundance, and tetranucleotide frequency for accurate metagenome binning. These genomes were screened for secondary metabolism genes, such as polyketide synthases (PKS) and non-ribosomal peptide synthases (NRPS), using the Anti-SMASH and NAPDOS workflows. With this approach we were able to identify 243 secondary metabolite clusters from 121 genomes recovered from our lake samples. A total of 18 NRPS, 19 PKS, and 3 hybrid PKS/NRPS clusters were found. In addition, it was possible to predict the partial structure of several secondary metabolite clusters allowing for taxonomical classifications and phylogenetic inferences. Our approach revealed a high potential to recover and study secondary metabolites genes from any aquatic ecosystem.
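
    The tetranucleotide frequency signal mentioned in this abstract can be illustrated with a minimal sketch. It only shows how a composition vector is computed for a contig and compared between two contigs. The function names and the Euclidean distance are assumptions made for illustration; the abstract describes MetaBat as using empirical probabilistic distances, which this sketch does not reproduce.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return a dict mapping each tetranucleotide to its relative frequency.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. Hypothetical helper, not part of any binning tool.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


def tnf_distance(contig_a, contig_b):
    """Euclidean distance between two composition vectors (a stand-in metric)."""
    fa = tetranucleotide_frequencies(contig_a)
    fb = tetranucleotide_frequencies(contig_b)
    return sum((fa[k] - fb[k]) ** 2 for k in TETRAMERS) ** 0.5


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGT")
    print(freqs["ACGT"])
    print(tnf_distance("ACGTACGT", "TTTTTTTT"))
```

    Contigs from the same genome tend to have similar composition vectors, which is why such a signal is combined with abundance information when grouping contigs into draft genomes.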

  11. HIPred: an integrative approach to predicting haploinsufficient genes.

    Science.gov (United States)

    Shihab, Hashem A; Rogers, Mark F; Campbell, Colin; Gaunt, Tom R

    2017-06-15

    A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1000 genomes project, NHLBI Exome Sequencing Project and the Exome Aggregation Consortium creates an urgent need for unbiased haploinsufficiency prediction methods. Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL, with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency, without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases, outperforms existing biased algorithms. HIPred scores for all gene identifiers are available at: https://github.com/HAShihab/HIPred. Contact: h.shihab@bristol.ac.uk or tom.gaunt@bristol.ac.uk. Supplementary data are available at Bioinformatics online.

  12. Biomarkers and genes predictive of disease predisposition and ...

    African Journals Online (AJOL)

    Biomarkers and genes predictive of disease predisposition and prognosis in rheumatoid arthritis. Rheumatoid arthritis is a debilitating disease which often progresses to relentlessly severe disease. Pieter W A Meyer, NHDip Med Tech, MTech, PhD. Medical Research Council Unit for Inflammation and Immunity, Department ...

  13. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent...

  14. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  15. Prediction of epigenetically regulated genes in breast cancer cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria EH; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram

    2010-05-04

    panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.

  16. MetCap: a bioinformatics probe design pipeline for large-scale targeted metagenomics.

    Science.gov (United States)

    Kushwaha, Sandeep K; Manoharan, Lokeshwaran; Meerupati, Tejashwari; Hedlund, Katarina; Ahrén, Dag

    2015-02-28

    Massive sequencing of genes from different environments has made metagenomics central to enhancing the understanding of the wide diversity of micro-organisms and their roles in driving ecological processes. Reduced cost and high throughput sequencing has made large-scale projects achievable to a wider group of researchers, though complete metagenome sequencing is still a daunting task in terms of sequencing as well as the downstream bioinformatics analyses. Alternative approaches such as targeted amplicon sequencing require custom PCR primer generation, and are not scalable to thousands of genes or gene families. In this study, we present a web-based tool called MetCap that circumvents the limitations of amplicon sequencing of multiple genes by designing probes that are suitable for large-scale targeted metagenomics sequencing studies. MetCap provides a novel approach to target thousands of genes and genomic regions that could be used in targeted metagenomics studies. Automatic analysis of user-defined sequences is performed, and probes specifically designed for metagenome studies are generated. To illustrate the advantage of a targeted metagenome approach, we have generated more than 400,000 probes that match more than 300,000 publicly available sequences related to carbon degradation, and used these probes for target sequencing in a soil metagenome study. The results show high enrichment of target genes and a successful capturing of the majority of gene families. MetCap is freely available to users from: http://soilecology.biol.lu.se/metcap/. MetCap facilitates probe-based target enrichment as an easy and efficient alternative to complex primer-based enrichment for large-scale investigations of metagenomes. Our results have shown efficient large-scale target enrichment through MetCap-designed probes for a soil metagenome. The web service is suitable for any targeted metagenomics project that aims to study several genes

  17. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Abstract Background: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  18. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that

  19. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that they may encode

  20. A novel bioinformatics strategy for searching industrially useful genome resources from metagenomic sequence libraries.

    Science.gov (United States)

    Uehara, Hiroshi; Iwasaki, Yuki; Wada, Chieko; Ikemura, Toshimichi; Abe, Takashi

    2011-01-01

    Although remarkable progress in metagenomic sequencing of various environmental samples has been made, large numbers of fragment sequences have been registered in the international DNA databanks, primarily without information on gene function and phylotype, and thus with limited usefulness. Industrially useful biological activity is often carried out by a set of genes, such as those constituting an operon. In this connection, metagenomic approaches have a weakness because sets of genes are usually split up, since the sequences obtained by metagenome analyses are fragmented into 1-kb or much shorter segments. Therefore, even when a set of genes responsible for an industrially useful function is found in one metagenome library, it is usually difficult to know whether a single genome harbors the entire gene set or whether different genomes have individual genes. By modifying the Self-Organizing Map (SOM), we previously developed BLSOM for oligonucleotide composition, which allowed classification (self-organization) of sequence fragments according to genomes. Because BLSOM could reassociate genomic fragments according to genomes, BLSOM may ameliorate the abovementioned weakness of metagenome analyses. Here, we have developed a strategy for clustering of metagenomic sequences according to phylotypes and genomes, by testing a gene set contributing to environment preservation.

  1. Comparative analysis of metagenomes of Italian top soil improvers

    International Nuclear Information System (INIS)

    Gigliucci, Federica; Brambilla, Gianfranco; Tozzoli, Rosangela; Michelacci, Valeria; Morabito, Stefano

    2017-01-01

    Biosolids originating from Municipal Waste Water Treatment Plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon on agricultural lands. Their use to amend soil is controversial, as it may lead to the presence of emerging hazards of anthropogenic or animal origin in the environment devoted to food production. In this study, we used shotgun metagenomics sequencing as a tool to characterize the hazards related to the TSIs. The samples showed the presence of many virulence genes associated with different diarrheagenic E. coli pathotypes as well as of different antimicrobial resistance-associated genes. The genes conferring resistance to Fluoroquinolones were the most relevant class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent, traits associated with the resistance to Methicillin in Staphylococci and genes conferring resistance to Streptothricin, Fosfomycin and Vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15–50% of the sequence reads in the different metagenomes out of the total number of those mapping on the class of resistance to compounds determinants. Moreover, the taxonomic analysis performed by comparing compost-based samples and biosolids derived from municipal sewage-sludge treatments divided the samples into separate populations, based on the microbiota composition. The results confirm that metagenomics is efficient at detecting genomic traits associated with pathogens and antimicrobial resistance in complex matrices and that this approach can be efficiently used for the traceability of TSI samples using the microorganisms’ profiles as indicators of their origin. - Highlights: • Sludge- and green-based biosolids analysed by metagenomics. • Biosolids may introduce microbial hazards in the food chain. • Metagenomics enables tracking biosolids’ sources.

  2. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data

    Directory of Open Access Journals (Sweden)

    Masanori Kakuta

    2017-10-01

    Full Text Available Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to be scalable over 10,000 CPU (Central Processing Unit) cores, and achieved over 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
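
    The suffix-array idea behind this kind of search can be sketched briefly. The example below builds a suffix array for a single sequence and finds exact occurrences of a seed by binary search. It is an illustration of the data structure only: the function names are hypothetical, the construction is the naive sort-based one, and GHOST-MP's actual algorithm (suffix arrays of both query and database, inexact matching, hierarchical parallelism) is not reproduced here.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffixes; fine for short examples,
    too slow for real databases.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions where pattern occurs exactly in text."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    database = "GATTACA"
    sa = build_suffix_array(database)
    print(sa)
    print(find_occurrences(database, sa, "TA"))
```

    Because the suffixes are sorted, every occurrence of a seed sits in one contiguous block of the array, so a lookup costs a logarithmic number of comparisons rather than a scan of the whole database.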

  4. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  5. Metagenomic Insights of Microbial Feedbacks to Elevated CO2 (Invited)

    Science.gov (United States)

    Zhou, J.; Tu, Q.; Wu, L.; He, Z.; Deng, Y.; Van Nostrand, J. D.

    2013-12-01

    Understanding the responses of biological communities to elevated CO2 (eCO2) is a central issue in ecology and global change biology, but its impacts on the diversity, composition, structure, function, interactions and dynamics of soil microbial communities remain elusive. In this study, we first examined microbial responses to eCO2 among six FACE sites/ecosystems using a comprehensive functional gene microarray (GeoChip), and then focused on details of metagenome sequencing analysis in one particular site. GeoChip is a comprehensive functional gene array for examining the relationships between microbial community structure and ecosystem functioning and is a very powerful technology for biogeochemical, ecological and environmental studies. The current version of GeoChip (GeoChip 5.0) contains approximately 162,000 probes from 378,000 genes involved in C, N, S and P cycling, organic contaminant degradation, metal resistance, antibiotic resistance, stress responses, metal homeostasis, virulence, pigment production, bacterial phage-mediated lysis, soil beneficial microorganisms, and specific probes for viruses, protists, and fungi. Our experimental results revealed that both ecosystem and CO2 significantly (p [...] changes in the soil microbial community structure were closely correlated with geographic distance, soil NO3-N, NH4-N and C/N ratio. Further metagenome sequencing analysis of soil microbial communities in one particular site showed eCO2 altered the overall structure of soil microbial communities with ambient CO2 samples retaining a higher functional gene diversity than eCO2 samples. Also the taxonomic diversity of functional genes decreased at eCO2. Random matrix theory (RMT)-based network analysis showed that the identified networks under ambient and elevated CO2 were substantially different in terms of overall network topology, network composition, node overlap, module preservation, module-based higher order organization (meta-modules), topological roles of

  6. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin; Jansson, Janet K.; Langille, Morgan

    2016-06-28

    ABSTRACT

    Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “CandidatusPseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundanceAcidobacteriawere highly transcriptionally active, whereas bins corresponding to high-relative-abundanceVerrucomicrobiawere not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities.

    IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their

  7. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    Directory of Open Access Journals (Sweden)

    Aimee Marguerite Moore

    2011-10-01

    Full Text Available The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity of this microbial community, its recalcitrance to standard cultivation and the immense diversity of its encoded genes have necessitated the development of novel molecular, microbiological, and genomic tools. Functional metagenomics is one such culture-independent technique, used for decades to study environmental microorganisms but only relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community independent of identity to known genes by subjecting the metagenome to functional assays in a genetically tractable host. Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex community and its human host.

  8. A four gene signature predictive of recurrent prostate cancer.

    Science.gov (United States)

    Komisarof, Justin; McCall, Matthew; Newman, Laurel; Bshara, Wiam; Mohler, James L; Morrison, Carl; Land, Hartmut

    2017-01-10

    Prostate cancer is the most common form of non-dermatological cancer among US men, with an increasing incidence due to the aging population. Patients diagnosed with clinically localized disease identified as intermediate or high-risk are often treated by radical prostatectomy. Approximately 33% of these patients will suffer recurrence after surgery. Identifying patients likely to experience recurrence after radical prostatectomy would lead to improved clinical outcomes, as these patients could receive adjuvant radiotherapy. Here, we report a new tool for prediction of prostate cancer recurrence based on the expression pattern of a small set of cooperation response genes (CRGs). CRGs are a group of genes downstream of cooperating oncogenic mutations previously identified in a colon cancer model that are critical to the cancer phenotype. We show that systemic dysregulation of CRGs is also found in prostate cancer, including a 4-gene signature (HBEGF, HOXC13, IGFBP2, and SATB1) capable of differentiating recurrent from non-recurrent prostate cancer. To develop a suitable diagnostic tool to predict disease outcomes in individual patients, multiple algorithms and data handling strategies were evaluated on a training set using leave-one-out cross-validation (LOOCV). The best-performing algorithm, when used in combination with a predictive nomogram based on clinical staging, predicted recurrent and non-recurrent disease outcomes in a blinded validation set with 83% accuracy, outperforming previous methods. Disease-free survival times between the cohort of prostate cancers predicted to recur and predicted not to recur differed significantly (p = 1.38 × 10^-6). Therefore, this test allows us to accurately identify prostate cancer patients likely to experience future recurrent disease immediately following removal of the primary tumor.
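
    The leave-one-out cross-validation (LOOCV) step mentioned in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy expression values, the labels, and the nearest-centroid classifier are hypothetical and are not the authors' data or their best-performing algorithm.

```python
# Minimal LOOCV sketch: hold out one sample at a time, train on the rest,
# and count how often the held-out sample is classified correctly.
# Data and classifier are hypothetical, not taken from the study.

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train_x, train_y, sample):
    """Nearest-centroid prediction: return the label of the closest class mean."""
    best_label, best_d = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        d = sq_dist(centroid(members), sample)
        if d < best_d:
            best_label, best_d = label, d
    return best_label


def loocv_accuracy(xs, ys):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical 4-gene expression profiles (one row per patient).
expression = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.1, 0.3, 0.2],
    [0.0, 0.3, 0.2, 0.1],
    [1.1, 0.9, 1.2, 1.0],
    [0.9, 1.2, 1.0, 1.1],
    [1.0, 1.0, 0.8, 1.2],
]
recurred = [0, 0, 0, 1, 1, 1]  # hypothetical outcome labels

accuracy = loocv_accuracy(expression, recurred)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

    LOOCV suits small training sets because every sample is used for testing exactly once, although the resulting estimate still needs confirmation on an independent set, as the blinded validation set in the abstract provides.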

  9. Metagenomics Reveals Pervasive Bacterial Populations and Reduced Community Diversity across the Alaska Tundra Ecosystem.

    Science.gov (United States)

    Johnston, Eric R; Rodriguez-R, Luis M; Luo, Chengwei; Yuan, Mengting M; Wu, Liyou; He, Zhili; Schuur, Edward A G; Luo, Yiqi; Tiedje, James M; Zhou, Jizhong; Konstantinidis, Konstantinos T

    2016-01-01

    How soil microbial communities contrast with respect to taxonomic and functional composition within and between ecosystems remains an unresolved question that is central to predicting how global anthropogenic change will affect soil functioning and services. In particular, it remains unclear how small-scale observations of soil communities based on the typical volume sampled (1-2 g) are generalizable to ecosystem-scale responses and processes. This is especially relevant for remote, northern latitude soils, which are challenging to sample and are also thought to be more vulnerable to climate change compared to temperate soils. Here, we employed well-replicated shotgun metagenome and 16S rRNA gene amplicon sequencing to characterize community composition and metabolic potential in Alaskan tundra soils, combining our own datasets with those publicly available from distant tundra and temperate grassland and agriculture habitats. We found that the abundance of many taxa and metabolic functions differed substantially between tundra soil metagenomes and those from temperate soils, and that a high degree of OTU-sharing exists between tundra locations. Tundra soils were an order of magnitude less complex than their temperate counterparts, allowing for near-complete coverage of microbial community richness (~92% breadth) by sequencing, and the recovery of 27 high-quality, almost complete (>80% completeness) population bins. These population bins, collectively, made up to ~10% of the metagenomic datasets, and represented diverse taxonomic groups and metabolic lifestyles tuned toward sulfur cycling, hydrogen metabolism, methanotrophy, and organic matter oxidation. Several population bins, including members of Acidobacteria, Actinobacteria, and Proteobacteria, were also present in geographically distant (~100-530 km apart) tundra habitats (full genome representation and up to 99.6% genome-derived average nucleotide identity). Collectively, our results revealed that

  10. Detection of 140 clinically relevant antibiotic-resistance genes in the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to selected antibiotics.

    Science.gov (United States)

    Szczepanowski, Rafael; Linke, Burkhard; Krahn, Irene; Gartemann, Karl-Heinz; Gützkow, Tim; Eichler, Wolfgang; Pühler, Alfred; Schlüter, Andreas

    2009-07-01

    To detect plasmid-borne antibiotic-resistance genes in wastewater treatment plant (WWTP) bacteria, 192 resistance-gene-specific PCR primer pairs were designed and synthesized. Subsequent PCR analyses on total plasmid DNA preparations obtained from bacteria of activated sludge or the WWTP's final effluents led to the identification of, respectively, 140 and 123 different resistance-gene-specific amplicons. The genes detected included aminoglycoside, beta-lactam, chloramphenicol, fluoroquinolone, macrolide, rifampicin, tetracycline, trimethoprim and sulfonamide resistance genes as well as multidrug efflux and small multidrug resistance genes. Some of these genes were only recently described from clinical isolates, demonstrating genetic exchange between clinical and WWTP bacteria. Sequencing of selected resistance-gene-specific amplicons confirmed their identity or revealed that the amplicon nucleotide sequence is very similar to a gene closely related to the reference gene used for primer design. These results demonstrate that WWTP bacteria are a reservoir for various resistance genes. Moreover, detection of about 64 % of the 192 reference resistance genes in bacteria obtained from the WWTP's final effluents indicates that these resistance determinants might be further disseminated in habitats downstream of the sewage plant.
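
    The "about 64 %" figure can be checked directly from the counts given in this abstract (192 primer pairs, 140 amplicons from activated sludge, 123 from the final effluents). The short sketch below only redoes that arithmetic; the function and variable names are illustrative.

```python
# Detection rates implied by the counts reported in the abstract.
PRIMER_PAIRS = 192       # resistance-gene-specific primer pairs designed
SLUDGE_AMPLICONS = 140   # amplicons identified in activated sludge
EFFLUENT_AMPLICONS = 123 # amplicons identified in the final effluents


def detection_rate(detected, total):
    """Percentage of reference genes for which an amplicon was obtained."""
    return 100.0 * detected / total


sludge_pct = detection_rate(SLUDGE_AMPLICONS, PRIMER_PAIRS)
effluent_pct = detection_rate(EFFLUENT_AMPLICONS, PRIMER_PAIRS)

print(f"activated sludge: {sludge_pct:.1f} %")
print(f"final effluents:  {effluent_pct:.1f} %")
```

    The effluent value rounds to the 64 % quoted in the abstract; by the same calculation, roughly 73 % of the reference genes were detected in activated sludge.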

  11. Multi-Layer and Recursive Neural Networks for Metagenomic Classification.

    Science.gov (United States)

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-09-01

    Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.

  12. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Science.gov (United States)

    Zhang, Ping; Wu, Linwei; Rocha, Andrea M.; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D.; Wu, Liyou; Watson, David B.; Adams, Michael W. W.; Alm, Eric J.; Adams, Paul D.; Arkin, Adam P.

    2018-01-01

    ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. PMID:29463661

  13. Metagenomics of an alkaline hot spring in Galicia (Spain): microbial diversity analysis and screening for novel lipolytic enzymes

    Directory of Open Access Journals (Sweden)

    Olalla López-López

    2015-11-01

    Full Text Available A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH=8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9,722 contigs ranging in size from 500 to 56,677 bp and spanning approximately 18 Mbp. 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification and it revealed that bacteria were predominant, while the archaea were less abundant. The 6 most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant with the dominant species Candidatus Caldiarchaeum subterraneum. Functional classification revealed the genes associated with one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes with potential biotechnological interest, such as xylanases, galactosidases, proteases and lipases, was also revealed in the metagenomic library. Functional screening of this library was subsequently done looking for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a yeast mesophilic host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusually wide substrate specificity. When the enzyme was purified from the mesophilic host it showed a half-life of 1 h and 43 minutes at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity in a broad range of pH from 6.5-8.

  14. New Bacterial Phytase through Metagenomic Prospection

    Directory of Open Access Journals (Sweden)

    Nathálya Farias

    2018-02-01

    Full Text Available Alkaline phytases from uncultured microorganisms, which hydrolyze phytate to less phosphorylated myo-inositols and inorganic phosphate, have great potential as additives in agricultural industry. The development of metagenomics has stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. In this study, a gene encoding a phytase was cloned from red rice crop residues and castor bean cake using a metagenomics strategy. The amino acid identity between this gene and its closest published counterparts is lower than 60%. The phytase was named PhyRC001 and was biochemically characterized. This recombinant protein showed activity on sodium phytate, indicating that PhyRC001 is a hydrolase enzyme. The enzymatic activity was optimal at a pH of 7.0 and at a temperature of 35 °C. β-propeller phytases possess great potential as feed additives because they are the only type of phytase with high activity at neutral pH. Therefore, to explore and exploit the underlying mechanism for β-propeller phytase functions could be of great benefit to biotechnology.

  15. Identification of Predictive Gene Markers for Multipotent Stromal Cell Proliferation.

    Science.gov (United States)

    Bellayr, Ian H; Marklein, Ross A; Lo Surdo, Jessica L; Bauer, Steven R; Puri, Raj K

    2016-06-01

    Multipotent stromal cells (MSCs) are known for their distinctive ability to differentiate into different cell lineages, such as adipocytes, chondrocytes, and osteocytes. They can be isolated from numerous tissue sources, including bone marrow, adipose tissue, skeletal muscle, and others. Because of their differentiation potential and secretion of growth factors, MSCs are believed to have an inherent quality of regeneration and immune suppression. Cellular expansion is necessary to obtain sufficient numbers for use; however, MSCs exhibit a reduced capacity for proliferation and differentiation after several rounds of passaging. In this study, gene markers of MSC proliferation were identified and evaluated for their ability to predict proliferative quality. Microarray data of human bone marrow-derived MSCs were correlated with two proliferation assays. A collection of 24 genes were observed to significantly correlate with both proliferation assays (|r| >0.70) for eight MSC lines at multiple passages. These 24 identified genes were then confirmed using an additional set of MSCs from eight new donors using reverse transcription quantitative polymerase chain reaction (RT-qPCR). The proliferative potential of the second set of MSCs was measured for each donor/passage for confluency fraction, fraction of EdU+ cells, and population doubling time. The second set of MSCs exhibited a greater proliferative potential at passage 4 in comparison to passage 8, which was distinguishable by 15 genes; however, only seven of the genes (BIRC5, CCNA2, CDC20, CDK1, PBK, PLK1, and SPC25) demonstrated significant correlation with MSC proliferation regardless of passage. Our analyses revealed that correlation between gene expression and proliferation was consistently reduced with the inclusion of non-MSC cell lines; therefore, this set of seven genes may be more strongly associated with MSC proliferative quality. Our results pave the way to determine the quality of an MSC population for a

  16. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing

    DEFF Research Database (Denmark)

    Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan

    2004-01-01

    The publication of a draft sequence of a third mammalian genome, that of the rat, suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially... in the single-intron experiment. Spliced sequences were amplified in 46 cases (34%). We conclude that this procedure for elucidating gene structures with native cDNA sequences is cost-effective and will become even more so as it is further optimized.

  17. Metagenomic discovery of polybrominated diphenyl ether biosynthesis by marine sponges

    Science.gov (United States)

    Podell, Sheila; Taton, Arnaud; Schorn, Michelle A.; Busch, Julia; Lin, Zhenjian; Schmidt, Eric W.; Jensen, Paul R.; Paul, Valerie J.; Biggs, Jason S.; Golden, James W.; Allen, Eric E.; Moore, Bradley S.

    2017-01-01

    Naturally produced polybrominated diphenyl ethers (PBDEs) pervade the marine environment and structurally resemble toxic man-made brominated flame retardants. PBDEs bioaccumulate in marine animals and are likely transferred to the human food chain. However, the biogenic basis for PBDE production in one of their most prolific sources, marine sponges of the order Dysideidae, remains unidentified. Here, we report the discovery of PBDE biosynthetic gene clusters within sponge microbiome-associated cyanobacterial endosymbionts by employing an unbiased metagenome mining approach. By expression of PBDE biosynthetic genes in heterologous cyanobacterial hosts, we correlate the structural diversity of naturally produced PBDEs to modifications within PBDE biosynthetic gene clusters in multiple sponge holobionts. Our results establish the genetic and molecular foundation for the production of PBDEs in one of the most abundant natural sources of these molecules, further setting the stage for a metagenomic-based inventory of other PBDE sources in the marine environment. PMID:28319100

  18. From DNA Sequences to Chemical Structures – Methods for Mining Microbial Genomic and Metagenomic Data Sets for New Natural Products

    Directory of Open Access Journals (Sweden)

    Jurica Zucko

    2010-01-01

    Full Text Available Rapid mining of large genomic and metagenomic data sets for modular polyketide synthases, non-ribosomal peptide synthetases and hybrid polyketide synthase/non-ribosomal peptide synthetase biosynthetic gene clusters has been achieved using the generic computer program packages ClustScan and CompGen. These program packages perform the annotation with the hierarchical structuring into polypeptides, modules and domains, as well as storage and graphical presentations of the data. This aims to achieve the most accurate predictions of the activities and specificities of catalytically active domains that can be made with present knowledge, leading to a prediction of the most likely chemical structures produced by these enzymes. The program packages also allow generation of novel clusters by homologous recombination of the annotated genes in silico. ClustScan and CompGen were used to construct a custom database of known compounds (CSDB and of predicted entirely novel recombinant products (r-CSDB that can be used for in silico screening with computer aided drug design technology. The use of these programs has been exemplified by analysing genomic sequences from terrestrial prokaryotes and eukaryotic microorganisms, a marine metagenomic data set and a newly discovered example of a 'shared metabolic pathway' in marine-microbial endosymbiosis.

  19. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    Science.gov (United States)

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.

  20. Composition and predicted functional ecology of mussel - associated bacteria in Indonesian marine lakes

    NARCIS (Netherlands)

    Cleary, D.F.R.; Becking, L.E.; Polonia, A.; Freitas, R.M.; Gomes, N.

    2015-01-01

    In the present study, we sampled bacterial communities associated with mussels inhabiting two distinct coastal marine ecosystems in Kalimantan, Indonesia, namely, marine lakes and coastal mangroves. We used 16S rRNA gene pyrosequencing and predicted metagenomic analysis to compare microbial

  1. Elucidating selection processes for antibiotic resistance in sewage treatment plants using metagenomics.

    Science.gov (United States)

    Bengtsson-Palme, Johan; Hammarén, Rickard; Pal, Chandan; Östman, Marcus; Björlenius, Berndt; Flach, Carl-Fredrik; Fick, Jerker; Kristiansson, Erik; Tysklind, Mats; Larsson, D G Joakim

    2016-12-01

    Sewage treatment plants (STPs) have repeatedly been suggested as "hotspots" for the emergence and dissemination of antibiotic-resistant bacteria. A critical question still unanswered is whether selection pressures within STPs, caused by residual antibiotics or other co-selective agents, are sufficient to specifically promote resistance. To address this, we employed shotgun metagenomic sequencing of samples from different steps of the treatment process in three Swedish STPs. In parallel, concentrations of selected antibiotics, biocides and metals were analyzed. We found that concentrations of tetracycline and ciprofloxacin in the influent were above predicted concentrations for resistance selection; however, there was no consistent enrichment of resistance genes to any particular class of antibiotics in the STPs, nor of biocide and metal resistance genes. The most substantial change of the bacterial communities compared to human feces occurred already in the sewage pipes, manifested by a strong shift from obligate to facultative anaerobes. Through the treatment process, resistance genes against antibiotics, biocides and metals were not reduced to the same extent as fecal bacteria. The OXA-48 gene was consistently enriched in surplus and digested sludge. We find this worrying as OXA-48, still rare in Swedish clinical isolates, provides resistance to carbapenems, one of our most critically important classes of antibiotics. Taken together, metagenomics analyses did not provide clear support for specific antibiotic resistance selection. However, stronger selective forces affecting gross taxonomic composition, and with that resistance gene abundances, limit interpretability. Comprehensive analyses of resistant/non-resistant strains within relevant species are therefore warranted.

  2. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which is built into the pipeline and contains prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, allowing their comparative analysis together with user samples, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.
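
    As a rough illustration of how coverage depth from read alignment can be turned into a composition vector, consider the sketch below. It is not MALINA's implementation: the reference names, lengths, aligned-base counts, and the choice to normalize depths so that they sum to 1 are all assumptions made for this example.

```python
# Hypothetical sketch: per-reference coverage depth -> relative composition vector.
# Coverage depth is taken here as (total aligned bases) / (reference length).

def coverage_depth(aligned_bases, reference_length):
    """Mean number of times each reference position is covered by reads."""
    return aligned_bases / reference_length


def composition_vector(aligned_bases_by_ref, length_by_ref):
    """Normalize coverage depths over all references so they sum to 1."""
    depths = {
        ref: coverage_depth(aligned_bases_by_ref.get(ref, 0), length)
        for ref, length in length_by_ref.items()
    }
    total = sum(depths.values())
    if total == 0:
        # No reads aligned anywhere: return an all-zero profile.
        return {ref: 0.0 for ref in depths}
    return {ref: depth / total for ref, depth in depths.items()}


# Made-up reference catalogue (lengths in bp) and aligned-base totals.
lengths = {"genome_A": 4_000_000, "genome_B": 2_000_000, "genome_C": 5_000_000}
aligned = {"genome_A": 8_000_000, "genome_B": 12_000_000}

profile = composition_vector(aligned, lengths)
for ref, share in profile.items():
    print(f"{ref}: {share:.2f}")
```

    Dividing by reference length before normalizing keeps long genomes from dominating the profile simply because they attract more reads; vectors of this kind are what clustering or dimension-reduction methods would then take as input.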

  3. Data mining approach to predict BRCA1 gene mutation

    Directory of Open Access Journals (Sweden)

    Olegas Niakšu

    2013-09-01

    Full Text Available Breast cancer is the most frequent form of cancer in women and one of the leading causes of mortality among women around the world. Patients with a pathological mutation of a BRCA gene have a 65% lifetime probability of breast cancer. It is known that such patients have a different cause of illness. In this study, we have proposed a new approach for the prediction of BRCA mutation carriers by methodically applying knowledge discovery steps and utilizing data mining methods. An alternative BRCA risk assessment model has been created utilizing a decision tree classifier model. The biggest challenge was the very small size and imbalanced nature of the initial dataset, which had been collected by clinicians during 4 years of clinical trial. Iterative optimization of the initial dataset, selection of optimal algorithms and their parameterization have resulted in higher classifier model performance, with acceptable prediction accuracy for clinical usage. In this study, three data mining problems have been analyzed using eleven data mining algorithms.

  4. New Hydrocarbon Degradation Pathways in the Microbial Metagenome from Brazilian Petroleum Reservoirs

    Science.gov (United States)

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; Pantaroto de Vasconcellos, Suzan; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  5. Comparative fecal metagenomics unveils unique functional capacity of the swine gut

    Directory of Open Access Journals (Sweden)

    Martinson John

    2011-05-01

    Full Text Available Abstract Background Uncovering the taxonomic composition and functional capacity within the swine gut microbial consortia is of great importance to animal physiology and health as well as to food and water safety due to the presence of human pathogens in pig feces. Nonetheless, limited information on the functional diversity of the swine gut microbiome is available. Results Analysis of 637,722 pyrosequencing reads (130 megabases) generated from Yorkshire pig fecal DNA extracts was performed to help better understand the microbial diversity and largely unknown functional capacity of the swine gut microbiome. Swine fecal metagenomic sequences were annotated using both MG-RAST and JGI IMG/M-ER pipelines. Taxonomic analysis of metagenomic reads indicated that swine fecal microbiomes were dominated by Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, Prevotella spp. dominated the swine fecal metagenome, while some genes associated with Treponema and Anaerovibrio species were found to be exclusively within the pig fecal metagenomic sequences analyzed. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of the swine metagenome. Genes associated with stress, virulence, cell wall and cell capsule were also abundant. Virulence factors associated with antibiotic resistance genes with highest sequence homology to genes in Bacteroidetes, Clostridia, and Methanosarcina were numerous within the gene families unique to the swine fecal metagenomes. Other abundant proteins unique to the distal swine gut shared high sequence homology to putative carbohydrate membrane transporters. Conclusions The results from this metagenomic survey demonstrated the presence of genes associated with resistance to antibiotics and carbohydrate metabolism, suggesting that the swine gut microbiome may be shaped by husbandry practices.

  6. Vikodak--A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Sunil Nagpal

    Full Text Available The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Consequently, prior (collated) information pertaining to genes and enzymes encoded by the resident microbes can aid in indirectly (re)constructing/inferring the metabolic/functional potential of a given microbial community (given its taxonomic abundance profile). In this study, we present Vikodak--a multi-modular package that is based on the above assumption and automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in various modules incorporated in Vikodak. Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a) deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b) functional resolution of distinct metagenomic environments, (c) inferring patterns of functional interaction between resident microbes, and (d) automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions. With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons, in-depth analysis of individual metabolic pathways and greater flexibilities at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S based function prediction. A web implementation of Vikodak can be publicly accessed at: http://metagenomics
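
    The "independent contributions" assumption described in this record can be illustrated with a minimal sketch: each function's inferred abundance is the sum, over taxa, of the taxon's relative abundance times the copy number of that function in its reference genome. The function name, the taxa, and the KEGG-style identifiers below are hypothetical and chosen only for illustration; this is not Vikodak's code, and its co-metabolism and co-existence modules are not reproduced here.

```python
from collections import defaultdict


def infer_function_profile(taxon_abundance, gene_content):
    """Infer function abundances from a taxonomic profile.

    taxon_abundance: {taxon: relative abundance}
    gene_content:    {taxon: {function_id: copies per genome}}
    Taxa without a reference gene content contribute nothing.
    """
    profile = defaultdict(float)
    for taxon, abundance in taxon_abundance.items():
        for function_id, copies in gene_content.get(taxon, {}).items():
            # Independent contribution of this taxon to this function.
            profile[function_id] += abundance * copies
    return dict(profile)


# Hypothetical toy inputs (not taken from the record).
taxon_abundance = {"Bacteroides": 0.5, "Prevotella": 0.25, "Unknown": 0.25}
gene_content = {
    "Bacteroides": {"K00001": 2, "K00002": 1},
    "Prevotella": {"K00001": 1},
}
profile = infer_function_profile(taxon_abundance, gene_content)
print(profile)
```

    Taxa with no reference genome (here "Unknown") are silently skipped, which is one reason 16S-based function prediction can under-report the potential of poorly characterized communities.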

  7. Metagenome reveals potential microbial degradation of hydrocarbon coupled with sulfate reduction in an oil-immersed chimney from Guaymas Basin

    Directory of Open Access Journals (Sweden)

    Ying eHe

    2013-06-01

    Full Text Available Deep-sea hydrothermal vent chimneys contain a high diversity of microorganisms, yet the metabolic activity and the ecological functions of the microbial communities remain largely unexplored. In this study, a metagenomic approach was applied to characterize the metabolic potential in a Guaymas hydrothermal vent chimney and to conduct comparative genomic analysis among a variety of environments with sequenced metagenomes. Complete clustering of functional gene categories with a comparative metagenomic approach showed that this Guaymas chimney metagenome was clustered most closely with a chimney metagenome from Juan de Fuca. All chimney samples were enriched with genes involved in recombination and repair, chemotaxis and flagellar assembly, highlighting their roles in coping with the fluctuating extreme deep-sea environments. A high proportion of transposases was observed in all the metagenomes from deep-sea chimneys, supporting the previous hypothesis that horizontal gene transfer may be common in the deep-sea vent chimney biosphere. In the Guaymas chimney metagenome, thermophilic sulfate-reducing microorganisms including bacteria and archaea were found to be predominant, and genes coding for the degradation of refractory organic compounds such as cellulose, lipid, pullulan, as well as a few hydrocarbons including toluene, ethylbenzene and o-xylene were identified. Therefore, this oil-immersed chimney supported a thermophilic microbial community capable of oxidizing a range of hydrocarbons that served as electron donors for sulphate reduction under anaerobic conditions.

  8. Metagenomic Diagnosis of Bacterial Infections

    Science.gov (United States)

    Nakamura, Shota; Maeda, Norihiro; Miron, Ionut Mihai; Yoh, Myonsun; Izutsu, Kaori; Kataoka, Chidoh; Honda, Takeshi; Yasunaga, Teruo; Nakaya, Takaaki; Kawai, Jun; Hayashizaki, Yoshihide; Horii, Toshihiro

    2008-01-01

    To test the ability of high-throughput DNA sequencing to detect bacterial pathogens, we used it on DNA from a patient’s feces during and after diarrheal illness. Sequences showing best matches for Campylobacter jejuni were detected only in the illness sample. Various bacteria may be detectable with this metagenomic approach. PMID:18976571

  9. Comparative Metagenomics Reveals the Distinctive Adaptive Features of the Spongia officinalis Endosymbiotic Consortium

    Directory of Open Access Journals (Sweden)

    Elham Karimi

    2017-12-01

    Full Text Available Current knowledge of sponge microbiome functioning derives mostly from comparative analyses with bacterioplankton communities. We employed a metagenomics-centered approach to unveil the distinct features of the Spongia officinalis endosymbiotic consortium in the context of its two primary environmental vicinities. Microbial metagenomic DNA samples (n = 10) from sponges, seawater, and sediments were subjected to Hiseq Illumina sequencing (c. 15 million 100 bp reads per sample). Totals of 10,272 InterPro (IPR) predicted protein entries and 784 rRNA gene operational taxonomic units (OTUs, 97% cut-off) were uncovered from all metagenomes. Despite the large divergence in microbial community assembly between the surveyed biotopes, the S. officinalis symbiotic community shared slightly greater similarity (p < 0.05), in terms of both taxonomy and function, to sediment than to seawater communities. The vast majority of the dominant S. officinalis symbionts (i.e., OTUs), representing several, so-far uncultivable lineages in diverse bacterial phyla, displayed higher residual abundances in sediments than in seawater. CRISPR-Cas proteins and restriction endonucleases presented much higher frequencies (accompanied by lower viral abundances) in sponges than in the environment. However, several genomic features sharply enriched in the sponge specimens, including eukaryotic-like repeat motifs (ankyrins, tetratricopeptides, WD-40, and leucine-rich repeats), and genes encoding for plasmids, sulfatases, polyketide synthases, type IV secretion proteins, and terpene/terpenoid synthases presented, to varying degrees, higher frequencies in sediments than in seawater. In contrast, much higher abundances of motility and chemotaxis genes were found in sediments and seawater than in sponges. Higher cell and surface densities, sponge cell shedding and particle uptake, and putative chemical signaling processes favoring symbiont persistence in particulate matrices all may act as

  10. Dynamic changes of yak (Bos grunniens) gut microbiota during growth revealed by polymerase chain reaction-denaturing gradient gel electrophoresis and metagenomics.

    Science.gov (United States)

    Nie, Yuanyang; Zhou, Zhiwei; Guan, Jiuqiang; Xia, Baixue; Luo, Xiaolin; Yang, Yang; Fu, Yu; Sun, Qun

    2017-07-01

    To understand the dynamic structure, function, and influence on nutrient metabolism in hosts, it was crucial to assess the genetic potential of gut microbial community in yaks of different ages. The denaturing gradient gel electrophoresis (DGGE) profiles and Illumina-based metagenomic sequencing on colon contents of 15 semi-domestic yaks were investigated. Unweighted pairwise grouping method with mathematical averages (UPGMA) clustering and principal component analysis (PCA) were used to analyze the DGGE fingerprint. The Illumina sequences were assembled, predicted to genes and functionally annotated, and then classified by querying protein sequences of the genes against the Kyoto encyclopedia of genes and genomes (KEGG) database. Metagenomic sequencing showed that more than 85% of ribosomal RNA (rRNA) gene sequences belonged to the phyla Firmicutes and Bacteroidetes, indicating that the families Ruminococcaceae (46.5%), Rikenellaceae (11.3%), Lachnospiraceae (10.0%), and Bacteroidaceae (6.3%) were dominant gut microbes. Over 50% of non-rRNA gene sequences represented the metabolic pathways of amino acids (14.4%), proteins (12.3%), sugars (11.9%), nucleotides (6.8%), lipids (1.7%), xenobiotics (1.4%), coenzymes, and vitamins (3.6%). Gene functional classification showed that most of enzyme-coding genes were related to cellulose digestion and amino acids metabolic pathways. Yaks' age had a substantial effect on gut microbial composition. Comparative metagenomics of gut microbiota in 0.5-, 1.5-, and 2.5-year-old yaks revealed that the abundance of the classes Clostridia, Bacteroidia, and Lentisphaeria, as well as the phyla Firmicutes, Bacteroidetes, Lentisphaerae, Tenericutes, and Cyanobacteria, varied more greatly during yaks' growth, especially in young animals (0.5 and 1.5 years old). Gut microbes, including Bacteroides, Clostridium, and Lentisphaeria, make a contribution to the energy metabolism and synthesis of amino acid, which are essential to the

  11. Extremozymes from metagenome: Potential applications in food processing.

    Science.gov (United States)

    Khan, Mahejibin; Sathya, T A

    2017-06-12

    The long-established use of enzymes for food processing and product formulation has resulted in an enzyme market growing at a compound annual rate of 7.0%. Advancements in molecular biology, and the recognition that enzymes with specific properties have application in the industrial production of infant, baby and functional foods, boosted research toward sourcing the genes of microorganisms for enzymes with distinctive properties. In this regard, functional metagenomics for extremozymes has gained attention on the premise that such enzymes can catalyze specific reactions. Hence, metagenomics, which can isolate functional genes of unculturable extremophilic microorganisms, has attracted growing attention as a promising tool. Developments in this field of research in relation to the food sector are reviewed.

  12. Metagenomic analysis of soil and freshwater from zoo agricultural area with organic fertilization.

    Science.gov (United States)

    Meneghine, Aylan K; Nielsen, Shaun; Varani, Alessandro M; Thomas, Torsten; Carareto Alves, Lucia Maria

    2017-01-01

    Microbial communities drive biogeochemical cycles in agricultural areas by decomposing organic materials and converting essential nutrients. Organic amendments improve soil quality by increasing the load of essential nutrients and enhancing the productivity. Additionally, fresh water used for irrigation can affect soil quality of agricultural soils, mainly due to the presence of microbial contaminants and pathogens. In this study, we investigated how microbial communities in irrigation water might contribute to the microbial diversity and function of soil. Whole-metagenomic sequencing approaches were used to investigate the taxonomic and the functional profiles of microbial communities present in fresh water used for irrigation, and in soil from a vegetable crop, which received fertilization with organic compost made from animal carcasses. The taxonomic analysis revealed that the most abundant genera were Polynucleobacter (~8% relative abundance) and Bacillus (~10%) in fresh water and soil from the vegetable crop, respectively. Low abundance (0.38%) of cyanobacterial groups were identified. Based on functional gene prediction, denitrification appears to be an important process in the soil community analysed here. Conversely, genes for nitrogen fixation were abundant in freshwater, indicating that the N-fixation plays a crucial role in this particular ecosystem. Moreover, pathogenicity islands, antibiotic resistance and potential virulence related genes were identified in both samples, but no toxigenic genes were detected. This study provides a better understanding of the community structure of an area under strong agricultural activity with regular irrigation and fertilization with an organic compost made from animal carcasses. Additionally, the use of a metagenomic approach to investigate fresh water quality proved to be a relevant method to evaluate its use in an agricultural ecosystem.

  13. Metagenomics-Enabled Understanding of Soil Microbial Feedbacks to Climate Warming

    Science.gov (United States)

    Zhou, J.; Wu, L.; Zhili, H.; Kostas, K.; Luo, Y.; Schuur, E. A. G.; Cole, J. R.; Tiedje, J. M.

    2014-12-01

    Understanding the response of biological communities to climate warming is a central issue in ecology and global change biology, but it is poorly understood for microbial communities. To advance system-level predictive understanding of the feedbacks of belowground microbial communities to multiple climate change factors and their impacts on soil carbon (C) and nitrogen (N) cycling processes, we have used integrated metagenomic technologies (e.g., target gene and shotgun metagenome sequencing, GeoChip, and isotope) to analyze soil microbial communities from experimental warming sites in Alaska (AK) and Oklahoma (OK), and long-term laboratory incubation. Rapid feedbacks of microbial communities to warming were observed in the AK site. Consistent with the changes in soil temperature, moisture and ecosystem respiration, microbial functional community structure was shifted after only 1.5-year warming, indicating rapid responses and high sensitivity of this permafrost ecosystem to climate warming. Also, warming stimulated not only functional genes involved in aerobic respiration of both labile and recalcitrant C, contributing to an observed 24% increase in 2010 growing season and 56% increase of decomposition of a standard substrate, but also functional genes for anaerobic processes (e.g., denitrification, sulfate reduction, methanogenesis). Further comparisons by shotgun sequencing showed significant differences of microbial community structure between AK and OK sites. The OK site was enriched in genes annotated for cellulose degradation, CO2 production, denitrification, sporulation, heat shock response, and cellular surface structures (e.g., trans-membrane transporters for glucosides), while the AK warmed plots were enriched in metabolic pathways related to labile C decomposition. Together, our results demonstrate the vulnerability of permafrost ecosystem C to climate warming and the importance of microbial feedbacks in mediating such vulnerability.

  14. Metagenomic analysis of soil and freshwater from zoo agricultural area with organic fertilization.

    Directory of Open Access Journals (Sweden)

    Aylan K Meneghine

    Full Text Available Microbial communities drive biogeochemical cycles in agricultural areas by decomposing organic materials and converting essential nutrients. Organic amendments improve soil quality by increasing the load of essential nutrients and enhancing the productivity. Additionally, fresh water used for irrigation can affect soil quality of agricultural soils, mainly due to the presence of microbial contaminants and pathogens. In this study, we investigated how microbial communities in irrigation water might contribute to the microbial diversity and function of soil. Whole-metagenomic sequencing approaches were used to investigate the taxonomic and the functional profiles of microbial communities present in fresh water used for irrigation, and in soil from a vegetable crop, which received fertilization with organic compost made from animal carcasses. The taxonomic analysis revealed that the most abundant genera were Polynucleobacter (~8% relative abundance) and Bacillus (~10%) in fresh water and soil from the vegetable crop, respectively. Low abundance (0.38%) of cyanobacterial groups were identified. Based on functional gene prediction, denitrification appears to be an important process in the soil community analysed here. Conversely, genes for nitrogen fixation were abundant in freshwater, indicating that the N-fixation plays a crucial role in this particular ecosystem. Moreover, pathogenicity islands, antibiotic resistance and potential virulence related genes were identified in both samples, but no toxigenic genes were detected. This study provides a better understanding of the community structure of an area under strong agricultural activity with regular irrigation and fertilization with an organic compost made from animal carcasses. Additionally, the use of a metagenomic approach to investigate fresh water quality proved to be a relevant method to evaluate its use in an agricultural ecosystem.

  15. The Amazon continuum dataset: quantitative metagenomic and metatranscriptomic inventories of the Amazon River plume, June 2010.

    Science.gov (United States)

    Satinsky, Brandon M; Zielinski, Brian L; Doherty, Mary; Smith, Christa B; Sharma, Shalabh; Paul, John H; Crump, Byron C; Moran, Mary Ann

    2014-01-01

    The Amazon River is by far the world's largest in terms of volume and area, generating a fluvial export that accounts for about a fifth of riverine input into the world's oceans. Marine microbial communities of the Western Tropical North Atlantic Ocean are strongly affected by the terrestrial materials carried by the Amazon plume, including dissolved (DOC) and particulate organic carbon (POC) and inorganic nutrients, with impacts on primary productivity and carbon sequestration. We inventoried genes and transcripts at six stations in the Amazon River plume during June 2010. At each station, internal standard-spiked metagenomes, non-selective metatranscriptomes, and poly(A)-selective metatranscriptomes were obtained in duplicate for two discrete size fractions (0.2 to 2.0 μm and 2.0 to 156 μm) using 150 × 150 paired-end Illumina sequencing. Following quality control, the dataset contained 360 million reads of approximately 200 bp average size from Bacteria, Archaea, Eukarya, and viruses. Bacterial metagenomes and metatranscriptomes were dominated by Synechococcus, Prochlorococcus, SAR11, SAR116, and SAR86, with high contributions from SAR324 and Verrucomicrobia at some stations. Diatoms, green picophytoplankton, dinoflagellates, haptophytes, and copepods dominated the eukaryotic genes and transcripts. Gene expression ratios differed by station, size fraction, and microbial group, with transcription levels varying over three orders of magnitude across taxa and environments. This first comprehensive inventory of microbial genes and transcripts, benchmarked with internal standards for full quantitation, is generating novel insights into biogeochemical processes of the Amazon plume and improving prediction of climate change impacts on the marine biosphere.
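
    The record states that the metagenomes were spiked with internal standards "for full quantitation" but does not give the calculation. A common way to use such a spike-in, shown here as a minimal sketch and not as the authors' method, is to scale read counts by the ratio of standard copies added to standard reads recovered. The function name, parameter names, and numbers are hypothetical; differences in length between the gene and the standard are ignored.

```python
def copies_per_liter(gene_reads, standard_reads, standard_copies_added, liters_filtered):
    """Convert a read count into an absolute abundance using a spike-in standard.

    Assumes reads are recovered from the standard and from the sample
    with the same efficiency (an assumption, not a measured fact).
    """
    if standard_reads <= 0 or liters_filtered <= 0:
        raise ValueError("standard_reads and liters_filtered must be positive")
    # Number of template copies represented by one sequenced read.
    copies_per_read = standard_copies_added / standard_reads
    return gene_reads * copies_per_read / liters_filtered


# Hypothetical example: 500 reads hit a gene, 1,000 reads hit the standard,
# 2 million standard copies were added, and 4 L of water were filtered.
estimate = copies_per_liter(500, 1000, 2_000_000, 4)
print(estimate)
```

    Expressing results per liter rather than as relative abundance is what allows stations and size fractions with very different sequencing depths to be compared directly.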

  16. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
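
    The three-step construction described in this record (gene-article matrix, normalized gene-MeSH term matrix, gene-gene dissimilarity matrix) can be sketched as follows. This is an illustration under stated assumptions, not the GenoMesh code: the record does not name the dissimilarity function that was selected, so cosine dissimilarity is used here as a stand-in, normalization is a simple division by each gene's total term count, and all gene names, article identifiers, and MeSH terms are made up.

```python
import math
from collections import Counter


def gene_mesh_profiles(gene_articles, article_mesh):
    """Build a normalized gene-by-MeSH-term profile for each gene.

    gene_articles: {gene: [article ids]}        (the gene-article matrix)
    article_mesh:  {article id: [MeSH terms]}   (manual index terms)
    """
    profiles = {}
    for gene, articles in gene_articles.items():
        counts = Counter()
        for article in articles:
            counts.update(article_mesh.get(article, ()))
        total = sum(counts.values())
        # Normalize so heavily published genes do not dominate.
        profiles[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return profiles


def cosine_dissimilarity(p, q):
    """1 - cosine similarity of two sparse term profiles (assumed metric)."""
    dot = sum(v * q.get(term, 0.0) for term, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def gene_gene_matrix(profiles):
    """Pairwise dissimilarities between all genes, keyed by (gene, gene)."""
    genes = sorted(profiles)
    return {(a, b): cosine_dissimilarity(profiles[a], profiles[b]) for a in genes for b in genes}


# Hypothetical toy corpus.
gene_articles = {"geneA": ["pmid1", "pmid2"], "geneB": ["pmid2"], "geneC": ["pmid3"]}
article_mesh = {"pmid1": ["Virulence"], "pmid2": ["Virulence", "Iron"], "pmid3": ["Flagella"]}
matrix = gene_gene_matrix(gene_mesh_profiles(gene_articles, article_mesh))
print(matrix[("geneA", "geneB")], matrix[("geneA", "geneC")])
```

    In this toy corpus geneA and geneB share index terms and so end up close together, while geneC shares none and sits at the maximum dissimilarity; ranking pairs by this score is what lets implicit relationships surface even when no single article mentions both genes.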

  17. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks

    Science.gov (United States)

    2013-01-01

    Background The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. Results The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. Conclusions The GenoMesh algorithm and web program provide the first genome

  18. Metagenomic data of fungal internal transcribed spacer from serofluid dish, a traditional Chinese fermented food

    Directory of Open Access Journals (Sweden)

    Peng Chen

    2016-03-01

    Full Text Available Serofluid dish (or Jiangshui, in Chinese), a traditional food in Chinese culture for thousands of years, is made from vegetables by fermentation. In this work, the microbial community of the fermented serofluid dish was investigated by a culture-independent method. The metagenomic data in this article contain the sequences of fungal internal transcribed spacer (ITS) regions of rRNA genes from 12 different serofluid dish samples. The metagenome comprised an average of 50,865 raw reads and an average of 8,958,220 bp, with a G + C content of 45.62%. This is the first report on metagenomic data of fungal ITS from serofluid dish employing the Illumina platform to profile the fungal communities of this little-known fermented food from Gansu Province, China. The metagenomic data of fungal internal transcribed spacer can be accessed at NCBI, SRA database accession no. SRP067411.
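
    The G + C content reported in this record is a simple summary statistic over the reads. A minimal sketch of how such a value can be computed is given below; the function name and the example sequences are hypothetical, and ambiguous bases such as N are excluded from the denominator, which is a design assumption rather than something the record specifies.

```python
def gc_content(sequences):
    """Return the G + C percentage across a collection of DNA sequences.

    Only unambiguous bases (A, C, G, T) are counted in the total.
    """
    gc = 0
    total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Hypothetical reads: 6 of the 8 unambiguous bases are G or C.
example = gc_content(["ATGC", "ggccNN"])
print(example)
```

    As a separate piece of arithmetic on the figures quoted above, 8,958,220 bp divided by 50,865 reads gives a mean read length of roughly 176 bp.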

  19. A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    Directory of Open Access Journals (Sweden)

    Handelsman Jo

    2008-01-01

    Full Text Available Abstract Background The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.
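
    The record speaks of comparing the richness and membership of communities but does not name the estimators used. As a hedged illustration only, the sketch below uses two widely used choices: the bias-corrected Chao1 estimator for richness and the Jaccard index for shared membership. Both are stand-ins chosen for this example, and the protein-family names and counts are made up.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-family observation counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # The bias-corrected form stays defined when there are no doubletons.
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def jaccard(members_a, members_b):
    """Fraction of families shared between two communities (0 = none, 1 = all)."""
    a, b = set(members_a), set(members_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: counts of peptide fragments per operational protein family.
richness = chao1([5, 1, 1, 2, 1])
overlap = jaccard({"fam1", "fam2", "fam3"}, {"fam2", "fam3", "fam4"})
print(richness, overlap)
```

    A richness estimate above the observed count signals that further sequencing would likely reveal additional families, while a low Jaccard value between habitats indicates little shared membership.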

  20. [Metagenomics and biodiversity of sphagnum bogs].

    Science.gov (United States)

    Rusin, L Yu

    2016-01-01

    The biodiversity of sphagnum bogs is among the richest and least studied, while these ecosystems rank among the most valuable in ecological, conservation, and economic terms. Recent studies focused on the prokaryotic consortia associated with sphagnum mosses, and revealed the factors that maintain sustainability and productivity of bog ecosystems. High-throughput sequencing technologies provided insight into functional diversity of moss microbial communities (microbiomes), and helped to identify the biochemical pathways and gene families that facilitate the spectrum of adaptive strategies and largely foster the very successful colonization of the Northern hemisphere by sphagnum mosses. Rich and valuable information obtained on microbiomes of peat bogs sets off the paucity of evidence on their eukaryotic diversity. Prospects and expectations of reliable assessment of taxonomic profiles, relative abundance of taxa, and hidden biodiversity of microscopic eukaryotes in sphagnum bog ecosystems are briefly outlined in the context of today's metagenomics.

  1. Functional metagenomics to decipher food-microbe-host crosstalk.

    Science.gov (United States)

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    The recent developments of metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics open a new window for the exploration of food-microbe-host crosstalk.

  2. Forest harvesting reduces the soil metagenomic potential for biomass decomposition.

    Science.gov (United States)

    Cardenas, Erick; Kranabetter, J M; Hope, Graeme; Maas, Kendra R; Hallam, Steven; Mohn, William W

    2015-11-01

    Soil is the key resource that must be managed to ensure sustainable forest productivity. Soil microbial communities mediate numerous essential ecosystem functions, and recent studies show that forest harvesting alters soil community composition. From a long-term soil productivity study site in a temperate coniferous forest in British Columbia, 21 forest soil shotgun metagenomes were generated, totaling 187 Gb. A method to analyze unassembled metagenome reads from the complex community was optimized and validated. The subsequent metagenome analysis revealed that, 12 years after forest harvesting, there were 16% and 8% reductions in relative abundances of biomass decomposition genes in the organic and mineral soil layers, respectively. Organic and mineral soil layers differed markedly in genetic potential for biomass degradation, with the organic layer having greater potential and being more strongly affected by harvesting. Gene families were disproportionately affected, and we identified 41 gene families consistently affected by harvesting, including families involved in lignin, cellulose, hemicellulose and pectin degradation. The results strongly suggest that harvesting profoundly altered below-ground cycling of carbon and other nutrients at this site, with potentially important consequences for forest regeneration. Thus, it is important to determine whether these changes foreshadow long-term changes in forest productivity or resilience and whether these changes are broadly characteristic of harvested forests.

  3. Revealing large metagenomic regions through long DNA fragment hybridization capture.

    Science.gov (United States)

    Gasc, Cyrielle; Peyret, Pierre

    2017-03-14

    High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes from single organisms or metagenomic samples. However, due to the limited capacity of short-read sequence data to assemble complex or low coverage regions, genomes are typically fragmented, leading to draft genomes with numerous underexplored large genomic regions. Revealing these missing sequences is a major goal to resolve concerns in numerous biological studies. To overcome these limitations, we developed an innovative target enrichment method for the reconstruction of large unknown genomic regions. Based on a hybridization capture strategy, this approach enables the enrichment of large genomic regions allowing the reconstruction of tens of kilobase pairs flanking a short, targeted DNA sequence. Applied to a metagenomic soil sample targeting the linA gene, the biomarker of hexachlorocyclohexane (HCH) degradation, our method permitted the enrichment of the gene and its flanking regions leading to the reconstruction of several contigs and complete plasmids exceeding tens of kilobase pairs surrounding linA. Thus, through gene association and genome reconstruction, we identified microbial species involved in HCH degradation which constitute targets to improve biostimulation treatments. This new hybridization capture strategy makes surveying and deconvoluting complex genomic regions possible through enrichment of large genomic regions and allows the efficient exploration of metagenomic diversity. Indeed, this approach makes it possible to assign identity and function to microorganisms in natural environments, one of the ultimate goals of microbial ecology.

  4. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    -annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms...

  5. Physiology and phylogeny of the candidate phylum "Atribacteria" (formerly OP9/JS1) inferred from single-cell genomics and metagenomics

    Science.gov (United States)

    Dodsworth, J. A.; Murugapiran, S.; Blainey, P. C.; Nobu, M.; Rinke, C.; Schwientek, P.; Gies, E.; Webster, G.; Kille, P.; Weightman, A.; Liu, W. T.; Hallam, S.; Tsiamis, G.; Swingley, W.; Ross, C.; Tringe, S. G.; Chain, P. S.; Scholz, M. B.; Lo, C. C.; Raymond, J.; Quake, S. R.; Woyke, T.; Hedlund, B. P.

    2014-12-01

    Single-cell sequencing and metagenomics have extended the genomics revolution to yet-uncultivated microorganisms and provided insights into the coding potential of this so-called "microbial dark matter", including microbes belonging to candidate phyla with no cultivated representatives. As more datasets emerge, comparison of individual genomes from different lineages and habitats can provide insight into the phylogeny, conserved features, and potential metabolic diversity of candidate phyla. The candidate bacterial phylum OP9 was originally found in Obsidian Pool, Yellowstone National Park, and it has since been detected in geothermal springs, petroleum reservoirs, and engineered thermal environments worldwide. JS1, another uncultivated bacterial lineage affiliated with OP9, is often abundant in marine sediments associated with methane hydrates, hydrocarbon seeps, and on continental margins and shelves, and is found in other non-thermal marine and subsurface environments. The phylogenetic relationship between OP9, JS1, and other Bacteria has not been fully resolved, and to date no axenic cultures from these lineages have been reported. Recently, 31 single amplified genomes (SAGs) from six distinct OP9 and JS1 lineages have been obtained using flow cytometric and microfluidic techniques. These SAGs were used to inform metagenome binning techniques that identified OP9/JS1 sequences in several metagenomes, extending genomic coverage in three of the OP9 and JS1 lineages. Phylogenomic analyses of these SAG and metagenome bin datasets suggest that OP9 and JS1 constitute a single, deeply branching phylum, for which the name "Atribacteria" has recently been proposed. Overall, members of the "Atribacteria" are predicted to be heterotrophic anaerobes without the capacity for respiration, with some lineages potentially specializing in secondary fermentation of organic acids. A set of signature "Atribacteria" genes was tentatively identified, including components of a bacterial

  6. In silico prediction of gene expression patterns in Citrus flavedo

    Directory of Open Access Journals (Sweden)

    Irving J. Berger

    2007-01-01

    Full Text Available Out of the 18,942 flavedo expressed sequences (clusters plus singletons) in Citrus sinensis from the Citrus EST Project (CitEST), 25 were statistically supported to be differentially expressed in this tissue after a double in silico hybridization strategy against leaf-, flower-, and bark-derived ESTs. Five of them, two terpene synthases and three O-methyltransferases, are absent in the other citrus tissues with concomitant 2x2 statistics, supporting the hypothesis that they are putative flavedo-specific expressed sequences. The pattern of these differentially expressed sequences during fruit development suggests that most of them are developmentally regulated. Some expressed gene products, including a putative germin-like protein highly expressed in flavedo, are shown to be promising candidates for further characterization. In addition to promoter seeking, this kind of analysis can lead to gene discovery and to tissue-specific and tissue-enriched expression pattern predictions (as shown herein), and can also be adopted as an in silico first, and probably reliable, approach for detecting expression profiles from EST sequencing efforts before experimental validation is available or for heuristically guiding that validation.

  7. Genes implicated in serotonergic and dopaminergic functioning predict BMI categories.

    Science.gov (United States)

    Fuemmeler, Bernard F; Agurs-Collins, Tanya D; McClernon, F Joseph; Kollins, Scott H; Kail, Melanie E; Bergen, Andrew W; Ashley-Koch, Allison E

    2008-02-01

    This study addressed the hypothesis that variation in genes associated with dopamine function (SLC6A3, DRD2, DRD4), serotonin function (SLC6A4), and regulation of monoamine levels (MAOA) may be predictive of BMI categories (obese and overweight + obese) in young adulthood and of changes in BMI as adolescents transition into young adulthood. Interactions with gender and race/ethnicity were also examined. Participants were a subsample of individuals from the National Longitudinal Study of Adolescent Health (Add Health), a nationally representative sample of adolescents followed from 1995 to 2002. The sample analyzed included a subset of 1,584 unrelated individuals with genotype data. Multiple logistic regressions were conducted to evaluate the associations between genotypes and obesity (BMI > 29.9) or overweight + obese combined (BMI ≥ 25) with normal weight (BMI = 18.5-24.9) as a referent. Linear regression models were used to examine change in BMI from adolescence to young adulthood. Significant associations were found between SLC6A4 5HTTLPR and categories of BMI, and between MAOA promoter variable number tandem repeat (VNTR) among men and categories of BMI. Stratified analyses revealed that the association between these two genes and excess BMI was significant for men overall and for white and Hispanic men specifically. Linear regression models indicated a significant effect of SLC6A4 5HTTLPR on change in BMI from adolescence to young adulthood. Our findings lend further support to the involvement of genes implicated in dopamine and serotonin regulation on energy balance.
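    The outcome coding described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the cut-offs are the ones quoted in the abstract, and treating overweight participants as excluded from the obese-versus-normal comparison is an assumption rather than something the abstract states.

```python
def bmi_category(bmi):
    """Map a BMI value to the categories quoted in the abstract, or None."""
    if bmi > 29.9:
        return "obese"
    if bmi >= 25:
        return "overweight"
    if 18.5 <= bmi <= 24.9:
        return "normal"
    return None  # below 18.5 or in the 24.9-25 gap: not covered by the abstract


def outcome_labels(bmi):
    """Return (obese_vs_normal, excess_vs_normal) as 1, 0, or None (excluded).

    Assumption: overweight participants are left out of the obese-vs-normal
    comparison; the abstract does not say how they were handled.
    """
    category = bmi_category(bmi)
    if category is None:
        return (None, None)
    if category == "normal":
        return (0, 0)
    obese_vs_normal = 1 if category == "obese" else None
    return (obese_vs_normal, 1)


for value in (22.0, 27.0, 31.0):
    print(value, bmi_category(value), outcome_labels(value))
```

    Normal weight is coded 0 in both outcomes because the abstract names it as the referent group for the logistic regressions.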

  8. Predictive gene testing for Huntington disease and other neurodegenerative disorders.

    Science.gov (United States)

    Wedderburn, S; Panegyres, P K; Andrew, S; Goldblatt, J; Liebeck, T; McGrath, F; Wiltshire, M; Pestell, C; Lee, J; Beilby, J

    2013-12-01

    Controversies exist around predictive testing (PT) programmes in neurodegenerative disorders. This study sets out to answer the following questions relating to Huntington disease (HD) and other neurodegenerative disorders: differences between these patients in their PT journeys, why and when individuals withdraw from PT, and decision-making processes regarding reproductive genetic testing. A case series analysis of patients having PT from the multidisciplinary Western Australian centre for PT over the past 20 years was performed using internationally recognised guidelines for predictive gene testing in neurodegenerative disorders. Of 740 at-risk patients, 518 applied for PT: 466 at risk of HD, 52 at risk of other neurodegenerative disorders - spinocerebellar ataxias, hereditary prion disease and familial Alzheimer disease. Thirteen percent withdrew from PT - 80.32% of withdrawals occurred during counselling stages. Major withdrawal reasons related to timing in the patients' lives or unknown as the patient did not disclose the reason. Thirty-eight HD individuals had reproductive genetic testing: 34 initiated prenatal testing (of which eight withdrew from the process) and four initiated pre-implantation genetic diagnosis. There was no recorded or other evidence of major psychological reactions or suicides during PT. People withdrew from PT in relation to life stages and reasons that are unknown. Our findings emphasise the importance of: (i) adherence to internationally recommended guidelines for PT; (ii) the role of the multidisciplinary team in risk minimisation; and (iii) patient selection. © 2013 The Authors; Internal Medicine Journal © 2013 Royal Australasian College of Physicians.

  9. Compositional profile of α/β-hydrolase fold proteins in mangrove soil metagenomes: prevalence of epoxide hydrolases and haloalkane dehalogenases in oil-contaminated sites

    Science.gov (United States)

    Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini

    2015-01-01

    The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangroves areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. PMID:25171437

  10. Compositional profile of α / β-hydrolase fold proteins in mangrove soil metagenomes: prevalence of epoxide hydrolases and haloalkane dehalogenases in oil-contaminated sites.

    Science.gov (United States)

    Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini

    2015-05-01

    The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangroves areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. © 2014 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  11. Characterization of a novel beta-glucosidase-like activity from a soil metagenome.

    Science.gov (United States)

    Jiang, Chengjian; Ma, Gefei; Li, Shuangxi; Hu, Tingting; Che, Zhiqun; Shen, Peihong; Yan, Bing; Wu, Bo

    2009-10-01

    We report the cloning of a novel beta-glucosidase-like gene by function-based screening of a metagenomic library from uncultured soil microorganisms. The gene was named bgllC and has an open reading frame of 1,443 base pairs. It encodes a 481 amino acid polypeptide with a predicted molecular mass of about 57.8 kDa. The deduced amino acid sequence did not show any homology with known beta-glucosidases. The putative beta-glucosidase gene was subcloned into the pETBlue-2 vector and overexpressed in E. coli Tuner (DE3) pLacI; the recombinant protein was purified to homogeneity. Functional characterization with a high performance liquid chromatography method demonstrated that the recombinant BgllC protein hydrolyzed D-glucosyl-beta-(1-4)-D-glucose to glucose. The maximum activity for BgllC protein occurred at pH 8.0 and 42 °C using p-nitrophenyl-beta-D-glucoside as the substrate. A CaCl2 concentration of 1 mM was required for optimal activity. The putative beta-glucosidase had an apparent Km value of 0.19 mM, a Vmax value of 4.75 U/mg and a kcat value of 316.7/min under the optimal reaction conditions. The biochemical characterization of BgllC has enlarged our understanding of the novel enzymes that can be isolated from the soil metagenome.
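    To make the reported kinetic constants concrete, the sketch below evaluates the standard Michaelis-Menten rate law with the apparent Km (0.19 mM) and Vmax (4.75 U/mg) quoted above. It assumes that BgllC follows simple Michaelis-Menten kinetics under the optimal conditions; the function name and the substrate concentrations are illustrative and do not come from the paper.

```python
def michaelis_menten_rate(substrate_mm, vmax=4.75, km=0.19):
    """Return the reaction rate in U/mg for a substrate concentration in mM.

    Defaults are the apparent Vmax (U/mg) and Km (mM) quoted in the abstract.
    """
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mm / (km + substrate_mm)


# Illustrative substrate concentrations (mM); not taken from the paper.
for s in (0.05, 0.19, 1.0, 5.0):
    print(f"[S] = {s:4.2f} mM -> v = {michaelis_menten_rate(s):.3f} U/mg")
```

    At a substrate concentration equal to Km the rate is half of Vmax, which is the defining property of Km in this model.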

  12. Metagenomics and the protein universe

    Science.gov (United States)

    Godzik, Adam

    2011-01-01

    Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084

  13. Metagenomic Analysis of Dairy Bacteriophages

    DEFF Research Database (Denmark)

    Muhammed, Musemma K.; Kot, Witold; Neve, Horst

    2017-01-01

    Despite their huge potential for characterizing the biodiversity of phages, metagenomic studies are currently not available for dairy bacteriophages, partly due to the lack of a standard procedure for phage extraction. We optimized an extraction method that allows to remove the bulk protein from ...... diversity. Possible co-induction of temperate P335 prophages and satellite phages in one of the whey mixtures was also observed....

  14. Phylogenetic analysis of a spontaneous cocoa bean fermentation metagenome reveals new insights into its bacterial and fungal community diversity.

    Directory of Open Access Journals (Sweden)

    Koen Illeghems

    Full Text Available This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques.

  15. Phylogenetic analysis of a spontaneous cocoa bean fermentation metagenome reveals new insights into its bacterial and fungal community diversity.

    Science.gov (United States)

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques.

  16. Integrative Workflows for Metagenomic Analysis

    Directory of Open Access Journals (Sweden)

    Efthymios Ladoukakis

    2014-11-01

    Full Text Available The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields at only a fraction of the cost of traditional processes (i.e. Sanger). From a bioinformatic perspective, this boils down to many gigabytes of data being generated from each single sequencing experiment, rendering the management, or even the storage, of the data a critical bottleneck for the overall analytical endeavor. The complexity is further aggravated by the versatility of the available processing steps, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, but nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, which demand a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires substantial computational resources, making cloud computing infrastructures the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, performing a critical assessment of the available automated pipelines for data management, quality control and annotation of metagenomic data, embracing various major sequencing technologies and applications.

  17. Culture-independent discovery of natural products from soil metagenomes.

    Science.gov (United States)

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  18. Metagenome of a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones

    Energy Technology Data Exchange (ETDEWEB)

    Walsh, David A.; Zaikova, Elena; Howes, Charles L.; Song, Young; Wright, Jody; Tringe, Susannah G.; Tortell, Philippe D.; Hallam, Steven J.

    2009-07-15

    Oxygen minimum zones (OMZs), also known as oceanic "dead zones", are widespread oceanographic features currently expanding due to global warming and coastal eutrophication. Although inhospitable to metazoan life, OMZs support a thriving but cryptic microbiota whose combined metabolic activity is intimately connected to nutrient and trace gas cycling within the global ocean. Here we report time-resolved metagenomic analyses of a ubiquitous and abundant but uncultivated OMZ microbe (SUP05) closely related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur-oxidation and nitrate respiration responsive to a wide range of water column redox states. Thus, SUP05 plays integral roles in shaping nutrient and energy flow within oxygen-deficient oceanic waters via carbon sequestration, sulfide detoxification and biological nitrogen loss with important implications for marine productivity and atmospheric greenhouse control.

  19. A metagenomic framework for the study of airborne microbial communities.

    Directory of Open Access Journals (Sweden)

    Shibu Yooseph

    Full Text Available Understanding the microbial content of the air has important scientific, health, and economic implications. While studies have primarily characterized the taxonomic content of air samples by sequencing the 16S or 18S ribosomal RNA gene, direct analysis of the genomic content of airborne microorganisms has not been possible due to the extremely low density of biological material in airborne environments. We developed sampling and amplification methods to enable adequate DNA recovery to allow metagenomic profiling of air samples collected from indoor and outdoor environments. Air samples were collected from a large urban building, a medical center, a house, and a pier. Analyses of metagenomic data generated from these samples reveal airborne communities with a high degree of diversity and different genera abundance profiles. The identities of many of the taxonomic groups and protein families also allow for the identification of the likely sources of the sampled airborne bacteria.

  20. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones.

    Science.gov (United States)

    Walsh, David A; Zaikova, Elena; Howes, Charles G; Song, Young C; Wright, Jody J; Tringe, Susannah G; Tortell, Philippe D; Hallam, Steven J

    2009-10-23

    Oxygen minimum zones, also known as oceanic "dead zones," are widespread oceanographic features currently expanding because of global warming. Although inhospitable to metazoan life, they support a cryptic microbiota whose metabolic activities affect nutrient and trace gas cycling within the global ocean. Here, we report metagenomic analyses of a ubiquitous and abundant but uncultivated oxygen minimum zone microbe (SUP05) related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation, and nitrate respiration responsive to a wide range of water-column redox states. Our analysis provides a genomic foundation for understanding the ecological and biogeochemical role of pelagic SUP05 in oxygen-deficient oceanic waters and its potential sensitivity to environmental changes.

  1. A modified GC-specific MAKER gene annotation method reveals improved and novel gene predictions of high and low GC content in Oryza sativa.

    Science.gov (United States)

    Bowman, Megan J; Pulman, Jane A; Liu, Tiffany L; Childs, Kevin L

    2017-11-25

    Accurate structural annotation depends on well-trained gene prediction programs. Training data for gene prediction programs are often chosen randomly from a subset of high-quality genes that ideally represent the variation found within a genome. One aspect of gene variation is GC content, which differs across species and is bimodal in grass genomes. When gene prediction programs are trained on a subset of grass genes with random GC content, they are effectively being trained on two classes of genes at once, and this can be expected to result in poor results when genes are predicted in new genome sequences. We find that gene prediction programs trained on grass genes with random GC content do not completely predict all grass genes with extreme GC content. We show that gene prediction programs that are trained with grass genes with high or low GC content can make both better and unique gene predictions compared to gene prediction programs that are trained on genes with random GC content. By separately training gene prediction programs with genes from multiple GC ranges and using the programs within the MAKER genome annotation pipeline, we were able to improve the annotation of the Oryza sativa genome compared to using the standard MAKER annotation protocol. Gene structure was improved in over 13% of genes, and 651 novel genes were predicted by the GC-specific MAKER protocol. We present a new GC-specific MAKER annotation protocol to predict new and improved gene models and assess the biological significance of this method in Oryza sativa. We expect that this protocol will also be beneficial for gene prediction in any organism with bimodal or other unusual gene GC content.
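    The idea of splitting training genes by GC content, as described in the abstract above, can be sketched in a few lines. This is not the MAKER protocol itself and uses no MAKER API: the function names are hypothetical and the 0.45 and 0.60 thresholds are placeholder assumptions, not the GC ranges used by the authors.

```python
def gc_content(sequence):
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def split_by_gc(genes, low_max=0.45, high_min=0.60):
    """Group gene names into low, mid and high GC bins.

    `genes` maps gene name -> coding sequence. The thresholds are
    illustrative placeholders, not values taken from the paper.
    """
    bins = {"low": [], "mid": [], "high": []}
    for name, sequence in genes.items():
        gc = gc_content(sequence)
        if gc <= low_max:
            bins["low"].append(name)
        elif gc >= high_min:
            bins["high"].append(name)
        else:
            bins["mid"].append(name)
    return bins


# Toy sequences for demonstration only.
example = {"geneA": "ATATATGC", "geneB": "GCGCGCGA", "geneC": "ATGCATGC"}
print(split_by_gc(example))
```

    Each bin could then serve as a separate training set for a gene predictor, which is the general strategy the abstract describes.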

  2. Characterization of a Metagenome-Derived β-Glucosidase and Its Application in Conversion of Polydatin to Resveratrol

    Directory of Open Access Journals (Sweden)

    Zhimao Mai

    2016-03-01

    Full Text Available Because of the beneficial pharmacological properties of resveratrol, there is increasing interest in the enzymatic conversion of polydatin to resveratrol. The metagenomic technique provides an effective strategy for mining novel polydatin-hydrolysis enzymes from uncultured microorganisms. In this study, a metagenomic library of mangrove soil was constructed and a novel β-glucosidase gene, MlBgl, was isolated. The deduced amino acid sequence of MlBgl showed the highest identity of 64% with a predicted β-glucosidase in the GenBank database. The gene was cloned and overexpressed in Escherichia coli BL21(DE3). Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) assay demonstrated the purified recombinant β-glucosidase r-MlBgl with a molecular weight of approximately 71 kDa. The optimal pH and temperature of purified recombinant r-MlBgl were 7.0 and 40 °C, respectively. r-MlBgl could hydrolyze polydatin effectively. The kcat and kcat/Km values for polydatin were 989 s−1 and 1476 mM−1·s−1, respectively. These properties suggest that r-MlBgl has potential application in the enzymatic conversion of polydatin to resveratrol for further study.
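    The abstract above reports kcat and kcat/Km but not Km itself. Dividing the first by the second gives the Km these two figures imply; the short sketch below does that arithmetic. The resulting value is derived here, not reported by the authors, and the function name is hypothetical.

```python
def km_from_efficiency(kcat_per_s, efficiency_per_mm_s):
    """Return Km in mM from kcat (s^-1) and kcat/Km (mM^-1 s^-1)."""
    if efficiency_per_mm_s <= 0:
        raise ValueError("catalytic efficiency must be positive")
    return kcat_per_s / efficiency_per_mm_s


# Values for polydatin quoted in the abstract.
km = km_from_efficiency(989.0, 1476.0)
print(f"Implied Km for polydatin: {km:.2f} mM")  # prints 0.67 mM
```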

  3. Prediction of Wild-type Enzyme Characteristics

    DEFF Research Database (Denmark)

    Geertz-Hansen, Henrik Marcus

    of biotechnology, including enzyme discovery and characterization. This work presents two articles on sequence-based discovery and functional annotation of enzymes in environmental samples, and two articles on analysis and prediction of enzyme thermostability and cofactor requirements. The first article presents...... a sequence-based approach to discovery of proteolytic enzymes in metagenomes obtained from the Polar oceans. We show that microorganisms living in these extreme environments of constant low temperature harbour genes encoding novel proteolytic enzymes with potential industrial relevance. The second article...... presents a web server for the processing and annotation of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The third article presents analyses of the molecular determinants of enzyme thermostability, and a feature-based prediction method of the melting...

  4. Stromal Gene Expression is Predictive for Metastatic Primary Prostate Cancer.

    Science.gov (United States)

    Mo, Fan; Lin, Dong; Takhar, Mandeep; Ramnarine, Varune Rohan; Dong, Xin; Bell, Robert H; Volik, Stanislav V; Wang, Kendric; Xue, Hui; Wang, Yuwei; Haegert, Anne; Anderson, Shawn; Brahmbhatt, Sonal; Erho, Nicholas; Wang, Xinya; Gout, Peter W; Morris, James; Karnes, R Jeffrey; Den, Robert B; Klein, Eric A; Schaeffer, Edward M; Ross, Ashley; Ren, Shancheng; Sahinalp, S Cenk; Li, Yingrui; Xu, Xun; Wang, Jun; Wang, Jian; Gleave, Martin E; Davicioni, Elai; Sun, Yinghao; Wang, Yuzhuo; Collins, Colin C

    2018-04-01

    Clinical grading systems using clinical features alongside nomograms lack precision in guiding treatment decisions in prostate cancer (PCa). There is a critical need for identification of biomarkers that can more accurately stratify patients with primary PCa. To identify a robust prognostic signature to better distinguish indolent from aggressive prostate cancer (PCa). To develop the signature, whole-genome and whole-transcriptome sequencing was conducted on five PCa patient-derived xenograft (PDX) models collected from independent foci of a single primary tumor and exhibiting variable metastatic phenotypes. Multiple independent clinical cohorts including an intermediate-risk cohort were used to validate the biomarkers. The outcome measurement defining aggressive PCa was metastasis following radical prostatectomy. A generalized linear model with lasso regularization was used to build a 93-gene stroma-derived metastasis signature (SDMS). The SDMS association with metastasis was assessed using a Wilcoxon rank-sum test. Performance was evaluated using the area under the curve (AUC) for the receiver operating characteristic, and Kaplan-Meier curves. Univariable and multivariable regression models were used to compare the SDMS alongside clinicopathological variables and reported signatures. AUC was assessed to determine if SDMS is additive or synergistic to previously reported signatures. A close association between stromal gene expression and metastatic phenotype was observed. Accordingly, the SDMS was modeled and validated in multiple independent clinical cohorts. Patients with higher SDMS scores were found to have worse prognosis. Furthermore, SDMS was an independent prognostic factor, can stratify risk in intermediate-risk PCa, and can improve the performance of other previously reported signatures. Profiling of stromal gene expression led to development of an SDMS that was validated as independently prognostic for the metastatic potential of prostate tumors. Our

  5. Comparative analysis of metagenomes of Italian top soil improvers.

    Science.gov (United States)

    Gigliucci, Federica; Brambilla, Gianfranco; Tozzoli, Rosangela; Michelacci, Valeria; Morabito, Stefano

    2017-05-01

    Biosolids originating from Municipal Waste Water Treatment Plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon on agricultural lands. Their use to amend soil is controversial, as it may lead to the presence of emerging hazards of anthropogenic or animal origin in the environment devoted to food production. In this study, we used shotgun metagenomic sequencing to characterize the hazards related to TSIs. The samples showed the presence of many virulence genes associated with different diarrheagenic E. coli pathotypes as well as different antimicrobial resistance-associated genes. Genes conferring resistance to fluoroquinolones were the most relevant class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent, traits associated with resistance to methicillin in staphylococci and genes conferring resistance to streptothricin, fosfomycin and vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15-50% of the sequence reads in the different metagenomes out of the total number of those mapping to the class of resistance to compounds determinants. Moreover, the taxonomic analysis performed by comparing compost-based samples and biosolids derived from municipal sewage-sludge treatments divided the samples into separate populations based on microbiota composition. The results confirm that metagenomics is efficient at detecting genomic traits associated with pathogens and antimicrobial resistance in complex matrices, and that this approach can be efficiently used for the traceability of TSI samples using the microorganisms' profiles as indicators of their origin. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Exploring neighborhoods in the metagenome universe.

    Science.gov (United States)

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well-suitable for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.

  7. Exploring Neighborhoods in the Metagenome Universe

    Science.gov (United States)

    Aßhauer, Kathrin P.; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-01-01

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well-suitable for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis. PMID:25026170

  8. Multisubstrate isotope labeling and metagenomic analysis of active soil bacterial communities.

    Science.gov (United States)

    Verastegui, Y; Cheng, J; Engel, K; Kolczynski, D; Mortimer, S; Lavigne, J; Montalibet, J; Romantsov, T; Hall, M; McConkey, B J; Rose, D R; Tomashek, J J; Scott, B R; Charles, T C; Neufeld, J D

    2014-07-15

    Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native carbon (¹²C) or stable-isotope-labeled (¹³C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the ¹³C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. Importance: The ability to identify genes based on function, instead of sequence homology, allows the discovery of genes that would not be identified through sequence alone. 
This

  9. Expanding the marine virosphere using metagenomics.

    Directory of Open Access Journals (Sweden)

    Carolina Megumi Mizuno

    Full Text Available Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. The classical approaches to studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed description of complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages of which 10 are completely new. Sequence-based methods have allowed host assignment to many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes like members of SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics confirming the remarkable mosaicism of phage genomes.

  10. Metagenomic analysis of fungal taxa inhabiting Mecca region, Saudi Arabia

    Directory of Open Access Journals (Sweden)

    Tarek A.A. Moussa

    2016-09-01

    Full Text Available The data presented contain the sequences of the fungal Internal Transcribed Spacer (ITS) and 18S rRNA gene from a metagenome of the Mecca region, Saudi Arabia. Sequences were amplified using fungal-specific primers, which amplified the amplicon aligned between the 18S and 28S rRNA genes. A total of 460 fungal species belonging to 133 genera, 58 families, 33 orders, 13 classes and 4 phyla were identified in four contrasting locations. The raw sequencing data used to perform this analysis, along with the FASTQ file, are located in the NCBI Sequence Read Archive (SRA) under accession numbers: SRR3150823, SRR3144873, SRR3150825 and SRR3150846.

  11. Metagenomic data analysis : computational methods and applications

    NARCIS (Netherlands)

    Gori, F.

    2013-01-01

    Metagenomics is the study of the genomic content of microbial communities, acquired through DNA sequencing technology. The main advantage of metagenomics is that it can overcome the limitations of individual genome sequencing, that can work only on the few culturable microbes. Unfortunately, the

  12. Back to the Future of Soil Metagenomics.

    Czech Academy of Sciences Publication Activity Database

    Nesme, J.; Achouak, W.; Agathos, S.N.; Bailey, M.; Baldrian, Petr; Brunel, D.; Frostegård, Å.; Heulin, T.; Jansson, J.K.; Jurkevitch, E.; Kruus, K.L.; Kowalchuk, G.A.; Lagares, A.; Lapin-Scott, H.M.; Lemanceau, P.; Le Paslier, D.; Mandic-Mulec, I.; Murrell, J.C.; Myrold, D.D.; Nalin, R.; Nannipieri, P.; Neufeld, J.D.; O'Gara, F.; Parnell, J.J.; Pühler, A.; Pylro, V.; Ramos, J.L.; Roesch, L.F.; Schloter, M.; Schleper, C.; Sczyrba, A.; Sessitsch, A.; Sjöling, S.; Sørensen, J.; Sørensen, S.J.; Tebbe, C.C.; Topp, E.; Tsiamis, G.; van Elsas, J.D.; van Keulen, G.; Widmer, F.; Wagner, M.; Zhang, T.; Zhang, X.; Zhao, L.; Zhu, Y-G.; Vogel, T.M.; Simonet, P.

    2016-01-01

    Vol. 7, Feb 10 (2016), p. 73 ISSN 1664-302X Institutional support: RVO:61388971 Keywords: metagenomic; soil microbiology; terrestrial microbiology Subject RIV: EE - Microbiology, Virology Impact factor: 4.076, year: 2016

  13. Regulatory links between imprinted genes: evolutionary predictions and consequences.

    Science.gov (United States)

    Patten, Manus M; Cowley, Michael; Oakey, Rebecca J; Feil, Robert

    2016-02-10

    Genomic imprinting is essential for development and growth and plays diverse roles in physiology and behaviour. Imprinted genes have traditionally been studied in isolation or in clusters with respect to cis-acting modes of gene regulation, both from a mechanistic and evolutionary point of view. Recent studies in mammals, however, reveal that imprinted genes are often co-regulated and are part of a gene network involved in the control of cellular proliferation and differentiation. Moreover, a subset of imprinted genes acts in trans on the expression of other imprinted genes. Numerous studies have modulated levels of imprinted gene expression to explore phenotypic and gene regulatory consequences. Increasingly, the applied genome-wide approaches highlight how perturbation of one imprinted gene may affect other maternally or paternally expressed genes. Here, we discuss these novel findings and consider evolutionary theories that offer a rationale for such intricate interactions among imprinted genes. An evolutionary view of these trans-regulatory effects provides a novel interpretation of the logic of gene networks within species and has implications for the origin of reproductive isolation between species. © 2016 The Authors.

  14. Metagenomic Signatures of Microbial Communities in Deep-Sea Hydrothermal Sediments of Azores Vent Fields.

    Science.gov (United States)

    Cerqueira, Teresa; Barroso, Cristina; Froufe, Hugo; Egas, Conceição; Bettencourt, Raul

    2018-01-21

    The organisms inhabiting the deep-seafloor are known to play a crucial role in global biogeochemical cycles. Chemolithoautotrophic prokaryotes, which produce biomass from single carbon molecules, constitute the primary source of nutrition for the higher organisms, being critical for the sustainability of food webs and overall life in the deep-sea hydrothermal ecosystems. The present study investigates the metabolic profiles of chemolithoautotrophs inhabiting the sediments of Menez Gwen and Rainbow deep-sea vent fields, in the Mid-Atlantic Ridge. Differences in the microbial community structure might be reflecting the distinct depth, geology, and distance from vent of the studied sediments. A metagenomic sequencing approach was conducted to characterize the microbiome of the deep-sea hydrothermal sediments and the relevant metabolic pathways used by microbes. Both Menez Gwen and Rainbow metagenomes contained a significant number of genes involved in carbon fixation, revealing the largely autotrophic communities thriving in both sites. Carbon fixation at Menez Gwen site was predicted to occur mainly via the reductive tricarboxylic acid cycle, likely reflecting the dominance of sulfur-oxidizing Epsilonproteobacteria at this site, while different autotrophic pathways were identified at Rainbow site, in particular the Calvin-Benson-Bassham cycle. Chemolithotrophy appeared to be primarily driven by the oxidation of reduced sulfur compounds, whether through the SOX-dependent pathway at Menez Gwen site or through reverse sulfate reduction at Rainbow site. Other energy-yielding processes, such as methane, nitrite, or ammonia oxidation, were also detected but presumably contributing less to chemolithoautotrophy. This work furthers our knowledge of the microbial ecology of deep-sea hydrothermal sediments and represents an important repository of novel genes with potential biotechnological interest.

  15. Comparative metagenomics of anode-associated microbiomes developed in rice paddy-field microbial fuel cells.

    Directory of Open Access Journals (Sweden)

    Atsushi Kouzuma

    Full Text Available In sediment-type microbial fuel cells (sMFCs) operating in rice paddy fields, rice-root exudates are converted to electricity by anode-associated rhizosphere microbes. Previous studies have shown that members of the family Geobacteraceae are enriched on the anodes of rhizosphere sMFCs. To deepen our understanding of rhizosphere microbes involved in electricity generation in sMFCs, here, we conducted comparative analyses of anode-associated microbiomes in three MFC systems: a rice paddy-field sMFC, and acetate- and glucose-fed MFCs in which pieces of graphite felt that had functioned as anodes in rice paddy-field sMFC were used as rhizosphere microbe-bearing anodes. After electric outputs became stable, microbiomes associated with the anodes of these MFC systems were analyzed by pyrotag sequencing of 16S rRNA gene amplicons and Illumina shotgun metagenomics. Pyrotag sequencing showed that Geobacteraceae bacteria were associated with the anodes of all three systems, but the dominant Geobacter species in each MFC were different. Specifically, species closely related to G. metallireducens comprised 90% of the anode Geobacteraceae in the acetate-fed MFC, but were only relatively minor components of the rhizosphere sMFC and glucose-fed MFC, whereas species closely related to G. psychrophilus were abundantly detected. This trend was confirmed by the phylogenetic assignments of predicted genes in shotgun metagenome sequences of the anode microbiomes. Our findings suggest that G. psychrophilus and its related species preferentially grow on the anodes of rhizosphere sMFCs and generate electricity through syntrophic interactions with organisms that excrete electron donors.

  16. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Almeida, Mathieu; Juncker, Agnieszka

    2014-01-01

    , such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly...... of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify...... affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples....

  17. Marine Metagenomics: New Tools for the Study and Exploitation of Marine Microbial Metabolism

    Directory of Open Access Journals (Sweden)

    Alan D. W. Dobson

    2010-03-01

    Full Text Available The marine environment is extremely diverse, with huge variations in pressure and temperature. Nevertheless, life, especially microbial life, thrives throughout the marine biosphere and microbes have adapted to all the divergent environments present. Large scale DNA sequence based approaches have recently been used to investigate the marine environment and these studies have revealed that the oceans harbor unprecedented microbial diversity. Novel gene families with representatives only within such metagenomic datasets represent a large proportion of the ocean metagenome. The presence of so many new gene families from these uncultured and highly diverse microbial populations represents a challenge for the understanding of and exploitation of the biology and biochemistry of the ocean environment. The application of new metagenomic and single cell genomics tools offers new ways to explore the complete metabolic diversity of the marine biome.

  18. Metagenomics reveals pervasive bacterial populations and reduced community diversity across the Alaska tundra ecosystem

    Directory of Open Access Journals (Sweden)

    Eric Robert Johnston

    2016-04-01

    Full Text Available How soil microbial communities contrast with respect to taxonomic and functional composition within and between ecosystems remains an unresolved question that is central to predicting how global anthropogenic change will affect soil functioning and services. In particular, it remains unclear how small-scale observations of soil communities based on the typical volume sampled (1-2 grams) are generalizable to ecosystem-scale responses and processes. This is especially relevant for remote, northern latitude soils, which are challenging to sample and are also thought to be more vulnerable to climate change compared to temperate soils. Here, we employed well-replicated shotgun metagenome and 16S rRNA gene amplicon sequencing to characterize community composition and metabolic potential in Alaskan tundra soils, combining our own datasets with those publicly available from distant tundra and temperate grassland and agriculture habitats. We found that the abundance of many taxa and metabolic functions differed substantially between tundra soil metagenomes relative to those from temperate soils, and that a high degree of OTU-sharing exists between tundra locations. Tundra soils were an order of magnitude less complex than their temperate counterparts, allowing for near-complete coverage of microbial community richness (~92% breadth) by sequencing, and the recovery of twenty-seven high-quality, almost complete (>80% completeness) population bins. These population bins, collectively, made up to ~10% of the metagenomic datasets, and represented diverse taxonomic groups and metabolic lifestyles tuned toward sulfur cycling, hydrogen metabolism, methanotrophy, and organic matter oxidation. Several population bins, including members of Acidobacteria, Actinobacteria, and Proteobacteria, were also present in geographically distant (~100-530 km apart) tundra habitats (full genome representation and up to 99.6% genome-derived average nucleotide identity). 
Collectively

  19. A genome-wide gene function prediction resource for Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Han Yan

    2010-08-01

    Full Text Available Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

  20. Biochemical Characterization of a Family 15 Carbohydrate Esterase from a Bacterial Marine Arctic Metagenome.

    Directory of Open Access Journals (Sweden)

    Concetta De Santi

    Full Text Available The glucuronoyl esterase enzymes of wood-degrading fungi (Carbohydrate Esterase family 15; CE15) form part of the hemicellulolytic and cellulolytic enzyme systems that break down plant biomass, and have possible applications in biotechnology. Homologous enzymes are predicted in the genomes of several bacteria; however, these have been much less studied than their fungal counterparts. Here we describe the recombinant production and biochemical characterization of a bacterial CE15 enzyme denoted MZ0003, which was identified by in silico screening of a prokaryotic metagenome library derived from marine Arctic sediment. MZ0003 has high similarity to several uncharacterized gene products of polysaccharide-degrading bacterial species, and phylogenetic analysis indicates a deep evolutionary split between these CE15s and fungal homologs. MZ0003 appears to differ from previously studied CE15s in some aspects. Some glucuronoyl esterase activity could be measured by qualitative thin-layer chromatography, which confirms its assignment as a CE15; however, MZ0003 can also hydrolyze a range of other esters, including p-nitrophenyl acetate, which is not acted upon by some fungal homologs. The structure of MZ0003 also appears to differ, as it is predicted to have several large loop regions that are absent in previously studied CE15s, and a combination of homology-based modelling and site-directed mutagenesis indicates that its catalytic residues deviate from the conserved Ser-His-Glu triad of many fungal CE15s. Taken together, these results indicate that potentially unexplored diversity exists among bacterial CE15s, and this may be accessed by investigation of the microbial metagenome. The combination of low activity on typical glucuronoyl esterase substrates and the lack of glucuronic acid esters in the marine environment suggests that the physiological substrate of MZ0003 and its homologs is likely to be different from that of related fungal enzymes.

  1. Biological interpretation of genome-wide association studies using predicted gene functions

    NARCIS (Netherlands)

    Pers, Tune H.; Karjalainen, Juha M.; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R.; Yang, Jian; Lui, Julian C.; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K.; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S. N.; Hirschhorn, Joel N.; Franke, Lude; Chu, Audrey Y.; Estrada, Karol; Luan, Jian'an; Kutalik, Zoltán; Amin, Najaf; Buchkovich, Martin L.; Croteau-Chonka, Damien C.; Day, Felix R.; Duan, Yanan; Fall, Tove; Fehrmann, Rudolf; Ferreira, Teresa; Jackson, Anne U.; Karjalainen, Juha; Lo, Ken Sin; Locke, Adam E.; Mägi, Reedik; Mihailov, Evelin; Porcu, Eleonora; Randall, Joshua C.; Scherag, André; Vinkhuyzen, Anna A. E.; Winkler, Thomas W.; Workalemahu, Tsegaselassie; Zhao, Jing Hua; Absher, Devin; Albrecht, Eva; Anderson, Denise; Baron, Jeffrey; Beekman, Marian; Demirkan, Ayse; Ehret, Georg B.; Feenstra, Bjarke; Feitosa, Mary F.; Fischer, Krista; Fraser, Ross M.; Goel, Anuj; Gong, Jian; Justice, E.; Kanoni, Stavroula; Kleber, Marcus E.; Kristiansson, Kati; Lim, Unhee; Lotay, Vaneet; Mangino, Massimo; Mateo Leach, Irene; Medina-Gomez, Carolina; Nalls, Michael A.; Nyholt, Dale R.; Palmer, Cameron D.; Pasko, Dorota; Pechlivanis, Sonali; Prokopenko, Inga; Ried, Janina S.; Ripke, Stephan; Shungin, Dmitry; Stancáková, Alena; Strawbridge, Rona J.; Sung, Yun Ju; Tanaka, Toshiko; Teumer, Alexander; Trompet, Stella; van der Laan, Sander W.; van Setten, Jessica; van Vliet-Ostaptchouk, Jana V.; Wang, Zhaoming; Yengo, Loïc; Zhang, Weihua; Afzal, Uzma; Ärnlöv, Johan; Arscott, Gillian M.; Bandinelli, Stefania; Barrett, Amy; Bellis, Claire; Bennett, Amanda J.; Berne, Christian; Blüher, Matthias; Bolton, Jennifer L.; Böttcher, Yvonne; Boyd, Heather A.; Bruinenberg, Marcel; Buckley, Brendan M.; Buyske, Steven; Caspersen, Ida H.; Chines, Peter S.; Clarke, Robert; Claudi-Boehm, Simone; Cooper, Matthew; Daw, E. 
Warwick; de Jong, A.; Deelen, Joris; Delgado, Graciela; Denny, Josh C.; Dhonukshe-Rutten, Rosalie; Dimitriou, Maria; Doney, Alex S. F.; Dörr, Marcus; Eklund, Niina; Eury, Elodie; Folkersen, Lasse; Garcia, Melissa E.; Geller, Frank; Giedraitis, Vilmantas; Go, Alan S.; Grallert, Harald; Grammer, Tanja B.; Gräßler, Jürgen; Grönberg, Henrik; de Groot, Lisette C. P. G. M.; Groves, Christopher J.; Haessler, Jeffrey; Haller, Toomas; Hallmans, Goran; Hannemann, Anke; Hartman, Catharina A.; Hassinen, Maija; Hayward, Caroline; Heard-Costa, Nancy L.; Helmer, Quinta; Hemani, Gibran; Henders, Anjali K.; Hillege, Hans L.; Hlatky, Mark A.; Hoffmann, Wolfgang; Hoffmann, Per; Holmen, Oddgeir; Houwing-Duistermaat, Jeanine J.; Illig, Thomas; Isaacs, Aaron; James, Alan L.; Jeff, Janina; Johansen, Berit; Johansson, Åsa; Jolley, Jennifer; Juliusdottir, Thorhildur; Junttila, Juhani; Kho, Abel N.; Kinnunen, Leena; Klopp, Norman; Kocher, Thomas; Kratzer, Wolfgang; Lichtner, Peter; Lind, Lars; Lindström, Jaana; Lobbens, Stéphane; Lorentzon, Mattias; Lu, Yingchang; Lyssenko, Valeriya; Magnusson, Patrik K. 
E.; Mahajan, Anubha; Maillard, Marc; McArdle, Wendy L.; McKenzie, Colin A.; McLachlan, Stela; McLaren, Paul J.; Menni, Cristina; Merger, Sigrun; Milani, Lili; Moayyeri, Alireza; Monda, Keri L.; Morken, Mario A.; Müller, Gabriele; Müller-Nurasyid, Martina; Musk, Arthur W.; Narisu, Narisu; Nauck, Matthias; Nolte, Ilja M.; Nöthen, Markus M.; Oozageer, Laticia; Pilz, Stefan; Rayner, Nigel W.; Renstrom, Frida; Robertson, Neil R.; Rose, Lynda M.; Roussel, Ronan; Sanna, Serena; Scharnagl, Hubert; Scholtens, Salome; Schumacher, Fredrick R.; Schunkert, Heribert; Scott, Robert A.; Sehmi, Joban; Seufferlein, Thomas; Shi, Jianxin; Silventoinen, Karri; Smit, Johannes H.; Smith, Albert Vernon; Smolonska, Joanna; Stanton, Alice V.; Stirrups, Kathleen; Stott, David J.; Stringham, Heather M.; Sundström, Johan; Swertz, Morris A.; Syvänen, Ann-Christine; Tayo, Bamidele O.; Thorleifsson, Gudmar; Tyrer, Jonathan P.; van Dijk, Suzanne; van Schoor, Natasja M.; van der Velde, Nathalie; van Heemst, Diana; van Oort, Floor V. A.; Vermeulen, Sita H.; Verweij, Niek; Vonk, Judith M.; Waite, Lindsay L.; Waldenberger, Melanie; Wennauer, Roman; Wilkens, Lynne R.; Willenborg, Christina; Wilsgaard, Tom; Wojczynski, Mary K.; Wong, Andrew; Wright, Alan F.; Zhang, Qunyuan; Arveiler, Dominique; Bakker, Stephan J. L.; Beilby, John; Bergman, Richard N.; Bergmann, Sven; Biffar, Reiner; Blangero, John; Boomsma, I.; Bornstein, Stefan R.; Bovet, Pascal; Brambilla, Paolo; Brown, Morris J.; Campbell, Harry; Caulfield, Mark J.; Chakravarti, Aravinda; Collins, Rory; Collins, Francis S.; Crawford, Dana C.; Cupples, L. 
Adrienne; Danesh, John; de Faire, Ulf; den Ruijter, Hester M.; Erbel, Raimund; Erdmann, Jeanette; Eriksson, Johan G.; Farrall, Martin; Ferrannini, Ele; Ferrières, Jean; Ford, Ian; Forouhi, Nita G.; Forrester, Terrence; Gansevoort, Ron T.; Gejman, Pablo V.; Gieger, Christian; Golay, Alain; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Haas, David W.; Hall, Alistair S.; Harris, Tamara B.; Hattersley, Andrew T.; Heath, Andrew C.; Hengstenberg, Christian; Hicks, Andrew A.; Hindorff, Lucia A.; Hingorani, Aroon D.; Hofman, Albert; Hovingh, G. Kees; Humphries, Steve E.; Hunt, Steven C.; Hypponen, Elina; Jacobs, Kevin B.; Jarvelin, Marjo-Riitta; Jousilahti, Pekka; Jula, Antti M.; Kaprio, Jaakko; Kastelein, John J. P.; Kayser, Manfred; Kee, Frank; Keinanen-Kiukaanniemi, Sirkka M.; Kiemeney, Lambertus A.; Kooner, Jaspal S.; Kooperberg, Charles; Koskinen, Seppo; Kovacs, Peter; Kraja, Aldi T.; Kumari, Meena; Kuusisto, Johanna; Lakka, Timo A.; Langenberg, Claudia; Le Marchand, Loic; Lehtimäki, Terho; Lupoli, Sara; Madden, Pamela A. F.; Männistö, Satu; Manunta, Paolo; Marette, André; Matise, Tara C.; McKnight, Barbara; Meitinger, Thomas; Moll, Frans L.; Montgomery, Grant W.; Morris, Andrew D.; Morris, Andrew P.; Murray, Jeffrey C.; Nelis, Mari; Ohlsson, Claes; Oldehinkel, Albertine J.; Ong, Ken K.; Ouwehand, Willem H.; Pasterkamp, Gerard; Peters, Annette; Pramstaller, Peter P.; Price, Jackie F.; Qi, Lu; Raitakari, Olli T.; Rankinen, Tuomo; Rao, D. C.; Rice, Treva K.; Ritchie, Marylyn; Rudan, Igor; Salomaa, Veikko; Samani, Nilesh J.; Saramies, Jouko; Sarzynski, Mark A.; Schwarz, Peter E. 
H.; Sebert, Sylvain; Sever, Peter; Shuldiner, Alan R.; Sinisalo, Juha; Steinthorsdottir, Valgerdur; Stolk, Ronald P.; Tardif, Jean-Claude; Tönjes, Anke; Tremblay, Angelo; Tremoli, Elena; Virtamo, Jarmo; Vohl, Marie-Claude; Amouyel, Philippe; Asselbergs, Folkert W.; Assimes, Themistocles L.; Bochud, Murielle; Boehm, Bernhard O.; Boerwinkle, Eric; Bottinger, Erwin P.; Bouchard, Claude; Cauchi, Stéphane; Chambers, John C.; Chanock, Stephen J.; Cooper, Richard S.; de Bakker, Paul I. W.; Dedoussis, George; Ferrucci, Luigi; Franks, Paul W.; Froguel, Philippe; Groop, Leif C.; Haiman, Christopher A.; Hamsten, Anders; Hayes, M. Geoffrey; Hui, Jennie; Hunter, David J.; Hveem, Kristian; Jukema, J. Wouter; Kaplan, Robert C.; Kivimaki, Mika; Kuh, Diana; Laakso, Markku; Liu, Yongmei; Martin, Nicholas G.; März, Winfried; Melbye, Mads; Moebus, Susanne; Munroe, Patricia B.; Njølstad, Inger; Oostra, Ben A.; Palmer, Colin N. A.; Pedersen, Nancy L.; Perola, Markus; Pérusse, Louis; Peters, Ulrike; Powell, Joseph E.; Power, Chris; Quertermous, Thomas; Rauramaa, Rainer; Reinmaa, Eva; Ridker, Paul M.; Rivadeneira, Fernando; Rotter, Jerome I.; Saaristo, Timo E.; Saleheen, Danish; Schlessinger, David; Slagboom, P. Eline; Snieder, Harold; Spector, Tim D.; Strauch, Konstantin; Stumvoll, Michael; Tuomilehto, Jaakko; Uusitupa, Matti; van der Harst, Pim; Völzke, Henry; Walker, Mark; Wareham, Nicholas J.; Watkins, Hugh; Wichmann, H.-Erich; Wilson, James F.; Zanen, Pieter; Deloukas, Panos; Heid, Iris M.; Lindgren, Cecilia M.; Mohlke, Karen L.; Thorsteinsdottir, Unnur; Barroso, Inês; Fox, Caroline S.; North, Kari E.; Strachan, David P.; Beckmann, Jacques S.; Berndt, Sonja I.; Borecki, Ingrid B.; McCarthy, Mark I.; Metspalu, Andres; Stefansson, Kari; Uitterlinden, André G.; van Duijn, Cornelia M.; Willer, Cristen J.; Price, Alkes L.; Lettre, Guillaume; Loos, Ruth J. 
F.; Weedon, Michael N.; Ingelsson, Erik; O'Connell, Jeffrey R.; Abecasis, Goncalo R.; Chasman, Daniel I.; Goddard, Michael E.; Visscher, Peter M.; Frayling, Timothy M.

    2015-01-01

    The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated

  2. Combining biological gene expression signatures in predicting outcome in breast cancer: An alternative to supervised classification

    NARCIS (Netherlands)

    Nuyten, Dimitry S. A.; Hastie, Trevor; Chi, Jen-Tsan Ashley; Chang, Howard Y.; van de Vijver, Marc J.

    2008-01-01

    INTRODUCTION: Gene expression profiling has been extensively used to predict outcome in breast cancer patients. We have previously reported on biological hypothesis-driven analysis of gene expression profiling data and we wished to extend this approach through the combinations of various gene

  3. Metagenomic and metabolic profiling of nonlithifying and lithifying stromatolitic mats of Highborne Cay, The Bahamas.

    Directory of Open Access Journals (Sweden)

    Christina L M Khodadad

    Full Text Available BACKGROUND: Stromatolites are laminated carbonate build-ups formed by the metabolic activity of microbial mats and represent one of the oldest known ecosystems on Earth. In this study, we examined a living stromatolite located within the Exuma Sound, The Bahamas and profiled the metagenome and metabolic potential underlying these complex microbial communities. METHODOLOGY/PRINCIPAL FINDINGS: The metagenomes of the two dominant stromatolitic mat types, a nonlithifying (Type 1) and lithifying (Type 3) microbial mat, were partially sequenced and compared. This deep-sequencing approach was complemented by profiling the substrate utilization patterns of the mats using metabolic microarrays. Taxonomic assessment of the protein-encoding genes confirmed previous SSU rRNA analyses that bacteria dominate the metagenome of both mat types. Eukaryotes comprised less than 13% of the metagenomes and were rich in sequences associated with nematodes and heterotrophic protists. Comparative genomic analyses of the functional genes revealed extensive similarities in most of the subsystems between the nonlithifying and lithifying mat types. The one exception was an increase in the relative abundance of certain genes associated with carbohydrate metabolism in the lithifying Type 3 mats. Specifically, genes associated with the degradation of carbohydrates commonly found in exopolymeric substances, such as hexoses, deoxy- and acidic sugars were found. The genetic differences in carbohydrate metabolisms between the two mat types were confirmed using metabolic microarrays. Lithifying mats had a significant increase in diversity and utilization of carbon, nitrogen, phosphorus and sulfur substrates. CONCLUSION/SIGNIFICANCE: The two stromatolitic mat types retained similar microbial communities, functional diversity and many genetic components within their metagenomes. However, there were major differences detected in the activity and genetic pathways of organic carbon

  4. Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature.

    Directory of Open Access Journals (Sweden)

    Cynthia Stretch

    Full Text Available Top differentially expressed gene lists are often inconsistent between studies and it has been suggested that small sample sizes contribute to lack of reproducibility and poor prediction accuracy in discriminative models. We considered sex differences (69 ♂, 65 ♀) in 134 human skeletal muscle biopsies using DNA microarray. The full dataset and subsamples thereof (n = 10 (5 ♂, 5 ♀) to n = 120 (60 ♂, 60 ♀)) were used to assess the effect of sample size on the differential expression of single genes, gene rank order and prediction accuracy. Using our full dataset (n = 134), we identified 717 differentially expressed transcripts (p<0.0001) and we were able to predict sex with ~90% accuracy, both within our dataset and on external datasets. Both p-values and rank order of top differentially expressed genes became more variable using smaller subsamples. For example, at n = 10 (5 ♂, 5 ♀), no gene was considered differentially expressed at p<0.0001 and prediction accuracy was ~50% (no better than chance). We found that sample size clearly affects microarray analysis results; small sample sizes result in unstable gene lists and poor prediction accuracy. We anticipate this will apply to other phenotypes, in addition to sex.
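
    The subsampling idea in this abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the expression values are synthetic, the ranking uses a plain Welch t statistic, and every function name (`balanced_subsample`, `top_genes`, `jaccard`, `demo`) is hypothetical. It only mirrors the 69/65 group sizes quoted above and measures how much a top-gene list from a balanced subsample overlaps with the list from the full dataset.

```python
import random
import statistics


def balanced_subsample(labels, n_per_group, rng):
    """Return sorted sample indices, with n_per_group drawn from each label."""
    chosen = []
    for group in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == group]
        chosen.extend(rng.sample(members, n_per_group))
    return sorted(chosen)


def welch_t(a, b):
    """Welch's t statistic for two lists of values (0.0 if there is no variance)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(expr, labels, idx, k):
    """Rank genes by |t| within the chosen samples and return the top-k names."""
    first, second = sorted(set(labels))[:2]
    scores = {}
    for gene, values in expr.items():
        a = [values[i] for i in idx if labels[i] == first]
        b = [values[i] for i in idx if labels[i] == second]
        scores[gene] = abs(welch_t(a, b))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def jaccard(x, y):
    """Overlap between two gene sets: |intersection| / |union|."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def demo(seed=0, n_genes=50, n_true=5, reps=20):
    """Synthetic data only: n_true genes are shifted by 1 SD in one group."""
    rng = random.Random(seed)
    labels = ["M"] * 69 + ["F"] * 65  # group sizes taken from the abstract
    expr = {}
    for g in range(n_genes):
        shift = 1.0 if g < n_true else 0.0
        expr[f"gene{g}"] = [
            rng.gauss(shift if lab == "M" else 0.0, 1.0) for lab in labels
        ]
    full = top_genes(expr, labels, range(len(labels)), n_true)
    result = {}
    for n in (5, 20, 60):  # samples per group
        overlaps = [
            jaccard(full, top_genes(expr, labels,
                                    balanced_subsample(labels, n, rng), n_true))
            for _ in range(reps)
        ]
        result[n] = sum(overlaps) / reps
    return result
```

    Calling `demo()` returns the mean overlap for each subsample size; under these assumptions the overlap would be expected to rise with sample size, but the exact values depend on the seed and the simulated effect size.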

  5. Predicting disease-related genes using integrated biomedical networks

    OpenAIRE

    Peng, Jiajie; Bai, Kun; Shang, Xuequn; Wang, Guohua; Xue, Hansheng; Jin, Shuilin; Cheng, Liang; Wang, Yadong; Chen, Jin

    2017-01-01

    Background Identifying the genes associated to human diseases is crucial for disease diagnosis and drug design. Computational approaches, esp. the network-based approaches, have been recently developed to identify disease-related genes effectively from the existing biomedical networks. Meanwhile, the advance in biotechnology enables researchers to produce multi-omics data, enriching our understanding on human diseases, and revealing the complex relationships between genes and diseases. Howeve...

  6. Long-read transcriptome data for improved gene prediction in Lentinula edodes

    Directory of Open Access Journals (Sweden)

    Sin-Gi Park

    2017-12-01

    Full Text Available Lentinula edodes is one of the most popular edible mushrooms in the world and contains useful medicinal components such as lentinan. The whole-genome sequence of L. edodes has been determined with the objective of discovering candidate genes associated with agronomic traits, but experimental verification of gene models with correction of gene prediction errors is lacking. To improve the accuracy of gene prediction, we produced 12.6 Gb of long-read transcriptome data of variable lengths using PacBio single-molecule real-time (SMRT) sequencing and generated 36,946 transcript clusters with an average length of 2.2 kb. Evidence-driven gene prediction on the basis of long- and short-read RNA sequencing data was performed; a total of 16,610 protein-coding genes were predicted with error correction. Of the predicted genes, 42.2% were verified to be covered by full-length transcript clusters. The raw reads have been deposited in the NCBI SRA database under accession number PRJNA396788. Keywords: Gene model, Gene prediction, Lentinula edodes, PacBio Single-molecule real-time (SMRT) transcriptome sequencing

  7. Identifying the Gene Signatures from Gene-Pathway Bipartite Network Guarantees the Robust Model Performance on Predicting the Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Li He

    2014-01-01

    Full Text Available For the purpose of improving the prediction of cancer prognosis in clinical research, various algorithms have been developed to construct predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs) generated by the statistical methods or the machine learning algorithms often involves a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and this subsequently impacts the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate reliable lists of cancer-related DEGs and constructed the models by using support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes.

  8. Bioprospecting metagenomes: glycosyl hydrolases for converting biomass

    Directory of Open Access Journals (Sweden)

    Monchy Sebastien

    2009-05-01

    Full Text Available Abstract Throughout immeasurable time, microorganisms evolved and accumulated remarkable physiological and functional heterogeneity, and now constitute the major reserve for genetic diversity on earth. Using metagenomics, namely genetic material recovered directly from environmental samples, this biogenetic diversification can be accessed without the need to cultivate cells. Accordingly, microbial communities and their metagenomes, isolated from biotopes with high turnover rates of recalcitrant biomass, such as lignocellulosic plant cell walls, have become a major resource for bioprospecting; furthermore, this material is a major asset in the search for new biocatalytics (enzymes) for various industrial processes, including the production of biofuels from plant feedstocks. However, despite the contributions from metagenomics technologies consequent upon the discovery of novel enzymes, this relatively new enterprise requires major improvements. In this review, we compare function-based metagenome screening and sequence-based metagenome data mining, discussing the advantages and limitations of both methods. We also describe the unusual enzymes discovered via metagenomics approaches, and discuss the future prospects for metagenome technologies.

  9. Human milk metagenome: a functional capacity analysis

    Science.gov (United States)

    2013-01-01

    Background Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. Results The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < …). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions Our results further expand the complexity of the human milk metagenome and enforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the functionality of the human

  10. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self- contradictory. For example, a protein may be predicted to

  11. A metagenomic study of methanotrophic microorganisms in Coal Oil Point seep sediments

    Directory of Open Access Journals (Sweden)

    Haverkamp Thomas HA

    2011-10-01

    Full Text Available Abstract Background Methane oxidizing prokaryotes in marine sediments are believed to function as a methane filter reducing the oceanic contribution to the global methane emission. In the anoxic parts of the sediments, oxidation of methane is accomplished by anaerobic methanotrophic archaea (ANME) living in syntrophy with sulphate reducing bacteria. This anaerobic oxidation of methane is assumed to be a coupling of reversed methanogenesis and dissimilatory sulphate reduction. Where oxygen is available aerobic methanotrophs take part in methane oxidation. In this study, we used metagenomics to characterize the taxonomic and metabolic potential for methane oxidation at the Tonya seep in the Coal Oil Point area, California. Two metagenomes from different sediment depth horizons (0-4 cm and 10-15 cm below sea floor) were sequenced by 454 technology. The metagenomes were analysed to characterize the distribution of aerobic and anaerobic methanotrophic taxa at the two sediment depths. To gain insight into the metabolic potential the metagenomes were searched for marker genes associated with methane oxidation. Results Blast searches followed by taxonomic binning in MEGAN revealed aerobic methanotrophs of the genus Methylococcus to be overrepresented in the 0-4 cm metagenome compared to the 10-15 cm metagenome. In the 10-15 cm metagenome, ANME of the ANME-1 clade were identified as the most abundant methanotrophic taxon with 8.6% of the reads. Searches for particulate methane monooxygenase (pmoA) and methyl-coenzyme M reductase (mcrA), marker genes for aerobic and anaerobic oxidation of methane respectively, identified pmoA in the 0-4 cm metagenome as Methylococcaceae related. The mcrA reads from the 10-15 cm horizon were all classified as originating from the ANME-1 clade.
Conclusions Most of the taxa detected were present in both metagenomes and differences in community structure and corresponding metabolic potential between the two samples were mainly

  12. EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.

    Science.gov (United States)

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.

  13. A computational screen for type I polyketide synthases in metagenomics shotgun data.

    Directory of Open Access Journals (Sweden)

    Konrad U Foerstner

    Full Text Available BACKGROUND: Polyketides are a diverse group of biotechnologically important secondary metabolites that are produced by multi domain enzymes called polyketide synthases (PKS). METHODOLOGY/PRINCIPAL FINDINGS: We have estimated frequencies of type I PKS (PKS I) - a PKS subgroup - in natural environments by using Hidden-Markov-Models of eight domains to screen predicted proteins from six metagenomic shotgun data sets. As the complex PKS I have similarities to other multi-domain enzymes (like those for the fatty acid biosynthesis) we increased the reliability and resolution of the dataset by maximum-likelihood trees. The combined information of these trees was then used to discriminate true PKS I domains from evolutionary related but functionally different ones. We were able to identify numerous novel PKS I proteins, the highest density of which was found in Minnesota farm soil with 136 proteins out of 183,536 predicted genes. We also applied the protocol to UniRef database to improve the annotation of proteins with so far unknown function and identified some new instances of horizontal gene transfer. CONCLUSIONS/SIGNIFICANCE: The screening approach proved powerful in identifying PKS I sequences in large sequence data sets and is applicable to many other protein families.

  14. Metagenomic analysis of bacterial community structure and diversity of lignocellulolytic bacteria in Vietnamese native goat rumen

    NARCIS (Netherlands)

    Do, Huyen Thi; Dao, Khoa Trong; Nguyen, Viet Khanh Hoang; Le Ngoc, Giang; Nguyen, Phuong Thi Mai; Le, Lam Tung; Phung, Nguyet Thu; M. van Straalen, Nico; Roelofs, Dick; Truong, Hai Nam

    2017-01-01

    Objective: In a previous study, analysis of Illumina sequenced metagenomic DNA data of bacteria in Vietnamese goats' rumen showed a high diversity of putative lignocellulolytic genes. In this study, taxonomy speculation of microbial community and lignocellulolytic bacteria population in the rumen

  15. Improved cultivation and metagenomics as new tools for bioprospecting in cold environments

    DEFF Research Database (Denmark)

    Vester, Jan Kjølhede; Glaring, Mikkel Andreas; Stougaard, Peter

    2015-01-01

    be limited as few hosts are available for expression of genes with extremophilic properties. This review summarizes the methods developed for improved cultivation as well as the metagenomic approaches for bioprospecting with focus on the challenges faced by bioprospecting in cold environments....

  16. Metagenomic evaluation of bacterial and archaeal diversity in the geothermal hot springs of Manikaran, India.

    Science.gov (United States)

    Bhatia, Sonu; Batra, Navneet; Pathak, Ashish; Green, Stefan J; Joshi, Amit; Chauhan, Ashvini

    2015-02-19

    Bacterial and archaeal diversity in geothermal spring water were investigated using 16S rRNA gene amplicon metagenomic sequencing. This revealed the dominance of Firmicutes, Aquificae, and the Deinococcus-Thermus group in this thermophilic environment. A number of sequences remained taxonomically unresolved, indicating the presence of potentially novel microbes in this unique habitat. Copyright © 2015 Bhatia et al.

  17. An enrichment of CRISPR and other defense-related features in marine sponge-associated microbial metagenomes

    Directory of Open Access Journals (Sweden)

    Hannes Horn

    2016-11-01

    Full Text Available Many marine sponges are populated by dense and taxonomically diverse microbial consortia. We employed a metagenomics approach to unravel the differences in the functional gene repertoire among three Mediterranean sponge species, Petrosia ficiformis, Sarcotragus foetidus, Aplysina aerophoba and seawater. Different signatures were observed between sponge and seawater metagenomes with regard to microbial community composition, GC content, and estimated bacterial genome size. Our analysis showed further a pronounced repertoire for defense systems in sponge metagenomes. Specifically, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), restriction modification, DNA phosphorothioation and phage growth limitation systems were enriched in sponge metagenomes. These data suggest that defense is an important functional trait for an existence within sponges that requires mechanisms to defend against foreign DNA from microorganisms and viruses. This study contributes to an understanding of the evolutionary arms race between viruses/phages and bacterial genomes and it sheds light on the bacterial defenses that have evolved in the context of the sponge holobiont.
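
    GC content, one of the signatures compared in this abstract, is a simple per-sequence statistic. The following is a minimal sketch of how it could be computed for a set of reads; it is not taken from the study, the function names `gc_content` and `mean_gc` are hypothetical, and ignoring ambiguous bases such as N is an assumption made here for illustration.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; an empty or fully
    ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def mean_gc(reads):
    """Unweighted mean GC fraction over an iterable of read sequences."""
    values = [gc_content(read) for read in reads]
    return sum(values) / len(values) if values else 0.0
```

    Comparing `mean_gc` between two sets of reads (for example, hypothetical sponge and seawater samples) gives a rough view of the kind of difference the abstract describes; a length-weighted mean would be a reasonable alternative when read lengths vary widely.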

  18. Tuning the performance of a natural treatment process using metagenomics for improved trace organic chemical attenuation

    KAUST Repository

    Drewes, Jorg

    2014-02-01

    By utilizing high-throughput sequencing and metagenomics, this study revealed how the microbial community characteristics including composition, diversity, as well as functional genes in managed aquifer recharge (MAR) systems can be tuned to enhance removal of trace organic chemicals of emerging concern (CECs). Increasing the humic content of the primary substrate resulted in higher microbial diversity. Lower concentrations and a higher humic content of the primary substrate promoted the attenuation of biodegradable CECs in laboratory and field MAR systems. Metagenomic results indicated that the metabolic capabilities of xenobiotic biodegradation were significantly promoted for the microbiome under carbon-starving conditions. © IWA Publishing 2014.

  19. Predictive value of MSH2 gene expression in colorectal cancer treated with capecitabine

    DEFF Research Database (Denmark)

    Jensen, Lars H; Danenberg, Kathleen D; Danenberg, Peter V

    2007-01-01

    was associated with a hazard ratio of 0.5 (95% confidence interval, 0.23-1.11; P = 0.083) in survival analysis. CONCLUSION: The higher gene expression of MSH2 in responders and the trend for predicting overall survival indicates a predictive value of this marker in the treatment of advanced CRC with capecitabine.......PURPOSE: The objective of the present study was to evaluate the gene expression of the DNA mismatch repair gene MSH2 as a predictive marker in advanced colorectal cancer (CRC) treated with first-line capecitabine. PATIENTS AND METHODS: Microdissection of paraffin-embedded tumor tissue, RNA...

  20. Metagenomics and development of the gut microbiota in infants

    DEFF Research Database (Denmark)

    Vallès, Y.; Gosalbes, M. J.; de Vries, Lisbeth Elvira

    2012-01-01

    Clin Microbiol Infect 2012; 18 (Suppl. 4): 21–26 The establishment of a balanced intestinal microbiota is essential for numerous aspects of human health, yet the microbial colonization of the gastrointestinal tract of infants is both complex and highly variable among individuals. In addition......, the gastrointestinal tract microbiota is often exposed to antibiotics, and may be an important reservoir of resistant strains and of transferable resistance genes from early infancy. We are investigating by means of diverse metagenomic approaches several areas of microbiota development in infants, including...

  1. The oral metagenome in health and disease.

    Science.gov (United States)

    Belda-Ferre, Pedro; Alcaraz, Luis David; Cabrera-Rubio, Raúl; Romero, Héctor; Simón-Soro, Aurea; Pignatelli, Miguel; Mira, Alex

    2012-01-01

    The oral cavity of humans is inhabited by hundreds of bacterial species and some of them have a key role in the development of oral diseases, mainly dental caries and periodontitis. We describe for the first time the metagenome of the human oral cavity under health and diseased conditions, with a focus on supragingival dental plaque and cavities. Direct pyrosequencing of eight samples with different oral-health status produced 1 Gbp of sequence without the biases imposed by PCR or cloning. These data show that cavities are not dominated by Streptococcus mutans (the species originally identified as the etiological agent of dental caries) but are in fact a complex community formed by tens of bacterial species, in agreement with the view that caries is a polymicrobial disease. The analysis of the reads indicated that the oral cavity is functionally a different environment from the gut, with many functional categories enriched in one of the two environments and depleted in the other. Individuals who had never suffered from dental caries showed an over-representation of several functional categories, like genes for antimicrobial peptides and quorum sensing. In addition, they did not have mutans streptococci but displayed high recruitment of other species. Several isolates belonging to these dominant bacteria in healthy individuals were cultured and shown to inhibit the growth of cariogenic bacteria, suggesting the use of these commensal bacterial strains as probiotics to promote oral health and prevent dental caries.

  2. Metagenomic and proteomic analyses to elucidate the mechanism of anaerobic benzene degradation

    Energy Technology Data Exchange (ETDEWEB)

    Abu Laban, Nidal [Helmholtz (Germany)]

    2011-07-01

    This paper presents the mechanism of anaerobic benzene degradation using metagenomic and proteomic analyses. The objective of the study is to find out the microbes and biochemistry involved in benzene degradation. Hypotheses are proposed for the initial activation mechanism of benzene under anaerobic conditions. Two methods for degradation, molecular characterization and identification of benzene-degrading enzymes, are described. The physiological and molecular characteristics of iron-reducing enrichment culture are given and the process is detailed. Metagenome analysis of iron-reducing culture is presented using a pie chart. From the metagenome analysis of benzene-degrading culture, putative mobile element genes were identified in the aromatic-degrading configurations. Metaproteomic analysis of iron-reducing cultures and the anaerobic benzene degradation pathway are also elucidated. From the study, it can be concluded that gram-positive bacteria are involved in benzene degradation under iron-reducing conditions and that the catalysis mechanism of putative anaerobic benzene carboxylase needs further investigation.

  3. Marine metagenomics: strategies for the discovery of novel enzymes with biotechnological applications from marine environments

    Directory of Open Access Journals (Sweden)

    Dobson Alan DW

    2008-08-01

    Full Text Available Abstract Metagenomic based strategies have previously been successfully employed as powerful tools to isolate and identify enzymes with novel biocatalytic activities from the unculturable component of microbial communities from various terrestrial environmental niches. Both sequence based and function based screening approaches have been employed to identify genes encoding novel biocatalytic activities and metabolic pathways from metagenomic libraries. While much of the focus to date has centred on terrestrial based microbial ecosystems, it is clear that the marine environment has enormous microbial biodiversity that remains largely unstudied. Marine microbes are both extremely abundant and diverse; the environments they occupy likewise consist of very diverse niches. As culture-dependent methods have thus far resulted in the isolation of only a tiny percentage of the marine microbiota the application of metagenomic strategies holds great potential to study and exploit the enormous microbial biodiversity which is present within these marine environments.

  4. Novel polyhydroxyalkanoate copolymers produced in Pseudomonas putida by metagenomic polyhydroxyalkanoate synthases.

    Science.gov (United States)

    Cheng, Jiujun; Charles, Trevor C

    2016-09-01

    Bacterially produced biodegradable polyhydroxyalkanoates (PHAs) with versatile properties can be achieved using different PHA synthases (PhaCs). This work aims to expand the diversity of known PhaCs via functional metagenomics and demonstrates the use of these novel enzymes in PHA production. Complementation of a PHA synthesis-deficient Pseudomonas putida strain with a soil metagenomic cosmid library retrieved 27 clones expressing either class I, class II, or unclassified PHA synthases, and many did not have close sequence matches to known PhaCs. The composition of PHA produced by these clones was dependent on both the supplied growth substrates and the nature of the PHA synthase, with various combinations of short-chain-length (SCL) and medium-chain-length (MCL) PHA. These data demonstrate the ability to isolate diverse genes for PHA synthesis by functional metagenomics and their use for the production of a variety of PHA polymer and copolymer mixtures.

  5. Novel metagenome-derived carboxylesterase that hydrolyzes β-lactam antibiotics.

    Science.gov (United States)

    Jeon, Jeong Ho; Kim, Soo-Jin; Lee, Hyun Sook; Cha, Sun-Shin; Lee, Jung Hun; Yoon, Sang-Hong; Koo, Bon-Sung; Lee, Chang-Muk; Choi, Sang Ho; Lee, Sang Hee; Kang, Sung Gyun; Lee, Jung-Hyun

    2011-11-01

    It has been proposed that family VIII carboxylesterases and class C β-lactamases are phylogenetically related; however, none of carboxylesterases has been reported to hydrolyze β-lactam antibiotics except nitrocefin, a nonclinical chromogenic substrate. Here, we describe the first example of a novel carboxylesterase derived from a metagenome that is able to cleave the amide bond of various β-lactam substrates and the ester bond of p-nitrophenyl esters. A clone with lipolytic activity was selected by functional screening of a metagenomic library using tributyrin agar plates. The sequence analysis of the clone revealed the presence of an open reading frame (estU1) encoding a polypeptide of 426 amino acids, retaining an S-X-X-K motif that is conserved in class C β-lactamases and family VIII carboxylesterases. The gene was overexpressed in Escherichia coli, and the purified recombinant protein (EstU1) was further characterized. EstU1 showed esterase activity toward various chromogenic p-nitrophenyl esters. In addition, it exhibited hydrolytic activity toward nitrocefin, leading us to investigate whether EstU1 could hydrolyze β-lactam antibiotics. EstU1 was able to hydrolyze first-generation β-lactam antibiotics, such as cephalosporins, cephaloridine, cephalothin, and cefazolin. In a kinetic study, EstU1 showed a similar range of substrate affinities for both p-nitrophenyl butyrate and first-generation cephalosporins while the turnover efficiency for the latter was much lower. Furthermore, site-directed mutagenesis studies revealed that the catalytic triad of EstU1 plays a crucial role in hydrolyzing both ester bonds of p-nitrophenyl esters and amide bonds of the β-lactam ring of antibiotics, implicating the predicted catalytic triad of EstU1 in both activities.

  6. Year-Long Metagenomic Study of River Microbiomes Across Land Use and Water Quality

    Science.gov (United States)

    Van Rossum, Thea; Peabody, Michael A.; Uyaguari-Diaz, Miguel I.; Cronin, Kirby I.; Chan, Michael; Slobodan, Jared R.; Nesbitt, Matthew J.; Suttle, Curtis A.; Hsiao, William W. L.; Tang, Patrick K. C.; Prystajecky, Natalie A.; Brinkman, Fiona S. L.

    2015-01-01

    Select bacteria, such as Escherichia coli or coliforms, have been widely used as sentinels of low water quality; however, there are concerns regarding their predictive accuracy for the protection of human and environmental health. To develop improved monitoring systems, a greater understanding of bacterial community structure, function, and variability across time is required in the context of different pollution types, such as agricultural and urban contamination. Here, we present a year-long survey of free-living bacterial DNA collected from seven sites along rivers in three watersheds with varying land use in Southwestern Canada. This is the first study to examine the bacterial metagenome in flowing freshwater (lotic) environments over such a time span, providing an opportunity to describe bacterial community variability as a function of land use and environmental conditions. Characteristics of the metagenomic data, such as sequence composition and average genome size (AGS), vary with sampling site, environmental conditions, and water chemistry. For example, AGS was correlated with hours of daylight in the agricultural watershed and, across the agriculturally and urban-affected sites, k-mer composition clustering corresponded to nutrient concentrations. In addition to indicating a community shift, this change in AGS has implications in terms of the normalization strategies required, and considerations surrounding such strategies in general are discussed. When comparing abundances of gene functional groups between high- and low-quality water samples collected from an agricultural area, the latter had a higher abundance of nutrient metabolism and bacteriophage groups, possibly reflecting an increase in agricultural runoff. This work presents a valuable dataset representing a year of monthly sampling across watersheds and an analysis targeted at establishing a foundational understanding of how bacterial lotic communities vary across time and land use. 
The results

  7. Year-long metagenomic study of river microbiomes across land use and water quality

    Directory of Open Access Journals (Sweden)

    Thea Van Rossum

    2015-12-01

    Full Text Available Select bacteria, such as Escherichia coli or coliforms, have been widely used as sentinels of low water quality; however, there are concerns regarding their predictive accuracy for the protection of human and environmental health. To develop improved monitoring systems, a greater understanding of bacterial community structure, function and variability across time is required in the context of different pollution types, such as agricultural and urban contamination. Here, we present a year-long survey of free-living bacterial DNA collected from seven sites along rivers in three watersheds with varying land use in Southwestern Canada. This is the first study to examine the bacterial metagenome in flowing freshwater (lotic) environments over such a time span, providing an opportunity to describe bacterial community variability as a function of land use and environmental conditions. Characteristics of the metagenomic data, such as sequence composition and average genome size, vary with sampling site, environmental conditions, and water chemistry. For example, average genome size was correlated with hours of daylight in the agricultural watershed and, across the agriculturally and urban-affected sites, k-mer composition clustering corresponded to nutrient concentrations. In addition to indicating a community shift, this change in average genome size has implications in terms of the normalisation strategies required, and considerations surrounding such strategies in general are discussed. When comparing abundances of gene functional groups between high- and low-quality water samples collected from an agricultural area, the latter had a higher abundance of nutrient metabolism and bacteriophage groups, possibly reflecting an increase in agricultural runoff.
This work presents a valuable dataset representing a year of monthly sampling across watersheds and an analysis targeted at establishing a foundational understanding of how bacterial lotic communities

  8. [Pathology and viral metagenomics, a recent history].

    Science.gov (United States)

    Bernardo, Pauline; Albina, Emmanuel; Eloit, Marc; Roumagnac, Philippe

    2013-05-01

    Human, animal and plant viral diseases have greatly benefited from recent metagenomics developments. Viral metagenomics is a culture-independent approach used to investigate the complete viral genetic populations of a sample. During the last decade, metagenomics concepts and techniques that were first used by ecologists progressively spread into the scientific field of viral pathology. The sample, which was first for ecologists a fraction of ecosystem, became for pathologists an organism that hosts millions of microbes and viruses. This new approach, providing without a priori high resolution qualitative and quantitative data on the viral diversity, is now revolutionizing the way pathologists decipher viral diseases. This review describes the very last improvements of the high throughput next generation sequencing methods and discusses the applications of viral metagenomics in viral pathology, including discovery of novel viruses, viral surveillance and diagnostic, large-scale molecular epidemiology, and viral evolution. © 2013 médecine/sciences – Inserm.

  9. dsPIG: a tool to predict imprinted genes from the deep sequencing of whole transcriptomes

    Directory of Open Access Journals (Sweden)

    Li Hua

    2012-10-01

    Full Text Available Abstract Background Dysregulation of imprinted genes, which are expressed in a parent-of-origin-specific manner, plays an important role in various human diseases, such as cancer and behavioral disorders. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput technology makes large-scale prediction of imprinted genes possible. Here we propose a Bayesian model (dsPIG) to predict imprinted genes on the basis of allelic expression observed in mRNA-Seq data of independent human tissues. Results Our model (dsPIG) was capable of identifying imprinted genes with high sensitivity and specificity and a low false discovery rate when the number of sequenced tissue samples was fairly large, according to simulations. By applying dsPIG to the mRNA-Seq data, we predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples with expected low false discovery rates. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With simulations, we further analyzed how imbalanced allelic expression of non-imprinted genes or different minor allele frequencies affected the predictions of dsPIG. Interestingly, we found that, among biallelically expressed genes, at least 18 genes expressed significantly more transcripts from one allele than the other among different individuals and tissues. Conclusion With the prevalence of the mRNA-Seq technology, dsPIG has become a useful tool for analysis of allelic expression and large-scale prediction of imprinted genes. For ease of use, we have set up a web service and also provided an R package for dsPIG at http://www.shoudanliang.com/dsPIG/.

  10. Gene expression profiles predictive of cold-induced sweetening in potato.

    Science.gov (United States)

    Neilson, Jonathan; Lagüe, M; Thomson, S; Aurousseau, F; Murphy, A M; Bizimungu, B; Deveaux, V; Bègue, Y; Jacobs, J M E; Tai, H H

    2017-07-01

    Cold storage (2-4 °C) used in potato production to suppress diseases and sprouting during storage can result in cold-induced sweetening (CIS), where reducing sugars accumulate in tuber tissue leading to undesirable browning, production of bitter flavors, and increased levels of acrylamide with frying. Potato exhibits genetic and environmental variation in resistance to CIS. The current study profiles gene expression in post-harvest tubers before cold storage using transcriptome sequencing and identifies genes whose expression is predictive for CIS. A distance matrix for potato clones based on glucose levels after cold storage was constructed and compared to distance matrices constructed using RNA-seq gene expression data. Congruence between glucose and gene expression distance matrices was tested for each gene. Correlation between glucose and gene expression was also tested. Seventy-three genes were found that had significant p values in the congruence and correlation tests. Twelve genes from the list of 73 genes also had a high correlation between glucose and gene expression as measured by Nanostring nCounter. The gene annotations indicated functions in protein degradation, nematode resistance, auxin transport, and gibberellin response. These 12 genes were used to build models for prediction of CIS using multiple linear regression. Nine linear models were constructed that used different combinations of the 12 genes. An F-box protein, cellulose synthase, and a putative Lax auxin transporter gene were most frequently used. The findings of this study demonstrate the utility of gene expression profiles in predictive diagnostics for severity of CIS.
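    The distance-matrix comparison described in this record can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the congruence statistic, so the sketch assumes a Mantel-style Pearson correlation between the upper triangles of the two distance matrices, and the clone measurements (glucose, gene_a, gene_b) are invented for the example.

```python
from itertools import combinations
from math import sqrt


def distance_matrix(values):
    """Pairwise absolute differences between per-clone measurements."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal into a list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def congruence(glucose, expression):
    """Assumed statistic: correlation between the two distance matrices."""
    return pearson(upper_triangle(distance_matrix(glucose)),
                   upper_triangle(distance_matrix(expression)))


# Hypothetical data: glucose after cold storage and expression of two
# genes, measured in the same five potato clones.
glucose = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_a = [10.0, 20.0, 30.0, 40.0, 50.0]  # tracks glucose exactly
gene_b = [3.0, 1.0, 4.0, 1.0, 5.0]       # unrelated pattern

score_a = congruence(glucose, gene_a)
score_b = congruence(glucose, gene_b)
print(score_a, score_b)
```

    A gene whose between-clone distances mirror the glucose distances scores close to 1, while an unrelated gene scores lower. A real analysis would also need a permutation test to obtain the p values mentioned in the abstract.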

  11. [Prediction and bioinformatics analysis of human gene expression profiling regulated by amifostine].

    Science.gov (United States)

    Yang, Bo; Cai, Li-Li; Chi, Xiao-Hua; Lu, Xue-Chun; Zhang, Feng; Tuo, Shuai; Zhu, Hong-Li; Liu, Li-Hong; Yan, Jiang-Wei; Tuo, Chao-Wei

    2011-06-01

    The objective of this study was to perform a bioinformatics analysis of the gene expression profile regulated by amifostine and to predict its novel potential biological functions, providing a direction for further study of the pharmacological actions of amifostine and of study methods. Amifostine was used as a keyword to search free internet-based gene expression databases, including GEO, the Affymetrix gene chip database, GenBank, SAGE, GeneCard, InterPro, ProtoNet, UniProt and BLOCKS, and the retrieved amifostine-regulated gene expression profiling data were subjected to validity testing, differential gene expression analysis, functional clustering and gene annotation. Only one amifostine-regulated gene expression profiling dataset was retrieved, from the GEO database (accession: GSE3212). After validity testing and differential expression analysis, a significant difference (p < 0.01) was found in only 2.14% of the whole genome (460/192000). Gene annotation analysis showed that 139 of the 460 genes were known genes, of which 77 were up-regulated and 62 were down-regulated. Thirteen of the 139 genes were newly expressed following amifostine treatment of K562 cells, whereas expression of 5 genes was completely inhibited. Functional clustering divided the 139 genes into 11 categories, with biological functions involving hematopoietic and immunologic regulation, apoptosis and the cell cycle. It is concluded that bioinformatics methods can be applied to the analysis of gene expression profiling regulated by amifostine. Amifostine has a regulatory effect on human gene expression profiling, mainly in biological processes including hematopoiesis, immunologic regulation, apoptosis and the cell cycle. The effect of amifostine on human gene expression needs to be verified further under experimental conditions.

  12. Assembly of viral metagenomes from Yellowstone hot springs.

    Science.gov (United States)

    Schoenfeld, Thomas; Patterson, Melodee; Richardson, Paul M; Wommack, K Eric; Young, Mark; Mead, David

    2008-07-01

    Thermophilic viruses were reported decades ago; however, knowledge of their diversity, biology, and ecological impact is limited. Previous research on thermophilic viruses focused on cultivated strains. This study examined metagenomic profiles of viruses directly isolated from two mildly alkaline hot springs, Bear Paw (74 °C) and Octopus (93 °C). Using a new method for constructing libraries from picograms of DNA, nearly 30 Mb of viral DNA sequence was determined. In contrast to previous studies, sequences were assembled at 50% and 95% identity, creating composite contigs up to 35 kb and facilitating analysis of the inherent heterogeneity in the populations. Lowering the assembly identity reduced the estimated number of viral types from 1,440 and 1,310 to 548 and 283, respectively. Surprisingly, the diversity of viral species in these springs approaches that in moderate-temperature environments. While most known thermophilic viruses have a chronic, nonlytic infection lifestyle, analysis of coding sequences suggests lytic viruses are more common in geothermal environments than previously thought. The 50% assembly included one contig with high similarity and perfect synteny to nine genes from Pyrobaculum spherical virus (PSV). In fact, nearly all the genes of the 28-kb genome of PSV have apparent homologs in the metagenomes. Similarities to thermoacidophilic viruses isolated on other continents were limited to specific open reading frames but were equally strong. Nearly 25% of the reads showed significant similarity between the hot springs, suggesting a common subterranean source. To our knowledge, this is the first application of metagenomics to viruses of geothermal origin.

  13. Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data

    DEFF Research Database (Denmark)

    Raes, Jeroen; Letunic, Ivica; Yamada, Takuji

    2011-01-01

    provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20° N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community...... composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology....

  14. An Automated Bayesian Framework for Integrative Gene Expression Analysis and Predictive Medicine

    OpenAIRE

    Parikh, Neena; Zollanvari, Amin; Alterovitz, Gil

    2012-01-01

    Motivation: This work constructs a closed loop Bayesian Network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. Results: An automated pipeline was successfully constructed. Integrative models were made based on gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive...

  15. Evaluation of Interleukin 8 gene polymorphism for predicting ...

    African Journals Online (AJOL)

    Rajasree Shanmuganathan

    2016-07-09

    Jul 9, 2016 ... Hacking D, Knight JC, Rockett K, Brown H, Frampton J, et al. Increased in vivo transcription of an IL-8 haplotype associated with respiratory syncytial virus disease-susceptibility. Genes Immun 2004;5:274–82. 10. Michaud DS, Daugherty SE, Berndt SI, Platz EA, Yeager M, et al. Genetic polymorphisms of ...

  16. Evaluation of Interleukin 8 gene polymorphism for predicting ...

    African Journals Online (AJOL)

    Background and aim: Previous studies have observed the association between inflammation and chronic kidney disease (CKD). The role played by Interleukin 8 (IL8) gene polymorphism has not been studied yet. Hence, the present study has been designed as the first attempt to identify the possible associations between ...

  17. Metagenomics as a tool to obtain full genomes of process-critical bacteria in engineered systems

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Tyson, Gene W.

    parameters with functions of specific bacteria within the ecosystems in order to decipher principles that might be used to control and predict ecosystem performance. The main bottleneck in obtaining genomes from the environment is that the vast majority of bacteria are not readily cultured. Metagenomics...... sequenced two metagenomes from the same environmental sample, but using two independent DNA extraction methods, which resulted in different population abundances. This allowed sequence-composition independent binning of numerous high quality draft genomes from both high and low abundant members...... of the bacteria, including time-series. Using more than two metagenomes increases the binning resolution and hence the number of genomes that can be extracted. We are currently at a tipping point in microbial ecology – in the future it will be fast, cheap and easy to obtain genomes directly from the environment...

  18. Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

    Science.gov (United States)

    Basyuni, M.; Wati, R.

    2018-03-01

    The present study evaluates bioinformatics methods for analyzing twenty-four predicted polyprenol reductase genes from higher plants in GenBank and predicts their structure, composition, similarity, subcellular localization, and phylogeny. The physicochemical properties of plant polyprenol showed diversity among the observed genes. The proportions of secondary structure elements in plant polyprenol genes followed the order α helix > random coil > extended chain structure. The values for chloroplast transit peptides, but not for signal peptides, were too low, indicating that few plant polyprenol reductase genes carry a chloroplast transit peptide. The likelihood of a potential transit peptide varied among the plant polyprenol reductases, suggesting the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The tree shows several branches, suggesting that plant polyprenol reductase genes group into divergent clusters.

  19. A new alkaline lipase obtained from the metagenome of marine sponge Ircinia sp.

    Science.gov (United States)

    Su, Jing; Zhang, Fengli; Sun, Wei; Karuppiah, Valliappan; Zhang, Guangya; Li, Zhiyong; Jiang, Qun

    2015-07-01

    Microorganisms associated with marine sponges are potential resources for marine enzymes. In this study, a culture-independent metagenomic approach was used to isolate lipases from the complex microbiome of the sponge Ircinia sp. obtained from the South China Sea. A metagenomic library containing 6568 clones was constructed, and functional screening on 1 % tributyrin agar identified a positive lipase clone (35F4). Following sequence analysis, clone 35F4 was found to contain a putative lipase gene, lipA. Analysis of the predicted amino acid sequence of LipA revealed that it is a member of subfamily I.1 of lipases, with 63 % amino acid similarity to the lactonizing lipase from Aeromonas veronii (WP_021231793). Based on the predicted secondary structure, LipA was predicted to be an alkaline enzyme by sequence/structure analysis. Heterologous expression of lipA in E. coli BL21 (DE3) was performed, and characterization of the recombinant enzyme LipA showed that it is an alkaline enzyme with high tolerance to organic solvents. The isolated lipase LipA was active over a broad alkaline range, with the highest activity at pH 9.0, and had a high level of stability over a pH range of 7.0-12.0. The activity of LipA was increased in the presence of 5 mM Ca2+ and some organic solvents, e.g. methanol, acetone and isopropanol. The optimum temperature for the activity of LipA is 40 °C, and the molecular weight of LipA was determined to be ~30 kDa by SDS-PAGE. LipA is an alkaline lipase and shows good tolerance to some organic solvents, which makes it of potential utility in the detergent industry and in enzyme-mediated organic synthesis. The result of this study has broadened the diversity of known lipolytic genes and demonstrated that marine sponges are an important source of new enzymes.

  20. Prediction of Drug Therapy for Chronic Hepatitis C Depending on the IL28B Gene Polymorphism

    Directory of Open Access Journals (Sweden)

    Moroz L.V.

    2014-09-01

    Molecular and genetic analysis of the IL28B (rs12979860) gene polymorphism, located 3 thousand nucleotide pairs from the IL28B gene, using the polymerase chain reaction makes it possible to predict the success of combination antiviral therapy; the presence of the C/C genotype can be a predictor of sustained virological response in patients with chronic hepatitis C.

  1. Neural network predicts sequence of TP53 gene based on DNA chip

    DEFF Research Database (Denmark)

    Spicker, J.S.; Wikman, F.; Lu, M.L.

    2002-01-01

    We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero...

  2. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Full Text Available Abstract Background Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression—on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  3. HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes

    Energy Technology Data Exchange (ETDEWEB)

    Tu, Q.; Deng, Ye; Lin, Lu; Hemme, Chris L.; He, Zhili; Zhou, Jizhong

    2010-05-17

    Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts, and such an array is called human and mouse microbiota array, HMM-Chip. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains with 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Different degrees of specific probes, including gene-specific, inclusive and exclusive group-specific probes were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) of the HMMer-confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. This developed HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.

  4. Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi

    Directory of Open Access Journals (Sweden)

    Carlow Clotilde KS

    2009-11-01

    Full Text Available Abstract Background Wolbachia (wBm) is an obligate endosymbiotic bacterium of Brugia malayi, a parasitic filarial nematode of humans and one of the causative agents of lymphatic filariasis. There is a pressing need for new drugs against filarial parasites, such as B. malayi. As wBm is required for B. malayi development and fertility, targeting wBm is a promising approach. However, the lifecycle of neither B. malayi nor wBm can be maintained in vitro. To facilitate selection of potential drug targets we computationally ranked the wBm genome based on confidence that a particular gene is essential for the survival of the bacterium. Results wBm protein sequences were aligned using BLAST to the Database of Essential Genes (DEG) version 5.2, a collection of 5,260 experimentally identified essential genes in 15 bacterial strains. A confidence score, the Multiple Hit Score (MHS), was developed to predict each wBm gene's essentiality based on the top alignments to essential genes in each bacterial strain. This method was validated using a jackknife methodology to test the ability to recover known essential genes in a control genome. A second estimation of essentiality, the Gene Conservation Score (GCS), was calculated on the basis of phyletic conservation of genes across Wolbachia's parent order Rickettsiales. Clusters of orthologous genes were predicted within the 27 currently available complete genomes. Druggability of wBm proteins was predicted by alignment to a database of protein targets of known compounds. Conclusion Ranking wBm genes by either MHS or GCS predicts and prioritizes potentially essential genes. Comparison of the MHS to GCS produces quadrants representing four types of predictions: those with high confidence of essentiality by both methods (245 genes), those highly conserved across Rickettsiales (299 genes), those similar to distant essential genes (8 genes), and those with low confidence of essentiality (253 genes). These data facilitate
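    The four-quadrant comparison of MHS and GCS described in this record could be sketched along these lines. The abstract gives neither the score formulas nor the cutoffs, so everything numeric below is an assumption: the scores, the gene names (gene_1 to gene_4) and both cutoff values are hypothetical placeholders that only show how two scores partition genes into the four prediction types.

```python
def classify(mhs, gcs, mhs_cutoff, gcs_cutoff):
    """Assign a gene to one of the four prediction quadrants."""
    high_mhs = mhs >= mhs_cutoff
    high_gcs = gcs >= gcs_cutoff
    if high_mhs and high_gcs:
        return "high confidence by both methods"
    if high_gcs:
        return "highly conserved across Rickettsiales"
    if high_mhs:
        return "similar to distant essential genes"
    return "low confidence of essentiality"


# Hypothetical (MHS, GCS) pairs; real values would come from the BLAST
# alignments against DEG and from the ortholog clusters.
scores = {
    "gene_1": (0.9, 0.8),
    "gene_2": (0.1, 0.7),
    "gene_3": (0.6, 0.2),
    "gene_4": (0.1, 0.1),
}

# Hypothetical cutoffs chosen only for illustration.
MHS_CUTOFF = 0.5
GCS_CUTOFF = 0.5

labels = {
    gene: classify(mhs, gcs, MHS_CUTOFF, GCS_CUTOFF)
    for gene, (mhs, gcs) in scores.items()
}
for gene, label in labels.items():
    print(gene, "->", label)
```

    Counting the genes per label would give a tally of the same kind as the four quadrant counts reported in the abstract.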

  5. Enhancing the Lasso Approach for Developing a Survival Prediction Model Based on Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Shuhei Kaneko

    2015-01-01

    Full Text Available In the past decade, researchers in oncology have sought to develop survival prediction models using gene expression data. The least absolute shrinkage and selection operator (lasso) has been widely used to select genes that are truly correlated with a patient’s survival. The lasso selects genes for prediction by shrinking a large number of coefficients of the candidate genes towards zero based on a tuning parameter that is often determined by cross-validation (CV). However, this method can pass over (or fail to identify) true positive genes (i.e., it identifies false negatives) in certain instances, because the lasso tends to favor the development of a simple prediction model. Here, we attempt to monitor the identification of false negatives by developing a method for estimating the number of true positive (TP) genes for a series of values of a tuning parameter that assumes a mixture distribution for the lasso estimates. Using our developed method, we performed a simulation study to examine its precision in estimating the number of TP genes. Additionally, we applied our method to a real gene expression dataset and found that it was able to identify genes correlated with survival that a CV method was unable to detect.
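    The shrinkage behaviour this record describes can be illustrated with a small sketch. It is a simplification under stated assumptions: it uses a squared-error lasso fitted by coordinate descent rather than a survival model, it does not implement the authors' estimator of the number of TP genes, and the four-sample, three-gene dataset is invented so that gene 0 has a strong effect, gene 1 a weak but real effect, and gene 2 none.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, penalty, sweeps=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + penalty*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Fit from every gene except j, using current coefficients.
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / scale
    return beta


# Hypothetical expression matrix (rows: samples, columns: genes) with
# mutually orthogonal columns, and an outcome y = 2*gene0 + 0.1*gene1.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [2.1, 1.9, -1.9, -2.1]

strong = lasso(X, y, penalty=0.5)   # heavy shrinkage
weak = lasso(X, y, penalty=0.05)    # light shrinkage
print(strong, weak)
```

    With the heavier penalty the weak but genuine gene 1 is set exactly to zero, which is the kind of false negative the abstract refers to; the lighter penalty keeps it with a reduced coefficient.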

  6. Fishing for biodiversity: Novel methanopterin-linked C1 transfer genes deduced from the Sargasso Sea metagenome

    Energy Technology Data Exchange (ETDEWEB)

    Kalyuzhnaya, Marina G.; Nercessian, Olivier; Lapidus, Alla; Chistoserdova, Ludmila

    2004-07-01

    The recently generated database of microbial genes from an oligotrophic environment populated by a calculated 1,800 major phylotypes (the Sargasso Sea metagenome) presents a great source for expanding local databases of genes indicative of a specific function. In this paper we analyze the Sargasso Sea metagenome in terms of the presence of methanopterin-linked C1 transfer genes that are signature for methylotrophy. We conclude that more than 10 phylotypes possessing genes of interest are present in this environment, and a few of these are relatively abundant species. The sequences representative of the major phylotypes do not appear to belong to any known microbial group capable of methanopterin-linked C1 transfer. Instead, they separate from all known sequences on phylogenetic trees, pointing towards their affiliation with a novel microbial phylum. These data imply a broader distribution of methanopterin-linked functions in the microbial world than previously known.

  7. Metagenomic analysis of microbial communities yields insight into impacts of nanoparticle design

    Science.gov (United States)

    Metch, Jacob W.; Burrows, Nathan D.; Murphy, Catherine J.; Pruden, Amy; Vikesland, Peter J.

    2018-01-01

    Next-generation DNA sequencing and metagenomic analysis provide powerful tools for the environmentally friendly design of nanoparticles. Herein we demonstrate this approach using a model community of environmental microbes (that is, wastewater-activated sludge) dosed with gold nanoparticles of varying surface coatings and morphologies. Metagenomic analysis was highly sensitive in detecting the microbial community response to gold nanospheres and nanorods with either cetyltrimethylammonium bromide or polyacrylic acid surface coatings. We observed that the gold-nanoparticle morphology imposes a stronger force in shaping the microbial community structure than does the surface coating. Trends were consistent in terms of the compositions of both taxonomic and functional genes, which include antibiotic resistance genes, metal resistance genes and gene-transfer elements associated with cell stress that are relevant to public health. Given that nanoparticle morphology remained constant, the potential influence of gold dissolution was minimal. Surface coating governed the nanoparticle partitioning between the bioparticulate and aqueous phases.

  8. Opsin gene polymorphism predicts trichromacy in a cathemeral lemur.

    Science.gov (United States)

    Veilleux, Carrie C; Bolnick, Deborah A

    2009-01-01

    Recent research has identified polymorphic trichromacy in three diurnal strepsirrhines: Coquerel's sifaka (Propithecus coquereli), black and white ruffed lemurs (Varecia variegata), and red ruffed lemurs (V. rubra). Current hypotheses suggest that the transitions to diurnality experienced by Propithecus and Varecia were necessary precursors to their independent acquisitions of trichromacy. Accordingly, cathemeral lemurs are thought to lack the M/L opsin gene polymorphism necessary for trichromacy. In this study, the M/L opsin gene was sequenced in ten cathemeral blue-eyed black lemurs (Eulemur macaco flavifrons). This analysis identified a polymorphism identical to that of other trichromatic strepsirrhines at the critical amino acid position 285 in exon 5 of the M/L opsin gene. Thus, polymorphic trichromacy is likely present in at least one cathemeral Eulemur species, suggesting that strict diurnality is not necessary for trichromacy. The presence of trichromacy in E. m. flavifrons suggests that a re-evaluation of current hypotheses regarding the evolution of strepsirrhine trichromacy may be necessary. Although the M/L opsin polymorphism may have been independently acquired three times in the lemurid-indriid clade, the distribution of opsin alleles in lemurids and indriids may also be consistent with a common origin of trichromacy in the last common ancestor of either the lemurids or the lemurid-indriid clade. (c) 2008 Wiley-Liss, Inc.

  9. Exploration of soil metagenome diversity for prospection of enzymes involved in lignocellulosic biomass conversion

    Energy Technology Data Exchange (ETDEWEB)

    Alvarez, T.M.; Squina, F.M. [Laboratorio Nacional de Luz Sincrotron (LNLS), Campinas, SP (Brazil); Paixao, D.A.A.; Franco Cairo, J.P.L.; Buchli, F.; Ruller, R. [Laboratorio Nacional de Ciencia e Tecnologia do Bioetanol (CTBE), Campinas, SP (Brazil); Prade, R. [Oklahoma State University, Stillwater, OK (United States)

    2012-07-01

    Full text: Metagenomics allows access to genetic information encoded in the DNA of microorganisms recalcitrant to cultivation. These microorganisms represent a reservoir of novel biocatalysts with potential application in environmentally friendly techniques that aim to overcome the dependence on fossil fuels and to diminish air and water pollution. The focus of our work is the generation of a tool kit of lignocellulolytic enzymes from the soil metagenome, which could be used for second generation ethanol production. Environmental samples were collected at a sugarcane field after harvesting, where the microbial population involved in lignocellulose degradation is expected to be enriched due to the presence of straw covering the soil. The Sugarcane Bagasse-Degrading-Soil (SBDS) metagenome was sequenced by massively parallel 454 Roche sequencing. We identified a full repertoire of genes with significant matches to glycosyl hydrolase catalytic domains and carbohydrate-binding modules. Soil metagenomic libraries cloned into pUC19 were screened through functional assays. CMC-agar screening resulted in positive clones, revealing new cellulase-coding genes. Through a CMC-zymogram it was possible to observe that one of these genes, designated E-1, corresponds to an enzyme that is secreted to the extracellular medium, suggesting that the cloned gene carried the original signal peptide. Enzymatic assays and analysis through capillary electrophoresis showed that E-1 was able to cleave internal glycosidic bonds of cellulose. New rounds of functional screening with chromogenic substrates are being conducted, aiming at the generation of a library of lignocellulolytic enzymes derived from the soil metagenome, which may become a key component in the development of second generation biofuels. (author)

  10. Abundance of genes involved in mercury methylation in oceanic environments

    Science.gov (United States)

    Palumbo, A. V.; Podar, M.; Gilmour, C. C.; Brandt, C. C.; Brown, S. D.; Crable, B. R.; Weighill, D.; Jacobson, D. A.; Somenahally, A. C.; Elias, D. A.

    2016-02-01

    The distribution and diversity of genes involved in mercury methylation in oceanic environments is of interest in determining the source of mercury in ocean environments and may have predictive value for mercury methylation rates. The highly conserved hgcAB genes involved in mercury methylation provide an avenue for evaluating the genetic potential for mercury methylation. The genes are sporadically present in a few diverse groups of Bacteria and Archaea, including Deltaproteobacteria and Firmicutes; of over 7000 sequenced species, they are present in only about 100 genomes. Examination of sequence data from methylators and non-methylators indicates that these genes are associated with other genes involved in metal transformations and transport. We examined hgcAB presence in over 3500 microbial metagenomes (from all environments) and found that the hgcAB genes were present in anaerobic oceanic environments but not in aerobic layers of the open ocean. The genes were common in sediments from marine, coastal and estuarine sources as well as polluted environments. The genes were rare, found in 7 of 138 samples, in metagenomes from the pelagic water column, including profiles through the oxygen minimum zone (OMZ). Other oxic and sub-oxic coastal waters also demonstrated a lack of hgcAB genes, including the OMZ in the Eastern North Pacific Ocean. Some unique hgcA-like sequences were found in metagenomes from depth in the Pacific and Southern Atlantic Oceans. Coastal "dead zone" waters may be important sources of MeHg, as the hgcAB genes were abundant in the anoxic waters of a stratified fjord. The genes were absent in microbiomes from vertebrates but were present in invertebrate microbiomes. However, oceanic species were underrepresented in these samples. Climate change could provide an additional flux of MeHg to the oceans, as we found the most abundant representation of hgcAB genes in arctic permafrost. Thus warming could increase the flux of methyl mercury to arctic waters.

  11. cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Valenzuela Jesus G

    2007-07-01

    Full Text Available Background: The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. Design and application of these genome-wide assays, however, require accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions. Results: We aimed to obtain cDNA sequences to examine the accuracy of gene prediction in silico. We constructed cDNA libraries from mixed blood stages of the P. falciparum parasite using the SMART cDNA library construction technique and generated 17332 high-quality expressed sequence tags (ESTs), including 2198 from primer-walking experiments. Assembly of our sequence tags produced 2548 contigs and 2671 singletons, versus 5220 contigs and 5910 singletons when our ESTs were assembled with ESTs in public databases. Comparison of all the assembled ESTs/contigs with predicted CDS and genomic sequences in the PlasmoDB database identified 356 genes with predicted coding sequences fully covered by ESTs, including 85 genes (23.6%) with introns incorrectly predicted. Careful automatic software and manual alignments found an additional 308 genes that have introns different from those predicted, with 152 new introns discovered and 182 introns with sizes or locations different from those predicted. Alternatively spliced and antisense transcripts were also detected. Matching cDNA to predicted genes also revealed silent chromosomal regions, mostly at subtelomere regions. Conclusion: Our data indicated that approximately 24% of the genes in the current databases were predicted incorrectly, although some of these inaccuracies could represent alternatively

  12. A robust gene expression-based prognostic risk score predicts overall survival of lung adenocarcinoma patients.

    Science.gov (United States)

    Chen, En-Guo; Wang, Pin; Lou, Haizhou; Wang, Yunshan; Yan, Hong; Bi, Lei; Liu, Liang; Li, Bin; Snijders, Antoine M; Mao, Jian-Hua; Hang, Bo

    2018-01-23

    Identification of reliable predictive biomarkers and new therapeutic targets is a critical step for significant improvement in patient outcomes. Here, we developed a multi-step bioinformatics analytic strategy to mine large omics and clinical data to build a prognostic scoring system for predicting the overall survival (OS) of lung adenocarcinoma (LuADC) patients. We first identified 1327 significantly and robustly deregulated genes, 600 of which were significantly associated with the OS of LuADC patients. Gene co-expression network analysis revealed the biological functions of these 600 genes in normal lung and LuADCs, which were found to be enriched for cell cycle-related processes, blood vessel development, cell-matrix adhesion and metabolic processes. Finally, we implemented a multiple resampling method combined with Cox regression analysis to identify a 27-gene signature associated with OS, and then created a prognostic scoring system based on this signature. This scoring system robustly predicted OS of LuADC patients in 100 sampling test sets and was further validated in four independent LuADC cohorts. In addition, in comparison to other existing prognostic gene signatures published in the literature, our signature was significantly superior in predicting OS of LuADC patients. In summary, our multi-omics and clinical data integration study created a 27-gene prognostic risk score that can predict OS of LuADC patients independent of age, gender and clinical stage. This score could guide therapeutic selection and allow stratification in clinical trials.

  13. The metagenomic data life-cycle: standards and best practices

    Energy Technology Data Exchange (ETDEWEB)

    ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo; Corre, Erwan; Fosso, Bruno; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder; Cochrane, Guy

    2017-06-16

    Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonised way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (1) material sampling, (2) material sequencing, (3) data analysis and (4) data archiving & publishing. Taking examples from marine research, we summarise essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption are still needed. We emphasise the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.

  14. Combined effects of thrombosis pathway gene variants predict cardiovascular events.

    Directory of Open Access Journals (Sweden)

    Kirsi Auro

    2007-07-01

    Full Text Available The genetic background of complex diseases is proposed to consist of several low-penetrance risk loci. Addressing this complexity likely requires both large sample size and simultaneous analysis of different predisposing variants. We investigated the role of four thrombosis genes: coagulation factor V (F5), intercellular adhesion molecule 1 (ICAM1), protein C (PROC), and thrombomodulin (THBD) in cardiovascular diseases. Single allelic gene variants and their pair-wise combinations were analyzed in two independently sampled population cohorts from Finland. From among 14,140 FINRISK participants (FINRISK-92, n = 5,999 and FINRISK-97, n = 8,141), we selected for genotyping a sample of 2,222, including 528 incident cardiovascular disease (CVD) cases and random subcohorts totaling 786. To cover all known common haplotypes (>10%), 54 single nucleotide polymorphisms (SNPs) were genotyped. Classification-tree analysis identified 11 SNPs that were further analyzed in Cox's proportional hazard model as single variants and pair-wise combinations. Multiple testing was controlled by use of two independent cohorts and with false-discovery rate. Several CVD risk variants were identified: In women, the combination of F5 rs7542281 x THBD rs1042580, together with three single F5 SNPs, was associated with CVD events. Among men, PROC rs1041296, when combined with either ICAM1 rs5030341 or F5 rs2269648, was associated with total mortality. As a single variant, PROC rs1401296, together with the F5 Leiden mutation, was associated with ischemic stroke events. Our strategy to combine the classification-tree analysis with more traditional genetic models was successful in identifying SNPs, acting either in combination or as single variants, that predispose to CVD, and produced consistent results in two independent cohorts. These results suggest that variants in these four thrombosis genes contribute to arterial cardiovascular events at population level.

  15. Metagenomic insights into evolution of heavy metal-contaminated groundwater microbial community

    Energy Technology Data Exchange (ETDEWEB)

    Hemme, C.L.; Deng, Y.; Gentry, T.J.; Fields, M.W.; Wu, L.; Barua, S.; Barry, K.; Green-Tringe, S.; Watson, D.B.; He, Z.; Hazen, T.C.; Tiedje, J.M.; Rubin, E.M.; Zhou, J.

    2010-07-01

    Understanding adaptation of biological communities to environmental change is a central issue in ecology and evolution. Metagenomic analysis of a stressed groundwater microbial community reveals that prolonged exposure to high concentrations of heavy metals, nitric acid and organic solvents (~50 years) has resulted in a massive decrease in species and allelic diversity as well as a significant loss of metabolic diversity. Although the surviving microbial community possesses all metabolic pathways necessary for survival and growth in such an extreme environment, its structure is very simple, primarily composed of clonal denitrifying γ- and β-proteobacterial populations. The resulting community is overabundant in key genes conferring resistance to specific stresses including nitrate, heavy metals and acetone. Evolutionary analysis indicates that lateral gene transfer could have a key function in rapid response and adaptation to environmental contamination. The results presented in this study have important implications in understanding, assessing and predicting the impacts of human-induced activities on microbial communities ranging from human health to agriculture to environmental management, and their responses to environmental changes.

  16. Metagenomic Insights into Evolution of a Heavy Metal-Contaminated Groundwater Microbial Community

    Energy Technology Data Exchange (ETDEWEB)

    Hemme, Christopher L.; Deng, Ye; Gentry, Terry J.; Fields, Matthew W.; Wu, Liyou; Barua, Soumitra; Barry, Kerrie; Tringe, Susannah G.; Watson, David B.; He, Zhili; Hazen, Terry C.; Tiedje, James M.; Rubin, Edward M.; Zhou, Jizhong

    2010-02-15

    Understanding adaptation of biological communities to environmental change is a central issue in ecology and evolution. Metagenomic analysis of a stressed groundwater microbial community reveals that prolonged exposure to high concentrations of heavy metals, nitric acid and organic solvents (~50 years) has resulted in a massive decrease in species and allelic diversity as well as a significant loss of metabolic diversity. Although the surviving microbial community possesses all metabolic pathways necessary for survival and growth in such an extreme environment, its structure is very simple, primarily composed of clonal denitrifying γ- and β-proteobacterial populations. The resulting community is over-abundant in key genes conferring resistance to specific stresses including nitrate, heavy metals and acetone. Evolutionary analysis indicates that lateral gene transfer could be a key mechanism in rapidly responding and adapting to environmental contamination. The results presented in this study have important implications in understanding, assessing and predicting the impacts of human-induced activities on microbial communities ranging from human health to agriculture to environmental management, and their responses to environmental changes.

  17. Pathway analysis of gene signatures predicting metastasis of node-negative primary breast cancer

    International Nuclear Information System (INIS)

    Yu, Jack X; Sieuwerts, Anieta M; Zhang, Yi; Martens, John WM; Smid, Marcel; Klijn, Jan GM; Wang, Yixin; Foekens, John A

    2007-01-01

    Published prognostic gene signatures in breast cancer have few genes in common. Here we provide a rationale for this observation by studying the prognostic power and the underlying biological pathways of different gene signatures. Gene signatures to predict the development of metastases in estrogen receptor-positive and estrogen receptor-negative tumors were identified using 500 re-sampled training sets and mapped to Gene Ontology Biological Process to identify over-represented pathways. The Global Test program confirmed that gene expression profiles in the common pathways were associated with metastasis in the patients. The apoptotic pathway and cell division, or cell growth regulation and G-protein coupled receptor signal transduction, were most significantly associated with the metastatic capability of estrogen receptor-positive or estrogen receptor-negative tumors, respectively. A gene signature derived from the common pathways predicted metastasis in an independent cohort. Mapping of the pathways represented by different published prognostic signatures showed that they share 53% of the identified pathways. We show that divergent gene sets classifying patients for the same clinical endpoint represent similar biological processes and that pathway-derived signatures can be used to predict prognosis. Furthermore, our study reveals that the underlying biology related to aggressiveness of estrogen receptor subgroups of breast cancer is quite different.

  18. Oxytocin receptor gene variation predicts subjective responses to MDMA.

    Science.gov (United States)

    Bershad, Anya K; Weafer, Jessica J; Kirkpatrick, Matthew G; Wardle, Margaret C; Miller, Melissa A; de Wit, Harriet

    2016-12-01

    3,4-Methylenedioxymethamphetamine (MDMA, "ecstasy") enhances desire to socialize and feelings of empathy, which are thought to be related to increased oxytocin levels. Thus, variation in the oxytocin receptor gene (OXTR) may influence responses to the drug. Here, we examined the influence of a single nucleotide polymorphism (SNP) in OXTR on responses to MDMA in humans. Based on findings that carriers of the A allele at rs53576 exhibit reduced sensitivity to oxytocin-induced social behavior, we hypothesized that these individuals would show reduced subjective responses to MDMA, including sociability. In this three-session, double-blind, within-subjects study, healthy volunteers with past MDMA experience (N = 68) received MDMA (0, 0.75 mg/kg, and 1.5 mg/kg) and provided self-report ratings of sociability, anxiety, and drug effects. These responses were examined in relation to rs53576. MDMA (1.5 mg/kg) did not increase sociability in individuals with the A/A genotype as it did in G allele carriers. The genotypic groups did not differ in responses at the lower MDMA dose, or in cardiovascular or other subjective responses. These findings are consistent with the idea that MDMA-induced sociability is mediated by oxytocin, and that variation in the oxytocin receptor gene may influence responses to the drug.

  19. Metabolic reconstruction for metagenomic data and its application to the human microbiome.

    Directory of Open Access Journals (Sweden)

    Sahar Abubucker

    Full Text Available Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high

  20. Resolving prokaryotic taxonomy without rRNA: longer oligonucleotide word lengths improve genome and metagenome taxonomic classification.

    Directory of Open Access Journals (Sweden)

    Eric B Alsop

    Full Text Available Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism's inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomics samples, as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
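
    As a rough illustration of what an oligonucleotide signature is, the sketch below counts overlapping words of length k in a DNA fragment and compares two fragments. The function names, the Euclidean distance, and the toy fragments are assumptions made for this example; the abstract does not specify the classifier used in the study, and strand-canonical counting is omitted here.

```python
from collections import Counter
from math import sqrt

DNA = frozenset("ACGT")


def kmer_signature(sequence, k):
    """Return relative frequencies of all overlapping k-mers in a DNA sequence."""
    sequence = sequence.upper()
    words = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Skip words that contain ambiguous bases such as N.
    counts = Counter(word for word in words if set(word) <= DNA)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures over the union of their words."""
    words = set(sig_a) | set(sig_b)
    return sqrt(sum((sig_a.get(w, 0.0) - sig_b.get(w, 0.0)) ** 2 for w in words))


# Toy fragments (hypothetical): compare tetranucleotide (k=4) and heptanucleotide (k=7) words.
fragment_a = "ATGCGATACGCTTGAGCTAGCTAGGCTA"
fragment_b = "ATATATTTAAATATTTAATATAAATTTA"
for k in (4, 7):
    distance = signature_distance(kmer_signature(fragment_a, k),
                                  kmer_signature(fragment_b, k))
    print(k, round(distance, 3))
```

    Longer words give up to 4^k possible features (256 for k=4, 16,384 for k=7), which is one way to see the trade-off between resolving power and computational time that the abstract mentions.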

  1. AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite.

    Science.gov (United States)

    Kerepesi, Csaba; Bánky, Dániel; Grolmusz, Vince

    2014-01-10

    Metagenomics went through an astonishing development in the past few years. Today not only gene sequencing experts, but numerous laboratories of other specializations need to analyze DNA sequences gained from clinical or environmental samples. Phylogenetic analysis of the metagenomic data presents significant challenges for the biologist and the bioinformatician. The program suite AMPHORA and its workflow version are examples of publicly available software that yields reliable phylogenetic results for metagenomic data. Here we present AmphoraNet, an easy-to-use webserver that is capable of assigning a probability-weighted taxonomic group for each phylogenetic marker gene found in the input metagenomic sample; the webserver is based on the AMPHORA2 workflow. Since a large proportion of molecular biologists uses the BLAST program and its clones on public webservers instead of the locally installed versions, we believe that the occasional user may find it comfortable that, in this version, no time-consuming installation of every component of the AMPHORA2 suite or expertise in Linux environment is required. The webserver is freely available at http://amphoranet.pitgroup.org; no registration is required. © 2013 Elsevier B.V. All rights reserved.

  2. Microbial Diversity and Biochemical Potential Encoded by Thermal Spring Metagenomes Derived from the Kamchatka Peninsula

    Directory of Open Access Journals (Sweden)

    Bernd Wemheuer

    2013-01-01

    Full Text Available Volcanic regions contain a variety of environments suitable for extremophiles. This study was focused on assessing and exploiting the prokaryotic diversity of two microbial communities derived from different Kamchatkian thermal springs by metagenomic approaches. Samples were taken from a thermoacidophilic spring near the Mutnovsky Volcano and from a thermophilic spring in the Uzon Caldera. Environmental DNA for metagenomic analysis was isolated from collected sediment samples by direct cell lysis. The prokaryotic community composition was examined by analysis of archaeal and bacterial 16S rRNA genes. A total number of 1235 16S rRNA gene sequences were obtained and used for taxonomic classification. Most abundant in the samples were members of Thaumarchaeota, Thermotogae, and Proteobacteria. The Mutnovsky hot spring was dominated by the Terrestrial Hot Spring Group, Kosmotoga, and Acidithiobacillus. The Uzon Caldera was dominated by uncultured members of the Miscellaneous Crenarchaeotic Group and Enterobacteriaceae. The remaining 16S rRNA gene sequences belonged to the Aquificae, Dictyoglomi, Euryarchaeota, Korarchaeota, Thermodesulfobacteria, Firmicutes, and some potential new phyla. In addition, the recovered DNA was used for generation of metagenomic libraries, which were subsequently mined for genes encoding lipolytic and proteolytic enzymes. Three novel genes conferring lipolytic and one gene conferring proteolytic activity were identified.

  3. In silico approach to designing rational metagenomic libraries for functional studies.

    Science.gov (United States)

    Kusnezowa, Anna; Leichert, Lars I

    2017-05-22

    With the development of Next Generation Sequencing technologies, the number of predicted proteins from entire (meta-) genomes has risen exponentially. While for some of these sequences protein functions can be inferred from homology, an experimental characterization is still a requirement for the determination of protein function. However, functional characterization of proteins cannot keep pace with our capabilities to generate more and more sequence data. Here, we present an approach to reduce the number of proteins from entire (meta-) genomes to a reasonably small number for further experimental characterization without loss of important information. About 6.1 million predicted proteins from the Global Ocean Sampling Expedition Metagenome project were distributed into classes based either on homology to existing hidden Markov models (HMMs) of known families, or de novo by assessment of pairwise similarity. 5.1 million of these proteins could be classified in this way, yielding 18,437 families. For 4,129 protein families, which did not match existing HMMs from databases, we could create novel HMMs. For each family, we then selected a representative protein, which showed the closest homology to all other proteins in this family. We then selected representatives of four families based on their homology to known and well-characterized lipases. From these four synthesized genes, we could obtain the novel esterase/lipase GOS54, validating our approach. Using an in silico approach, we were able to improve the success rate of functional screening and make entire (meta-) genomes amenable to biochemical characterization.
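
    The step of choosing one representative per family can be sketched as a medoid-style selection. This is only an assumed reading of "closest homology to all other proteins": the function name, the use of mean pairwise similarity, and the toy scores are hypothetical and not taken from the paper.

```python
def pick_representative(similarity):
    """Return the member with the highest mean similarity to all other members.

    `similarity` maps each protein ID to a dict of pairwise similarity scores
    against the other members of the same family (higher means more similar).
    """
    def mean_similarity(protein):
        others = [score for other, score in similarity[protein].items()
                  if other != protein]
        return sum(others) / len(others) if others else 0.0

    # Sorting the IDs first makes tie-breaking deterministic.
    return max(sorted(similarity), key=mean_similarity)


# Hypothetical three-member family with symmetric similarity scores.
family = {
    "p1": {"p2": 0.8, "p3": 0.7},
    "p2": {"p1": 0.8, "p3": 0.4},
    "p3": {"p1": 0.7, "p2": 0.4},
}
print(pick_representative(family))
```

    Here p1 has the highest mean similarity to the other members, so it would be the one carried forward to gene synthesis and experimental characterization.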

  4. Assembling the Marine Metagenome, One Cell at a Time

    Energy Technology Data Exchange (ETDEWEB)

    Woyke, Tanja; Xie, Gary; Copeland, Alex; Gonzalez, Jose M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2010-06-24

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  5. Assembling The Marine Metagenome, One Cell At A Time

    Energy Technology Data Exchange (ETDEWEB)

    Xie, Gang [Los Alamos National Laboratory; Han, Shunsheng [Los Alamos National Laboratory; Kiss, Hajnalka [Los Alamos National Laboratory; Saw, Jimmy [Los Alamos National Laboratory; Senin, Pavel [Los Alamos National Laboratory; Woyke, Tanja [DOE JOINT GENOME INST.; Copeland, Alex [DOE JOINT GENOME INST.; Gonzalez, Jose [UNIV OF LAGUNA, SPAIN; Chatterji, Sourav [DOE JOINT GENOME INST.; Cheng, Jan-Fang [DOE JOINT GENOME INST.; Eisen, Jonathan A [DOE JOINT GENOME INST.; Sieracki, Michael E [UNIV OF CA-DAVIS; Stepanauskas, Ramunas [BIGELOW LAB

    2008-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex

  6. Assembling the marine metagenome, one cell at a time.

    Directory of Open Access Journals (Sweden)

    Tanja Woyke

    Full Text Available The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  7. Prediction of cardioembolic, arterial, and lacunar causes of cryptogenic stroke by gene expression and infarct location.

    Science.gov (United States)

    Jickling, Glen C; Stamova, Boryana; Ander, Bradley P; Zhan, Xinhua; Liu, Dazhi; Sison, Shara-Mae; Verro, Piero; Sharp, Frank R

    2012-08-01

    The cause of ischemic stroke remains unclear, or cryptogenic, in as many as 35% of patients with stroke. Not knowing the cause of stroke restricts optimal implementation of prevention therapy and limits stroke research. We demonstrate how gene expression profiles in blood can be used in conjunction with a measure of infarct location on neuroimaging to predict a probable cause in cryptogenic stroke. The cause of cryptogenic stroke was predicted using previously described profiles of differentially expressed genes characteristic of patients with cardioembolic, arterial, and lacunar stroke. RNA was isolated from peripheral blood of 131 cryptogenic strokes and compared with profiles derived from 149 strokes of known cause. Each sample was run on Affymetrix U133 Plus 2.0 microarrays. Cause of cryptogenic stroke was predicted using gene expression in blood and infarct location. Cryptogenic strokes were predicted to be 58% cardioembolic, 18% arterial, 12% lacunar, and 12% unclear etiology. Cryptogenic stroke of predicted cardioembolic etiology had more prior myocardial infarction and higher CHA(2)DS(2)-VASc scores compared with stroke of predicted arterial etiology. Predicted lacunar strokes had higher systolic and diastolic blood pressures and lower National Institutes of Health Stroke Scale compared with predicted arterial and cardioembolic strokes. Cryptogenic strokes of unclear predicted etiology were less likely to have a prior transient ischemic attack or ischemic stroke. Gene expression in conjunction with a measure of infarct location can predict a probable cause in cryptogenic strokes. Predicted groups require further evaluation to determine whether relevant clinical, imaging, or therapeutic differences exist for each group.

  8. An integrated machine learning approach for predicting DosR-regulated genes in Mycobacterium tuberculosis

    Directory of Open Access Journals (Sweden)

    Bacon Joanna

    2010-03-01

    Full Text Available Abstract Background: DosR is an important regulator of the response to stress such as limited oxygen availability in Mycobacterium tuberculosis. Time course gene expression data enable us to dissect this response on the gene regulatory level. The mRNA expression profile of a regulator, however, is not necessarily a direct reflection of its activity. Knowing the transcription factor activity (TFA) can be exploited to predict novel target genes regulated by the same transcription factor. Various approaches have been proposed to reconstruct TFAs from gene expression data. Most of them capture only a first-order approximation to the complex transcriptional processes by assuming linear gene responses and linear dynamics in TFA, or ignore the temporal information in data from such systems. Results: In this paper, we approach the problem of inferring dynamic hidden TFAs using Gaussian processes (GP). We are able to model dynamic TFAs and to account for both linear and nonlinear gene responses. To test the validity of the proposed approach, we reconstruct the hidden TFA of p53, a tumour suppressor activated by DNA damage, using published time course gene expression data. Our reconstructed TFA is closer to the experimentally determined profile of p53 concentration than that from the original study. We then apply the model to time course gene expression data obtained from chemostat cultures of M. tuberculosis under reduced oxygen availability. After estimation of the TFA of DosR based on a number of known target genes using the GP model, we predict novel DosR-regulated genes: the parameters of the model are interpreted as relevance parameters indicating an existing functional relationship between TFA and gene expression. We further improve the prediction by integrating promoter sequence information in a logistic regression model. Apart from the documented DosR-regulated genes, our prediction yields ten novel genes under direct control of DosR. Conclusions

  9. Shotgun metagenomic data streams: surfing without fear

    Energy Technology Data Exchange (ETDEWEB)

    Berendzen, Joel R [Los Alamos National Laboratory

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.

  10. Viral Metagenomics: MetaView Software

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences of known viral nucleic acid sequence data and enable a user to attempt to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in fasta format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we came up with for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.

  11. Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw

    Science.gov (United States)

    MacKelprang, R.; Waldrop, M.P.; Deangelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-01-01

    Permafrost contains an estimated 1672 Pg carbon (C), an amount roughly equivalent to the total currently contained within land plants and the atmosphere. This reservoir of C is vulnerable to decomposition as rising global temperatures cause the permafrost to thaw. During thaw, trapped organic matter may become more accessible for microbial degradation and result in greenhouse gas emissions. Despite recent advances in the use of molecular tools to study permafrost microbial communities, their response to thaw remains unclear. Here we use deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes, and relate these data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses reveal that during transition from a frozen to a thawed state there are rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5 °C, permafrost metagenomes converge to be more similar to each other than while they are frozen. We find that multiple genes involved in cycling of C and nitrogen shift rapidly during thaw. We also construct the first draft genome from a complex soil metagenome, which corresponds to a novel methanogen. Methane previously accumulated in permafrost is released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost. © 2011 Macmillan Publishers Limited. All rights reserved.

  12. Functional Metagenomics of a Biostimulated Petroleum-Contaminated Soil Reveals an Extraordinary Diversity of Extradiol Dioxygenases.

    Science.gov (United States)

    Terrón-González, Laura; Martín-Cabello, Guadalupe; Ferrer, Manuel; Santero, Eduardo

    2016-04-01

    A metagenomic library of a petroleum-contaminated soil was constructed in a fosmid vector that allowed heterologous expression of metagenomic DNA. The library, consisting of 6.5 Gb of metagenomic DNA, was screened for extradiol dioxygenase (Edo) activity using catechol and 2,3-dihydroxybiphenyl as the substrates. Fifty-eight independent clones encoding extradiol dioxygenase activity were identified. Forty-one different Edo-encoding genes were identified. The population of Edo genes was not dominated by a particular gene or by highly similar genes; rather, the genes had an even distribution and high diversity. Phylogenetic analyses revealed that most of the genes could not be ascribed to previously defined subfamilies of Edos. Rather, the Edo genes led to the definition of 10 new subfamilies of type I Edos. Phylogenetic analysis of type II enzymes defined 7 families, 2 of which harbored the type II Edos that were found in this work. Particularly striking was the diversity found in family I.3 Edos; 15 out of the 17 sequences assigned to this family belonged to 7 newly defined subfamilies. A strong bias was found that depended on the substrate used for the screening: catechol mainly led to the detection of Edos belonging to the I.2 family, while 2,3-dihydroxybiphenyl led to the detection of most other Edos. Members of the I.2 family showed a clear substrate preference for monocyclic substrates, while those from the I.3 family showed a broader substrate range and high activity toward 2,3-dihydroxybiphenyl. This metagenomic analysis has substantially increased our knowledge of the existing biodiversity of Edos. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  13. Predicting Genes Involved in Human Cancer Using Network Contextual Information

    Directory of Open Access Journals (Sweden)

    Rahmani Hossein

    2012-03-01

    Full Text Available Protein-Protein Interaction (PPI) networks have been widely used for the task of predicting proteins involved in cancer. Previous research has shown that functional information about the protein for which a prediction is made, proximity to specific other proteins in the PPI network, as well as local network structure are informative features in this respect. In this work, we introduce two new types of input features, reflecting additional information: (1) Functional Context: the functions of proteins interacting with the target protein (rather than the protein itself); and (2) Structural Context: the relative position of the target protein with respect to specific other proteins selected according to a novel ANOVA (analysis of variance) based measure. We also introduce a selection strategy to pinpoint the most informative features. Results show that the proposed feature types and feature selection strategy yield informative features. A standard machine learning method (Naive Bayes) that uses the features proposed here outperforms the current state-of-the-art methods by more than 5% with respect to F-measure. In addition, manual inspection confirms the biological relevance of the top-ranked features.

  14. Prediction and validation of gene-disease associations using methods inspired by social network analyses.

    Directory of Open Access Journals (Sweden)

    U Martin Singh-Blom

    Full Text Available Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated from its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall [corrected].
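
For readers unfamiliar with the Katz measure named in this record, the following is a minimal sketch of the general idea: the score between two nodes is a sum over walks of every length l, each weighted by beta^l, so that short connections count more than long ones. The function names, the damping value `beta`, the truncation at `max_len` steps, and the three-node toy graph are all illustrative assumptions; the abstract does not state how the authors parameterise or compute the measure.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta=0.1, max_len=4):
    """Truncated Katz scores: sum over l = 1..max_len of beta**l * A**l.

    adj is a square adjacency matrix; beta and max_len are assumed,
    illustrative settings rather than values taken from the paper.
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]   # holds A**l, starting at l = 1
    weight = beta                     # holds beta**l, starting at l = 1
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)    # advance to the next power of A
        weight *= beta
    return scores


# Hypothetical toy network: a path 0 - 1 - 2 (e.g. gene, gene, trait).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = katz_scores(path, beta=0.5, max_len=2)
```

In this toy graph nodes 0 and 2 share no edge, yet they receive a non-zero score (0.25) through the single walk of length two via node 1, which is the property that makes the measure usable for link prediction.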

  15. Prediction of DNA methylation in the promoter of gene suppressor tumor.

    Science.gov (United States)

    Saif, Imane; Kasmi, Yassine; Allali, Karam; Ennaji, Moulay Mustapha

    2018-04-20

    Methylation of cytosine is the most common epigenetic modification in DNA sequences. It is highly concentrated in the promoter regions of genes, leading to inactivation of tumor suppressors regardless of their initial function. In this work, we aim to identify the highly methylated regions, the cytosine-phosphate-guanine (CpG) islands located on the promoters and/or the first exons of genes known for their key roles in the cell cycle, hence the need to study gene-gene interactions. The Frommer and hidden Markov model algorithms are used as computational methods to identify CpG islands, with specificity and sensitivity up to 76% and 80%, respectively. The results show, on the one hand, that the genes studied are suspected of developing hypermethylation in the promoter region of the gene involved in the case of a cancer. We then showed that the relative richness in CG results from a high level of methylation. On the other hand, we observe that the gene-gene interaction exhibits co-expression between the chosen genes. This led us to conclude that the hidden Markov model algorithm predicts more specific and valuable information about hypermethylation in genes, as a preventive and diagnostic tool for personalized medicine, and that the tumor suppressor genes have relative co-expression and complementary relations that hypermethylation affects in the samples studied in our work. Copyright © 2018. Published by Elsevier B.V.
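
As an illustration of the Frommer-style criteria mentioned in this record, the sketch below tests a single sequence window against the commonly cited Gardiner-Garden and Frommer thresholds: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. The function names and the synthetic test sequences are assumptions made for this example, and the abstract does not say which thresholds or window scheme the authors actually used.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected ratio: (#CpG * N) / (#C * #G); 0.0 if undefined.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three threshold tests to one window (thresholds assumed)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Synthetic windows, for illustration only.
cg_rich = "CG" * 100   # 200 bp, all C/G
at_rich = "AT" * 100   # 200 bp, no C/G
too_short = "CG" * 50  # 100 bp, below the length threshold
```

A full scan would slide such a window along a promoter sequence and merge adjacent qualifying windows; that step is omitted here to keep the sketch short.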

  16. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

    Science.gov (United States)

    Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

    2018-02-02

    Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

  17. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower

    Directory of Open Access Journals (Sweden)

    Patrick Thorwarth

    2018-02-01

    Full Text Available Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.

  18. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters.

    Directory of Open Access Journals (Sweden)

    Fengfeng Zhou

    Full Text Available BACKGROUND: Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. RESULTS: We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. CONCLUSION: We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.

  19. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters.

    Science.gov (United States)

    Zhou, Fengfeng; Ma, Qin; Li, Guojun; Xu, Ying

    2012-01-01

    Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.

  20. Adipose gene expression prior to weight loss can differentiate and weakly predict dietary responders.

    Directory of Open Access Journals (Sweden)

    David M Mutch

    Full Text Available BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression co