WorldWideScience

Sample records for genome annotation reveals

  1. Annotating individual human genomes.

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

    2011-10-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. ANNOTATING INDIVIDUAL HUMAN GENOMES*

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

    2014-01-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162

  3. Diurnal Cycling Transcription Factors of Pineapple Revealed by Genome-Wide Annotation and Global Transcriptomic Analysis.

    Science.gov (United States)

    Sharma, Anupma; Wai, Ching Man; Ming, Ray; Yu, Qingyi

    2017-09-01

    Circadian clock provides fitness advantage by coordinating internal metabolic and physiological processes to external cyclic environments. Core clock components exhibit daily rhythmic changes in gene expression, and the majority of them are transcription factors (TFs) and transcription coregulators (TCs). We annotated 1,398 TFs from 67 TF families and 80 TCs from 20 TC families in pineapple, and analyzed their tissue-specific and diurnal expression patterns. Approximately 42% of TFs and 45% of TCs displayed diel rhythmic expression, including 177 TF/TCs cycling only in the nonphotosynthetic leaf tissue, 247 cycling only in the photosynthetic leaf tissue, and 201 cycling in both. We identified 68 TF/TCs whose cycling expression was tightly coupled between the photosynthetic and nonphotosynthetic leaf tissues. These TF/TCs likely coordinate key biological processes in pineapple as we demonstrated that this group is enriched in homologous genes that form the core circadian clock in Arabidopsis and includes a STOP1 homolog. Two lines of evidence support the important role of the STOP1 homolog in regulating CAM photosynthesis in pineapple. First, STOP1 responds to acidic pH and regulates a malate channel in multiple plant species. Second, the cycling expression pattern of the pineapple STOP1 and the diurnal pattern of malate accumulation in pineapple leaf are correlated. We further examined duplicate-gene retention and loss in major known circadian genes and refined their evolutionary relationships between pineapple and other plants. Significant variations in duplicate-gene retention and loss were observed for most clock genes in both monocots and dicots. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. Carbohydrate catabolic flexibility in the mammalian intestinal commensal Lactobacillus ruminis revealed by fermentation studies aligned to genome annotations

    LENUS (Irish Health Repository)

    2011-08-30

    Abstract Background Lactobacillus ruminis is a poorly characterized member of the Lactobacillus salivarius clade that is part of the intestinal microbiota of pigs, humans and other mammals. Its variable abundance in human and animals may be linked to historical changes over time and geographical differences in dietary intake of complex carbohydrates. Results In this study, we investigated the ability of nine L. ruminis strains of human and bovine origin to utilize fifty carbohydrates including simple sugars, oligosaccharides, and prebiotic polysaccharides. The growth patterns were compared with metabolic pathways predicted by annotation of a high quality draft genome sequence of ATCC 25644 (human isolate) and the complete genome of ATCC 27782 (bovine isolate). All of the strains tested utilized prebiotics including fructooligosaccharides (FOS), soybean-oligosaccharides (SOS) and 1,3:1,4-β-D-gluco-oligosaccharides to varying degrees. Six strains isolated from humans utilized FOS-enriched inulin, as well as FOS. In contrast, three strains isolated from cows grew poorly in FOS-supplemented medium. In general, carbohydrate utilisation patterns were strain-dependent and also varied depending on the degree of polymerisation or complexity of structure. Six putative operons were identified in the genome of the human isolate ATCC 25644 for the transport and utilisation of the prebiotics FOS, galacto-oligosaccharides (GOS), SOS, and 1,3:1,4-β-D-Gluco-oligosaccharides. One of these comprised a novel FOS utilisation operon with predicted capacity to degrade chicory-derived FOS. However, only three of these operons were identified in the ATCC 27782 genome that might account for the utilisation of only SOS and 1,3:1,4-β-D-Gluco-oligosaccharides. Conclusions This study has provided definitive genome-based evidence to support the fermentation patterns of nine strains of Lactobacillus ruminis, and has linked it to gene distribution patterns in strains from different sources

  5. Pipeline to upgrade the genome annotations

    Directory of Open Access Journals (Sweden)

    Lijin K. Gopi

    2017-12-01

    Full Text Available Current era of functional genomics is enriched with good quality draft genomes and annotations for many thousands of species and varieties with the support of the advancements in the next generation sequencing technologies (NGS. Around 25,250 genomes, of the organisms from various kingdoms, are submitted in the NCBI genome resource till date. Each of these genomes was annotated using various tools and knowledge-bases that were available during the period of the annotation. It is obvious that these annotations will be improved if the same genome is annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases that are capable of producing better quality annotations from the consensus of the predictions from different tools. This resource also perform various additional annotations, apart from the usual gene predictions and functional annotations, which involve SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides etc. This new annotation resource is trained to evaluate and integrate all the predictions together to resolve the overlaps and ambiguities of the boundaries. One of the important highlights of this resource is the capability of predicting the phylogenetic relations of the repeats using the evolutionary trace analysis and orthologous gene clusters. We also present a case study, of the pipeline, in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus. It is demonstrated that this resource is capable of producing an improved annotation for a better understanding of the biology of various organisms.

  6. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  7. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    Full Text Available We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

  8. Software for computing and annotating genomic ranges.

    Science.gov (United States)

    Lawrence, Michael; Huber, Wolfgang; Pagès, Hervé; Aboyoun, Patrick; Carlson, Marc; Gentleman, Robert; Morgan, Martin T; Carey, Vincent J

    2013-01-01

    We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

  9. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation.

    Science.gov (United States)

    Rund, Samuel S C; Yoo, Boyoung; Alam, Camille; Green, Taryn; Stephens, Melissa T; Zeng, Erliang; George, Gary F; Sheppard, Aaron D; Duffield, Giles E; Milenković, Tijana; Pfrender, Michael E

    2016-08-18

    Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel

  10. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.

  11. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  12. Contributions to In Silico Genome Annotation

    KAUST Repository

    Kalkatawi, Manal M.

    2017-11-30

    Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and timeconsuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameters calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally

  13. Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus and the Scaled Quail (Callipepla squamata Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size

    Directory of Open Access Journals (Sweden)

    David L. Oldeschulte

    2017-09-01

    Full Text Available Northern bobwhite (Colinus virginianus; hereafter bobwhite and scaled quail (Callipepla squamata populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0 and second- (v2.0 generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb, which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%, genome-wide repetitive content (10.40%; 10.43%, and MAKER-predicted protein coding genes (17,131; 17,165 were similar for the scaled quail (v1.0 and bobwhite (v2.0 assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8% and the bobwhite (v2.0; 82.5%, as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0, and 711 in the bobwhite genome (v2.0, including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0 and bobwhite (v2.0 genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15–20 KYA.

  14. Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size.

    Science.gov (United States)

    Oldeschulte, David L; Halley, Yvette A; Wilson, Miranda L; Bhattarai, Eric K; Brashear, Wesley; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Rollins, Dale; Peterson, Markus J; Bickhart, Derek M; Decker, Jared E; Sewell, John F; Seabury, Christopher M

    2017-09-07

    Northern bobwhite ( Colinus virginianus ; hereafter bobwhite) and scaled quail ( Callipepla squamata ) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA. Copyright © 2017 Oldeschulte et al.

  15. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883

  16. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  17. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Full Text Available Abstract Background Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance for physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. Results In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion Our data indicates that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  18. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

    Science.gov (United States)

    Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  19. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

    Directory of Open Access Journals (Sweden)

    Qiandong Zeng

    2010-10-01

    Full Text Available Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  20. Annotating functional RNAs in genomes using Infernal.

    Science.gov (United States)

    Nawrocki, Eric P

    2014-01-01

    Many different types of functional non-coding RNAs participate in a wide range of important cellular functions but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools utilize covariance models (CMs), statistical models of the conserved sequence, and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.

  1. MicroScope: a platform for microbial genome annotation and comparative genomics.

    Science.gov (United States)

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of

  2. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas

    2007-01-01

    Motivation: Viral genomes tend to code in overlapping reading frames to maximize information content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra......- and intergenomic regions. The presence of multiple coding regions complicates the concept of Ka/Ks ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping...... may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We...

  3. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .

  4. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  5. Annotating non-coding regions of the genome.

    Science.gov (United States)

    Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B

    2010-08-01

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.

  6. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs

    DEFF Research Database (Denmark)

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False...... Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate...... in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate...

  7. First generation annotations for the fathead minnow (Pimephales promelas) genome

    Science.gov (United States)

    Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...

  8. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  9. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832

  10. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  11. Evaluation of three automated genome annotations for Halorhabdus utahensis.

    Directory of Open Access Journals (Sweden)

    Peter Bakke

    2009-07-01

    Full Text Available Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.

  12. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620

  13. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  14. Annotation of the Clostridium Acetobutylicum Genome

    Energy Technology Data Exchange (ETDEWEB)

    Daly, M. J.

    2004-06-09

    The genome sequence of the solvent producing bacterium Clostridium acetobutylicum ATCC824, has been determined by the shotgun approach. The genome consists of a 3.94 Mb chromosome and a 192 kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases, closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria.

  15. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  16. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  17. Ten steps to get started in Genome Assembly and Annotation

    Science.gov (United States)

    Dominguez Del Angel, Victoria; Hjerde, Erik; Sterck, Lieven; Capella-Gutierrez, Salvadors; Notredame, Cederic; Vinnere Pettersson, Olga; Amselem, Joelle; Bouri, Laurent; Bocs, Stephanie; Klopp, Christophe; Gibrat, Jean-Francois; Vlasova, Anna; Leskosek, Brane L.; Soler, Lucile; Binzer-Panchal, Mahesh; Lantz, Henrik

    2018-01-01

    As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR). PMID:29568489

  18. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...

  19. Roadmap for annotating transposable elements in eukaryote genomes.

    Science.gov (United States)

    Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

    2012-01-01

    Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.

  20. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  1. Genome Annotation and Transcriptomics of Oil-Producing Algae

    Science.gov (United States)

    2015-03-16

    AFRL-OSR-VA-TR-2015-0103 GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE Sabeeha Merchant UNIVERSITY OF CALIFORNIA LOS ANGELES Final...2010 To 12-31-2014 4. TITLE AND SUBTITLE GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE 5a. CONTRACT NUMBER FA9550-10-1-0095 5b...NOTES 14. ABSTRACT Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some

  2. Annotation of the Domestic Pig Genome by Quantitative Proteogenomics.

    Science.gov (United States)

    Marx, Harald; Hahne, Hannes; Ulbrich, Susanne E; Schnieke, Angelika; Rottmann, Oswald; Frishman, Dmitrij; Kuster, Bernhard

    2017-08-04

    The pig is one of the earliest domesticated animals in the history of human civilization and represents one of the most important livestock animals. The recent sequencing of the Sus scrofa genome was a major step toward the comprehensive understanding of porcine biology, evolution, and its utility as a promising large animal model for biomedical and xenotransplantation research. However, the functional and structural annotation of the Sus scrofa genome is far from complete. Here, we present mass spectrometry-based quantitative proteomics data of nine juvenile organs and six embryonic stages between 18 and 39 days after gestation. We found that the data provide evidence for and improve the annotation of 8176 protein-coding genes including 588 novel and 321 refined gene models. The analysis of tissue-specific proteins and the temporal expression profiles of embryonic proteins provides an initial functional characterization of expressed protein interaction networks and modules including as yet uncharacterized proteins. Comparative transcript and protein expression analysis to human organs reveal a moderate conservation of protein translation across species. We anticipate that this resource will facilitate basic and applied research on Sus scrofa as well as its porcine relatives.

  3. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

    Energy Technology Data Exchange (ETDEWEB)

    Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  4. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

    Science.gov (United States)

    Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  5. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  6. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1, and we found a substantially higher number of TEs (n = 6,013 than previously identified (n = 1,572. Most of the new TEs derive from small fragments of a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1. We also estimated that 518 TE copies (8.6% are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other

  7. All SNPs are not created equal: Genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs

    NARCIS (Netherlands)

    Schork, A.J.; Thompson, W.K.; Pham, P.; Torkamani, A.; Roddey, J.C.; Sullivan, P.F.; Kelsoe, J.; O'Donovan, M.C.; Furberg, H.; Absher, D.; Agudo, A.; Almgren, P.; Ardissino, D.; Assimes, T.L.; Bandinelli, S.; Barzan, L.; Bencko, V.; Benhamou, S.; Benjamin, E.J.; Bernardinelli, L.; Bis, J.; Boehnke, M.; Boerwinkle, E.; Boomsma, D.I.; Brennan, P.; Canova, C.; Castellsagué, X.; Chanock, S.; Chasman, D.I.; Conway, D.I.; Dackor, J.; de Geus, E.J.C.; Duan, J.; Elosua, R.; Everett, B.; Fabianova, E.; Ferrucci, L.; Foretova, L.; Fortmann, S.P.; Franceschini, N.; Frayling, T.M.; Furberg, C.; Gejman, P.V.; Groop, L.; Gu, F.; Guralnik, J.; Hankinson, S.E.; Haritunians, T.; Healy, C.; Hofman, A.; Holcátová, I.; Hunter, D.J.; Hwang, S.J.; Ioannidis, J.P.A.; Iribarren, C.; Jackson, A.U.; Janout, V.; Kaprio, J.; Kim, Y.; Kjaerheim, K.; Knowles, J.W.; Kraft, P.; Ladenvall, C.; Lagiou, P.; Lanthrop, M.; Lerman, C.; Levinson, D.F.; Levy, D.; Li, M.D.; Lin, D.Y.; Lips, E.H.; Lissowska, J.; Lowry, R.B.; Lucas, G.; Macfarlane, T.V.; Maes, H.H.M.; Mannucci, P.M.; Mates, D.; Mauri, F.; McGovern, J.A.; McKay, J.D.; McKnight, B.; Melander, O.; Merlini, P.A.; Milaneschi, Y.; Mohlke, K.L.; O'Donnell, C.J.; Pare, G.; Penninx, B.W.J.H.; Perry, J.R.B.; Posthuma, D.; Preis, S.R.; Psaty, B.; Quertermous, T.; Ramachandran, V.S.; Richiardi, L.; Ridker, P.M.; Rose, J.; Rudnai, P.; Salomaa, V.; Sanders, A.R.; Schwartz, S.M.; Shi, J.; Smit, J.H.; Stringham, H.M.; Szeszenia-Dabrowska, N.; Tanaka, T.; Taylor, K.; Thacker, E.E.; Thornton, L.; Tiemeier, H.; Tuomilehto, J.; Uitterlinden, A.G.; van Duijn, C.M.; Vink, J.M.; Vogelzangs, N.; Voight, B.F.; Walter, S.; Willemsen, G.; Zaridze, D.; Znaor, A.; Akil, H.; Anjorin, A.; Backlund, L.; Badner, J.A.; Barchas, J.D.; Barrett, T.; Bass, N.; Bauer, M.; Bellivier, F.; Bergen, S.E.; Berrettini, W.; Blackwood, D.; Bloss, C.S.; Breen, G.; Breuer, R.; Bunner, W.E.; Burmeister, M.; Byerley, W. F.; Caesar, S.; Chambert, K.; Cichon, S.; St Clair, D.; Collier, D.A.; Corvin, A.; Coryell, W.H.; Craddock, N.; Craig, D.W.; Daly, M.; Day, R.; Degenhardt, F.; Djurovic, S.; Dudbridge, F.; Edenberg, H.J.; Elkin, A.; Etain, B.; Farmer, A.E.; Ferreira, M.A.; Ferrier, I.; Flickinger, M.; Foroud, T.; Frank, J.; Fraser, C.; Frisén, L.; Gershon, E.S.; Gill, M.; Gordon-Smith, K.; Green, E.K.; Greenwood, T.A.; Grozeva, D.; Guan, W.; Gurling, H.; Gustafsson, O.; Hamshere, M.L.; Hautzinger, M.; Herms, S.; Hipolito, M.; Holmans, P.A.; Hultman, C. M.; Jamain, S.; Jones, E.G.; Jones, I.; Jones, L.; Kandaswamy, R.; Kennedy, J.L.; Kirov, G. K.; Koller, D.L.; Kwan, P.; Landén, M.; Langstrom, N.; Lathrop, M.; Lawrence, J.; Lawson, W.B.; Leboyer, M.; Lee, P.H.; Li, J.; Lichtenstein, P.; Lin, D.; Liu, C.; Lohoff, F.W.; Lucae, S.; Mahon, P.B.; Maier, W.; Martin, N.G.; Mattheisen, M.; Matthews, K.; Mattingsdal, M.; McGhee, K.A.; McGuffin, P.; McInnis, M.G.; McIntosh, A.; McKinney, R.; McLean, A.W.; McMahon, F.J.; McQuillin, A.; Meier, S.; Melle, I.; Meng, F.; Mitchell, P.B.; Montgomery, G.W.; Moran, J.; Morken, G.; Morris, D.W.; Moskvina, V.; Muglia, P.; Mühleisen, T.W.; Muir, W.J.; Müller-Myhsok, B.; Myers, R.M.; Nievergelt, C.M.; Nikolov, I.; Nimgaonkar, V.L.; Nöthen, M.M.; Nurnberger, J.I.; Nwulia, E.A.; O'Dushlaine, C.; Osby, U.; Óskarsson, H.; Owen, M.J.; Petursson, H.; Pickard, B.S.; Porgeirsson, P.; Potash, J.B.; Propping, P.; Purcell, S.M.; Quinn, E.; Raychaudhuri, S.; Rice, J.; Rietschel, M.; Ruderfer, D.; Schalling, M.; Schatzberg, A.F.; Scheftner, W.A.; Schofield, P.R.; Schulze, T.G.; Schumacher, J.; Schwarz, M.M.; Scolnick, E.; Scott, L.J.; Shilling, P.D.; Sigurdsson, E.; Sklar, P.; Smith, E.N.; Stefansson, H.; Stefansson, K.; Steffens, M; Steinberg, S.; Strauss, J.; Strohmaier, J.; Szelinger, S.; Thompson, R.C.; Tozzi, F.; Treutlein, J.; Vincent, J.B.; Watson, S.J.; Wienker, T.F.; Williamson, R.; Witt, S.H.; Wright, A.; Xu, W.; Young, A.H.; Zandi, P.P.; Zhang, P.; Zöllner, S.; Agartz, I.; Albus, M.; Alexander, M.; Amdur, R. L.; Amin, F.; Bitter, I.; Black, D.W.; Børglum, A.D.; Brown, M.A.; Bruggeman, R.; Buccola, N.G.; Cahn, W.; Cantor, R.M.; Carr, V.J.; Catts, S. V.; Choudhury, K.; Cloninger, C. R.; Cormican, P.; Danoy, P. A.; Datta, S.; DeHert, M.; Demontis, D.; Dikeos, D.; Donnelly, P.; Donohoe, G.; Duong, L.; Dwyer, S.; Fanous, A.; Fink-Jensen, A.; Freedman, R.; Freimer, N.B.; Friedl, M.; Georgieva, L.; Giegling, I.; Glenthoj, B.; Godard, S.; Golimbet, V.; de Haan, L.; Hansen, M.; Hansen, T.; Hartmann, A.M.; Henskens, F. A.; Hougaard, D. M.; Ingason, A.; Jablensky, A. V.; Jakobsen, K.D.; Jay, M.; Jönsson, E.G.; Jürgens, G.; Kahn, R.S.; Keller, M.C.; Kendler, K.S.; Kenis, G.; Kenny, E.; Konnerth, H.; Konte, B.; Krabbendam, L.; Krasucki, R.; Lasseter, V. K.; Laurent, C.; Lencz, T.; Lerer, F. B.; Liang, K. Y.; Lieberman, J. A.; Linszen, D.H.; Lönnqvist, J.; Loughland, C. M.; Maclean, A. W.; Maher, B.S.; Malhotra, A.K.; Mallet, J.; Malloy, P.; McGrath, J. J.; McLean, D. E.; Michie, P. T.; Milanova, V.; Mors, O.; Mortensen, P.B.; Mowry, B. J.; Myin-Germeys, I.; Neale, B.; Nertney, D. A.; Nestadt, G.; Nielsen, J.; Nordentoft, M.; Norton, N.; O'Neill, F.; Olincy, A.; Olsen, L.; Ophoff, R.A.; Orntoft, T. F.; van Os, J.; Pantelis, C.; Papadimitriou, G.; Pato, C.N.; Peltonen, L.; Pickard, B.; Pietilainen, O.P.; Pimm, J.; Pulver, A. E.; Puri, V.; Quested, D.; Rasmussen, H.B.; Rethelyi, J.M.; Ribble, R.; Riley, B.P.; Rossin, L.; Ruggeri, M.; Rujescu, D.; Schall, U.; Schwab, S. G.; Scott, R.J.; Silverman, J.M.; Spencer, C. C.; Strange, A.; Strengman, E.; Stroup, T.S.; Suvisaari, J.; Terenius, L.; Thirumalai, S.; Timm, S.; Toncheva, D.; Tosato, S.; van den Oord, E.J.; Veldink, J.; Visscher, P.M.; Walsh, D.; Wang, A. G.; Werge, T.; Wiersma, D.; Wildenauer, D. B.; Williams, H.J.; Williams, N.M.; van Winkel, R.; Wormley, B.; Zammit, S.; Schork, N.J.; Andreassen, O.A.; Dale, A.M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery

  8. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    NARCIS (Netherlands)

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.; Absher, Devin; Agudo, Antonio; Almgren, Peter; Ardissino, Diego; Assimes, Themistocles L.; Bandinelli, Stephania; Barzan, Luigi; Bencko, Vladimir; Benhamou, Simone; Benjamin, Emelia J.; Bernardinelli, Luisa; Bis, Joshua; Boehnke, Michael; Boerwinkle, Eric; Boomsma, Dorret I.; Brennan, Paul; Canova, Cristina; Castellsagué, Xavier; Chanock, Stephen; Chasman, Daniel; Conway, David I.; Dackor, Jennifer; de Geus, Eco J. C.; Duan, Jubao; Elosua, Roberto; Everett, Brendan; Fabianova, Eleonora; Ferrucci, Luigi; Foretova, Lenka; Fortmann, Stephen P.; Franceschini, Nora; Frayling, Timothy; Furberg, Curt; Gejman, Pablo V.; Groop, Leif; Gu, Fangyi; de Haan, Lieuwe; Linszen, Don H.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery

  9. Genome sequencing and annotation of Serratia sp. strain TEL.

    Science.gov (United States)

    Lephoto, Tiisetso E; Gray, Vincent M

    2015-12-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  10. Genome sequencing and annotation of Serratia sp. strain TEL

    Directory of Open Access Journals (Sweden)

    Tiisetso E. Lephoto

    2015-12-01

    Full Text Available We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410. This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926 collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  11. Genome sequencing and annotation of Serratia sp. strain TEL

    OpenAIRE

    Lephoto, Tiisetso E.; Gray, Vincent M.

    2015-01-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  12. MIPS: analysis and annotation of genome information in 2007.

    Science.gov (United States)

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  13. A framework for annotating human genome in disease context.

    Science.gov (United States)

    Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M

    2012-01-01

    Identification of gene-disease association is crucial to understanding disease mechanism. A rapid increase in biomedical literatures, led by advances of genome-scale technologies, poses challenge for manually-curated-based annotation databases to characterize gene-disease associations effectively and timely. We propose an automatic method-The Disease Ontology Annotation Framework (DOAF) to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidences.

  14. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  15. Web Apollo: a web-based genomic annotation editing platform.

    Science.gov (United States)

    Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E

    2013-08-30

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

  16. Genome Annotation in a Community College Cell Biology Lab

    Science.gov (United States)

    Beagley, C. Timothy

    2013-01-01

    The Biology Department at Salt Lake Community College has used the IMG-ACT toolbox to introduce a genome mapping and annotation exercise into the laboratory portion of its Cell Biology course. This project provides students with an authentic inquiry-based learning experience while introducing them to computational biology and contemporary learning…

  17. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...

  18. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    Science.gov (United States)

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    Science.gov (United States)

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for ongoing sequencing genomes. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information include possible GIs, coding/non-coding sequences and functional analysis from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and microrrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...

  1. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  2. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders. While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C

  3. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~;;232831 genes and ~;;15011 multigene families, All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  4. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

    Science.gov (United States)

    Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

    1997-01-01

    SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROTs. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.

  5. Annotating the genome by DNA methylation.

    Science.gov (United States)

    Cedar, Howard; Razin, Aharon

    2017-01-01

    DNA methylation plays a prominent role in setting up and stabilizing the molecular design of gene regulation and by understanding this process one gains profound insight into the underlying biology of mammals. In this article, we trace the discoveries that provided the foundations of this field, starting with the mapping of methyl groups in the genome and the experiments that helped clarify how methylation patterns are maintained through cell division. We then address the basic relationship between methyl groups and gene repression, as well as the molecular rules involved in controlling this process during development in vivo. Finally, we describe ongoing work aimed at defining the role of this modification in disease and deciphering how it may serve as a mechanism for sensing the environment.

  6. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  7. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.

    Science.gov (United States)

    Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B

    2015-01-01

    Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.

  8. Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACONâ s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .

  9. Characterizing and annotating the genome using RNA-seq data.

    Science.gov (United States)

    Chen, Geng; Shi, Tieliu; Shi, Leming

    2017-02-01

    Bioinformatics methods for various RNA-seq data analyses are in fast evolution with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to obtain accurate and comprehensive results. Here we reviewed the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially those long noncoding RNAs) still can be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome- guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of corresponding analyses.

  10. Experimental annotation of the human genome using microarray technology.

    Science.gov (United States)

    Shoemaker, D D; Schadt, E E; Armour, C D; He, Y D; Garrett-Engele, P; McDonagh, P D; Loerch, P M; Leonardson, A; Lum, P Y; Cavet, G; Wu, L F; Altschuler, S J; Edwards, S; King, J; Tsang, J S; Schimmack, G; Schelter, J M; Koch, J; Ziman, M; Marton, M J; Li, B; Cundiff, P; Ward, T; Castle, J; Krolewski, M; Meyer, M R; Mao, M; Burchard, J; Kidd, M J; Dai, H; Phillips, J W; Linsley, P S; Stoughton, R; Scherer, S; Boguski, M S

    2001-02-15

    The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using 'exon' and 'tiling' arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.

  11. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    Science.gov (United States)

    Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

    2012-06-07

    Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generating annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.

  12. Protein annotation in the era of personal genomics

    DEFF Research Database (Denmark)

    Holberg Blicher, Thomas; Gupta, Ramneek; Wesolowska, Agata

    2010-01-01

    the differences between many individuals of the same species-humans in particular-the focus needs be on the functional impact of individual residue variation. To fulfil the promises of personal genomics, we need to start asking not only what is in a genome but also how millions of small differences between......Protein annotation provides a condensed and systematic view on the function of individual proteins. It has traditionally dealt with sorting proteins into functional categories, which for example has proven to be successful for the comparison of different species. However, if we are to understand...... individual genomes affect protein function and in turn human health. Copyright © 2010 Elsevier Ltd. All rights reserved....

  13. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Nakamura, Yasukazu

    2018-03-15

    We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min, with rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future. The software is implemented in Python 3 and runs in both Python 2.7 and 3.4-on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/. yn@nig.ac.jp. Supplementary data are available at Bioinformatics online.

  14. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project : open letter

    NARCIS (Netherlands)

    Archibald, A.L.; Bottema, C.D.; Brauning, R.; Burgess, S.C.; Burt, D.W.; Casas, E.; Cheng, H.H.; Clarke, L.; Couldrey, C.; Dalrymple, B.P.; Elsik, C.G.; Foissac, S.; Giuffra, E.; Groenen, M.A.M.; Hayes, B.J.; Huang, L.S.; Khatib, H.; Kijas, J.W.; Kim, H.; Lunney, J.K.; McCarthy, F.M.; McEwan, J.; Moore, S.; Nanduri, B.; Notredame, C.; Palti, Y.; Plastow, G.S.; Reecy, J.M.; Rohrer, G.; Sarropoulou, E.; Schmidt, C.J.; Silverstein, J.; Tellam, R.L.; Tixier-Boichard, M.; Tosser-klopp, G.; Tuggle, C.K.; Vilkki, J.; White, S.N.; Zhao, S.; Zhou, H.

    2015-01-01

    We describe the organization of a nascent international effort, the Functional Annotation of Animal Genomes (FAANG) project, whose aim is to produce comprehensive maps of functional elements in the genomes of domesticated animal species.

  15. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    Science.gov (United States)

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  16. Identifying and annotating human bifunctional RNAs reveals their versatile functions.

    Science.gov (United States)

    Chen, Geng; Yang, Juan; Chen, Jiwei; Song, Yunjie; Cao, Ruifang; Shi, Tieliu; Shi, Leming

    2016-10-01

    Bifunctional RNAs that possess both protein-coding and noncoding functional properties were less explored and poorly understood. Here we systematically explored the characteristics and functions of such human bifunctional RNAs by integrating tandem mass spectrometry and RNA-seq data. We first constructed a pipeline to identify and annotate bifunctional RNAs, leading to the characterization of 132 high-confidence bifunctional RNAs. Our analyses indicate that bifunctional RNAs may be involved in human embryonic development and can be functional in diverse tissues. Moreover, bifunctional RNAs could interact with multiple miRNAs and RNA-binding proteins to exert their corresponding roles. Bifunctional RNAs may also function as competing endogenous RNAs to regulate the expression of many genes by competing for common targeting miRNAs. Finally, somatic mutations of diverse carcinomas may generate harmful effect on corresponding bifunctional RNAs. Collectively, our study not only provides the pipeline for identifying and annotating bifunctional RNAs but also reveals their important gene-regulatory functions.

  17. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will

    2009-01-01

    in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology...

  18. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    OpenAIRE

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, J?rg

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.

  19. Genome annotation in a community college cell biology lab.

    Science.gov (United States)

    Beagley, C Timothy

    2013-01-01

    The Biology Department at Salt Lake Community College has used the IMG-ACT toolbox to introduce a genome mapping and annotation exercise into the laboratory portion of its Cell Biology course. This project provides students with an authentic inquiry-based learning experience while introducing them to computational biology and contemporary learning skills. Additionally, the project strengthens student understanding of the scientific method and contributes to student learning gains in curricular objectives centered around basic molecular biology, specifically, the Central Dogma. Importantly, inclusion of this project in the laboratory course provides students with a positive learning environment and allows for the use of cooperative learning strategies to increase overall student success. Copyright © 2012 International Union of Biochemistry and Molecular Biology, Inc.

  20. Expressed Peptide Tags: An additional layer of data for genome annotation

    Energy Technology Data Exchange (ETDEWEB)

    Savidor, Alon [ORNL; Donahoo, Ryan S [ORNL; Hurtado-Gonzales, Oscar [University of Tennessee, Knoxville (UTK); Verberkmoes, Nathan C [ORNL; Shah, Manesh B [ORNL; Lamour, Kurt H [ORNL; McDonald, W Hayes [ORNL

    2006-01-01

    While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller sub-databases to be searched against, and definition of EPT criteria that accommodates the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ~77% of Phytophthora EPTs supported the current annotation, a portion of them (7.2% and 12.6% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.

  1. New genes expressed in human brains: implications for annotating evolving genomes.

    Science.gov (United States)

    Zhang, Yong E; Landback, Patrick; Vibranovski, Maria; Long, Manyuan

    2012-11-01

    New genes have frequently formed and spread to fixation in a wide variety of organisms, constituting abundant sets of lineage-specific genes. It was recently reported that an excess of primate-specific and human-specific genes were upregulated in the brains of fetuses and infants, and especially in the prefrontal cortex, which is involved in cognition. These findings reveal the prevalent addition of new genetic components to the transcriptome of the human brain. More generally, these findings suggest that genomes are continually evolving in both sequence and content, eroding the conservation endowed by common ancestry. Despite increasing recognition of the importance of new genes, we highlight here that these genes are still seriously under-characterized in functional studies and that new gene annotation is inconsistent in current practice. We propose an integrative approach to annotate new genes, taking advantage of functional and evolutionary genomic methods. We finally discuss how the refinement of new gene annotation will be important for the detection of evolutionary forces governing new gene origination. Copyright © 2012 WILEY Periodicals, Inc.

  2. Ten steps to get started in Genome Assembly and Annotation [version 1; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Victoria Dominguez Del Angel

    2018-02-01

    Full Text Available As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR.

  3. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

    Science.gov (United States)

    Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji

    2007-01-01

    We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932

  4. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Full Text Available Abstract Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology, we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and

  5. Xylella fastidiosa comparative genomic database is an information resource to explore the annotation, genomic features, and biology of different strains

    Directory of Open Access Journals (Sweden)

    Alessandro M. Varani

    2012-01-01

    Full Text Available The Xylella fastidiosa comparative genomic database is a scientific resource with the aim to provide a user-friendly interface for accessing high-quality manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high quality genomic annotation, the functional and comparative genomic analysis and the identification and mapping of prophage-like elements. It is available from web site http://www.xylella.lncc.br.

  6. AGORA : Organellar genome annotation from the amino acid and nucleotide references.

    Science.gov (United States)

    Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

    2018-03-29

    Next-generation sequencing (NGS) technologies have led to the accumulation of highthroughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals.We have developed a web application AGORA for the fast, user-friendly, and improved annotations of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides information of start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/.The main module of the tool is implemented by the python and php, and the web page is built by the HTML and CSS to support all browsers. gangman@dongguk.edu.

  7. Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus.

    Science.gov (United States)

    Carroll, Ronan K; Weiss, Andy; Broach, William H; Wiemels, Richard E; Mogen, Austin B; Rice, Kelly C; Shaw, Lindsey N

    2016-02-09

    In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work

  8. Genome sequencing and annotation of Amycolatopsis azurea DSM 43854T

    Directory of Open Access Journals (Sweden)

    Indu Khatri

    2014-12-01

    Full Text Available We report the 9.2 Mb genome of the azureomycin A and B antibiotic producing strain Amycolatopsis azurea isolated from a Japanese soil sample. The draft genome of strain DSM 43854T consists of 9,223,451 bp with a G + C content of 69.0% and the genome contains 3 rRNA genes (5S–23S–16S and 58 aminoacyl-tRNA synthetase genes. The homology searches revealed that the PKS gene clusters are supposed to be responsible for the biosynthesis of naptomycin, macbecin, rifamycin, mitomycin, maduropeptin enediyne, neocarzinostatin enediyne, C-1027 enediyne, calicheamicin enediyne, landomycin, simocyclinone, medermycin, granaticin, polyketomycin, teicoplanin, balhimycin, vancomycin, staurosporine, rubradirin and complestatin.

  9. Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

    DEFF Research Database (Denmark)

    Wang, Yunpeng; Thompson, Wesley K; Schork, Andrew J

    2016-01-01

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotr...

  10. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  11. BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

    Science.gov (United States)

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient detecting unexpected genes horizontally acquired from bacterial or archaeal distant genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. BG7 current version – which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge amount of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations right the next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310

  12. BG7: a new approach for bacterial genome annotation designed for next generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Pablo Pareja-Tobes

    Full Text Available BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient detecting unexpected genes horizontally acquired from bacterial or archaeal distant genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. BG7 current version - which is developed in Java, takes advantage of Amazon Web Services (AWS cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge amount of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations right the next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future.

  13. Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

    Science.gov (United States)

    Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V

    2000-01-01

    predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.

  14. Genome Context Viewer: visual exploration of multiple annotated genomes using microsynteny.

    Science.gov (United States)

    Cleary, Alan; Farmer, Andrew

    2018-05-01

    The Genome Context Viewer is a visual data-mining tool that allows users to search across multiple providers of genome data for regions with similarly annotated content that may be aligned and visualized at the level of their shared functional elements. By handling ordered sequences of gene family memberships as a unit of search and comparison, the user interface enables quick and intuitive assessment of the degree of gene content divergence and the presence of various types of structural events within syntenic contexts. Insights into functionally significant differences seen at this level of abstraction can then serve to direct the user to more detailed explorations of the underlying data in other interconnected, provider-specific tools. GCV is provided under the GNU General Public License version 3 (GPL-3.0). Source code is available at https://github.com/legumeinfo/lis_context_viewer. adf@ncgr.org. Supplementary data are available at Bioinformatics online.

  15. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidences indicated that function annotation of human genome in molecular level and phenotype level is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e - 16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e - 14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  16. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    Science.gov (United States)

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  17. Using Microbial Genome Annotation as a Foundation for Collaborative Student Research

    Science.gov (United States)

    Reed, Kelynne E.; Richardson, John M.

    2013-01-01

    We used the Integrated Microbial Genomes Annotation Collaboration Toolkit as a framework to incorporate microbial genomics research into a microbiology and biochemistry course in a way that promoted student learning of bioinformatics and research skills and emphasized teamwork and collaboration as evidenced through multiple assessment mechanisms.…

  18. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    DEFF Research Database (Denmark)

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim

    2008-01-01

    Background: Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number...... to a genome scale metabolic model of A. oryzae. Results: Our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Noteworthy, our annotation strategy resulted...... model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources. Conclusion: A much enhanced annotation of the A. oryzae genome was performed and a genomescale metabolic model of A. oryzae was reconstructed. The model accurately predicted...

  19. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  1. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    OpenAIRE

    Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.

    2009-01-01

    Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were ac...

  2. DFAST and DAGA: web-based integrated genome annotation tools and resources.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

    2016-01-01

    Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus , obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii , whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.

  3. Differential metabolism of Mycoplasma species as revealed by their genomes

    Directory of Open Access Journals (Sweden)

    Fabricio B.M. Arraes

    2007-01-01

    Full Text Available The annotation and comparative analyses of the genomes of Mycoplasma synoviae and Mycoplasma hyopneumonie, as well as of other Mollicutes (a group of bacteria devoid of a rigid cell wall, has set the grounds for a global understanding of their metabolism and infection mechanisms. According to the annotation data, M. synoviae and M. hyopneumoniae are able to perform glycolytic metabolism, but do not possess the enzymatic machinery for citrate and glyoxylate cycles, gluconeogenesis and the pentose phosphate pathway. Both can synthesize ATP by lactic fermentation, but only M. synoviae can convert acetaldehyde to acetate. Also, our genome analysis revealed that M. synoviae and M. hyopneumoniae are not expected to synthesize polysaccharides, but they can take up a variety of carbohydrates via the phosphoenolpyruvate-dependent phosphotransferase system (PEP-PTS. Our data showed that these two organisms are unable to synthesize purine and pyrimidine de novo, since they only possess the sequences which encode salvage pathway enzymes. Comparative analyses of M. synoviae and M. hyopneumoniae with other Mollicutes have revealed differential genes in the former two genomes coding for enzymes that participate in carbohydrate, amino acid and nucleotide metabolism and host-pathogen interaction. The identification of these metabolic pathways will provide a better understanding of the biology and pathogenicity of these organisms.

  4. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Science.gov (United States)

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  5. MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.

    Science.gov (United States)

    Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed

    2017-01-20

    Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot

  6. GFFview: A Web Server for Parsing and Visualizing Annotation Information of Eukaryotic Genome.

    Science.gov (United States)

    Deng, Feilong; Chen, Shi-Yi; Wu, Zhou-Lin; Hu, Yongsong; Jia, Xianbo; Lai, Song-Jia

    2017-10-01

    Owing to wide application of RNA sequencing (RNA-seq) technology, more and more eukaryotic genomes have been extensively annotated, such as the gene structure, alternative splicing, and noncoding loci. Annotation information of genome is prevalently stored as plain text in General Feature Format (GFF), which could be hundreds or thousands Mb in size. Therefore, it is a challenge for manipulating GFF file for biologists who have no bioinformatic skill. In this study, we provide a web server (GFFview) for parsing the annotation information of eukaryotic genome and then generating statistical description of six indices for visualization. GFFview is very useful for investigating quality and difference of the de novo assembled transcriptome in RNA-seq studies.

  7. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  8. LocusTrack: Integrated visualization of GWAS results and genomic annotation.

    Science.gov (United States)

    Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

    2015-01-01

    Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.

  9. H2DB: a heritability database across multiple species by annotating trait-associated genomic loci.

    Science.gov (United States)

    Kaminuma, Eli; Fujisawa, Takatomo; Tanizawa, Yasuhiro; Sakamoto, Naoko; Kurata, Nori; Shimizu, Tokurou; Nakamura, Yasukazu

    2013-01-01

    H2DB (http://tga.nig.ac.jp/h2db/), an annotation database of genetic heritability estimates for humans and other species, has been developed as a knowledge database to connect trait-associated genomic loci. Heritability estimates have been investigated for individual species, particularly in human twin studies and plant/animal breeding studies. However, there appears to be no comprehensive heritability database for both humans and other species. Here, we introduce an annotation database for genetic heritabilities of various species that was annotated by manually curating online public resources in PUBMED abstracts and journal contents. The proposed heritability database contains attribute information for trait descriptions, experimental conditions, trait-associated genomic loci and broad- and narrow-sense heritability specifications. Annotated trait-associated genomic loci, for which most are single-nucleotide polymorphisms derived from genome-wide association studies, may be valuable resources for experimental scientists. In addition, we assigned phenotype ontologies to the annotated traits for the purposes of discussing heritability distributions based on phenotypic classifications.

  10. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

    Science.gov (United States)

    Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

    2016-04-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.

  11. FIGENIX: Intelligent automation of genomic annotation: expertise integration in a new software platform

    Directory of Open Access Journals (Sweden)

    Pontarotti Pierre

    2005-08-01

    Full Text Available Abstract Background Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes which consists of detecting genes' position and structure, and inferring their function (as well as of other features of genomes. Structural and functional annotation both require the complex chaining of numerous different software, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage huge amounts of data released by sequencing projects. Several pipelines already automate some of these complex chaining but still necessitate an important contribution of biologists for supervising and controlling the results at various steps. Results Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable to substitute to human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which allows for example to make key decisions, check intermediate results or refine the dataset. The quality of the results produced by FIGENIX is comparable to those obtained by expert biologists with a drastic gain in terms of time costs and avoidance of errors due to the human manipulation of data. Conclusion The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could be easily adapted to new, or more specialized pipelines, such as for example the annotation of miRNAs, the classification of complex multigenic families, annotation of regulatory elements and other genomic features of interest.

  12. Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation

    Directory of Open Access Journals (Sweden)

    Jeffrey Rosenfeld

    2016-03-01

    Full Text Available Twenty one fully sequenced and well annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of e-value cutoff in ortholog determination we used scaled e-value cutoffs and a single linkage clustering approach.. The present communication includes (1 a list of the genomes used to construct the genome content phylogenetic matrices, (2 a nexus file with the data matrices used in phylogenetic analysis, (3 a nexus file with the Newick trees generated by phylogenetic analysis, (4 an excel file listing the Core (CORE genes and Unique (UNI genes found in five insect groups, and (5 a figure showing a plot of consistency index (CI versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains and bar plots of gains and losses for four consistency index (CI cutoffs.

  13. Improved Genome Assembly and Annotation for the Rock Pigeon (Columba livia).

    Science.gov (United States)

    Holt, Carson; Campbell, Michael; Keays, David A; Edelman, Nathaniel; Kapusta, Aurélie; Maclary, Emily; T Domyan, Eric; Suh, Alexander; Warren, Wesley C; Yandell, Mark; Gilbert, M Thomas P; Shapiro, Michael D

    2018-05-04

    The domestic rock pigeon ( Columba livia ) is among the most widely distributed and phenotypically diverse avian species. C. livia is broadly studied in ecology, genetics, physiology, behavior, and evolutionary biology, and has recently emerged as a model for understanding the molecular basis of anatomical diversity, the magnetic sense, and other key aspects of avian biology. Here we report an update to the C. livia genome reference assembly and gene annotation dataset. Greatly increased scaffold lengths in the updated reference assembly, along with an updated annotation set, provide improved tools for evolutionary and functional genetic studies of the pigeon, and for comparative avian genomics in general. Copyright © 2018 Holt et al.

  14. Genome size analyses of Pucciniales reveal the largest fungal genomes.

    Science.gov (United States)

    Tavares, Sílvia; Ramos, Ana Paula; Pires, Ana Sofia; Azinheira, Helena G; Caldeirinha, Patrícia; Link, Tobias; Abranches, Rita; Silva, Maria do Céu; Voegele, Ralf T; Loureiro, João; Talhinhas, Pedro

    2014-01-01

    Rust fungi (Basidiomycota, Pucciniales) are biotrophic plant pathogens which exhibit diverse complexities in their life cycles and host ranges. The completion of genome sequencing of a few rust fungi has revealed the occurrence of large genomes. Sequencing efforts for other rust fungi have been hampered by uncertainty concerning their genome sizes. Flow cytometry was recently applied to estimate the genome size of a few rust fungi, and confirmed the occurrence of large genomes in this order (averaging 225.3 Mbp, while the average for Basidiomycota was 49.9 Mbp and was 37.7 Mbp for all fungi). In this work, we have used an innovative and simple approach to simultaneously isolate nuclei from the rust and its host plant in order to estimate the genome size of 30 rust species by flow cytometry. Genome sizes varied over 10-fold, from 70 to 893 Mbp, with an average genome size value of 380.2 Mbp. Compared to the genome sizes of over 1800 fungi, Gymnosporangium confusum possesses the largest fungal genome ever reported (893.2 Mbp). Moreover, even the smallest rust genome determined in this study is larger than the vast majority of fungal genomes (94%). The average genome size of the Pucciniales is now of 305.5 Mbp, while the average Basidiomycota genome size has shifted to 70.4 Mbp and the average for all fungi reached 44.2 Mbp. Despite the fact that no correlation could be drawn between the genome sizes, the phylogenomics or the life cycle of rust fungi, it is interesting to note that rusts with Fabaceae hosts present genomes clearly larger than those with Poaceae hosts. Although this study comprises only a small fraction of the more than 7000 rust species described, it seems already evident that the Pucciniales represent a group where genome size expansion could be a common characteristic. This is in sharp contrast to sister taxa, placing this order in a relevant position in fungal genomics research.

  15. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    Science.gov (United States)

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur , amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.

  16. Genome sequencing and annotation of Stenotrophomonas sp. SAM8

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Stenotrophomonas sp. strain SAM8, isolated from environmental water. The draft genome size is 3,665,538 bp with a G + C content of 67.2% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDAV00000000.

  17. Genome sequencing and annotation of Proteus sp. SAS71

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Proteus sp. strain SAS71, isolated from water spring in Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000.

  18. Genome sequencing and annotation of multidrug resistant Mycobacterium tuberculosis (MDR-TB PR10 strain

    Directory of Open Access Journals (Sweden)

    Mohd Zakihalani A. Halim

    2016-03-01

    Full Text Available Here, we report the draft genome sequence and annotation of a multidrug resistant Mycobacterium tuberculosis strain PR10 (MDR-TB PR10 isolated from a patient diagnosed with tuberculosis. The size of the draft genome MDR-TB PR10 is 4.34 Mbp with 65.6% of G + C content and consists of 4637 predicted genes. The determinants were categorized by RAST into 400 subsystems with 4286 coding sequences and 50 RNAs. The whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number CP010968. Keywords: Mycobacterium tuberculosis, Genome, MDR, Extrapulmonary

  19. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles, and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL or of a whole genome shotgun library (WGSL, or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  20. The draft genome sequence and annotation of the desert woodrat Neotoma lepida

    Directory of Open Access Journals (Sweden)

    Michael Campbell

    2016-09-01

    Full Text Available We present the de novo draft genome sequence for a vertebrate mammalian herbivore, the desert woodrat (Neotoma lepida. This species is of ecological and evolutionary interest with respect to ingestion, microbial detoxification and hepatic metabolism of toxic plant secondary compounds from the highly toxic creosote bush (Larrea tridentata and the juniper shrub (Juniperus monosperma. The draft genome sequence and annotation have been deposited at GenBank under the accession LZPO01000000.

  1. The 2008 update of the Aspergillus nidulans genome annotation: A community effort

    DEFF Research Database (Denmark)

    Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita

    2009-01-01

    The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional ap...

  2. The 2008 update of the Aspergillus nidulans genome annotation : a community effort

    NARCIS (Netherlands)

    Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J M; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W J; Hansen, Kim; Harris, Steven D; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E; Kiel, Jan A K W; Kim, Jung-Mi; van der Klei, Ida J; Klis, Frans M; Kovalchuk, Andriy; Krasevec, Nada; Kubicek, Christian P; Liu, Bo; Maccabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R; Nielsen, Jens; Oakley, Berl R; Osmani, Stephen A; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J; Ram, Arthur F J; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; van Solingen, Piet; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A; Visser, Hans; van de Vondervoort, Peter J I; de Vries, Ronald P; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W; Cornell, Michael J; van den Hondel, Cees A M J J; Visser, Jacob; Oliver, Stephen G; Turner, Geoffrey

    The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional

  3. The 2008 update of the Aspergillus nidulans genome annotation : A community effort

    NARCIS (Netherlands)

    Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Doehren, Hans; Doonan, John; Driessen, Arnold J. M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsebet; Flipphi, Michel; Garcia Estrada, Carlos; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W. J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karanyi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A. K. W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Krasevec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Marton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pocsi, Istvan; Punt, Peter J.; Ram, Arthur F. J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; van Solingen, Piet; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; Vankuyk, Patricia A.; Visser, Hans; de Vondervoort, Peter J. I. van; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A. M. J. J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey; Kraševec, Nada; Kuyk, Patricia A. van; Döhren, D.H.; van Seilboth, B; de Vries, R.

    The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional

  4. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L. methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L. Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI, funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource, and UniProtKB-TrEMBL. Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene predication program and the

  5. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR. Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6% of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  6. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Science.gov (United States)

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crop with global production ranking firstly among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  7. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  8. Weighting sequence variants based on their annotation increases power of whole-genome association studies

    DEFF Research Database (Denmark)

    Sveinbjornsson, Gardar; Albrechtsen, Anders; Zink, Florian

    2016-01-01

    The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function...... for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have...

  9. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations.

    Science.gov (United States)

    Tamborero, David; Rubio-Perez, Carlota; Deu-Pons, Jordi; Schroeder, Michael P; Vivancos, Ana; Rovira, Ana; Tusquets, Ignasi; Albanell, Joan; Rodon, Jordi; Tabernero, Josep; de Torres, Carmen; Dienstmann, Rodrigo; Gonzalez-Perez, Abel; Lopez-Bigas, Nuria

    2018-03-28

    While tumor genome sequencing has become widely available in clinical and research settings, the interpretation of tumor somatic variants remains an important bottleneck. Here we present the Cancer Genome Interpreter, a versatile platform that automates the interpretation of newly sequenced cancer genomes, annotating the potential of alterations detected in tumors to act as drivers and their possible effect on treatment response. The results are organized in different levels of evidence according to current knowledge, which we envision can support a broad range of oncology use cases. The resource is publicly available at http://www.cancergenomeinterpreter.org .

  10. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies

    Science.gov (United States)

    Zhang, Shujun

    2018-01-01

    Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896

  11. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Xingjie Hao

    2018-01-01

    Full Text Available Genome-wide association studies (GWASs have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART. With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study.

  12. Functional annotation from the genome sequence of the giant panda

    OpenAIRE

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-01-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided in...

  13. wANNOVAR: annotating genetic variants for personal genomes via the web.

    Science.gov (United States)

    Chang, Xiao; Wang, Kai

    2012-07-01

    High-throughput DNA sequencing platforms have become widely available. As a result, personal genomes are increasingly being sequenced in research and clinical settings. However, the resulting massive amounts of variants data pose significant challenges to the average biologists and clinicians without bioinformatics skills. We developed a web server called wANNOVAR to address the critical needs for functional annotation of genetic variants from personal genomes. The server provides simple and intuitive interface to help users determine the functional significance of variants. These include annotating single nucleotide variants and insertions/deletions for their effects on genes, reporting their conservation levels (such as PhyloP and GERP++ scores), calculating their predicted functional importance scores (such as SIFT and PolyPhen scores), retrieving allele frequencies in public databases (such as the 1000 Genomes Project and NHLBI-ESP 5400 exomes), and implementing a 'variants reduction' protocol to identify a subset of potentially deleterious variants/genes. We illustrated how wANNOVAR can help draw biological insights from sequencing data, by analysing genetic variants generated on two Mendelian diseases. We conclude that wANNOVAR will help biologists and clinicians take advantage of the personal genome information to expedite scientific discoveries. The wANNOVAR server is available at http://wannovar.usc.edu, and will be continuously updated to reflect the latest annotation information.

  14. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC 3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC 3 -rich genes (GC 3  ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC 3 -rich and intronless), as well as those associated with important functions, such as FA

  15. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    Science.gov (United States)

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis

  16. Functional annotation from the genome sequence of the giant panda.

    Science.gov (United States)

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  17. The genome of Tetranychus urticae reveals herbivorous pest adaptations

    NARCIS (Netherlands)

    Grbić, M.; Van Leeuwen, T.; Clark, R.M.; Rombauts, S.; Grbić, V.; Osborne, E.J.; Dermauw, W.; Phuong, C.T.N.; Ortego, F.; Hernández-Crespo, P.; Diaz, I.; Martinez, M.; Navajas, M.; Sucena, E.; Magalhães, S.; Nagy, L.; Pace, R.M.; Djuranović, S.; Smagghe, G.; Iga, M.; Christiaens, O.; Veenstra, J.A.; Ewer, J.; Villalobos, R.M.; Hutter, J.L.; Hudson, S.D.; Velez, M.; Yi, S.V.; Zeng, J.; Pires-dasilva, A.; Roch, F.; Cazaux, M.; Navarro, M.; Zhurov, V.; Acevedo, G.; Bjelica, A.; Fawcett, J.A.; Bonnet, E.; Martens, C.; Baele, G.; Wissler, L.; Sanchez-Rodriguez, A.; Tirry, L.; Blais, C.; Demeestere, K.; Henz, S.R.; Gregory, T.R.; Mathieu, J.; Verdon, L.; Farinelli, L.; Schmutz, J.; Lindquist, E.; Feyereisen, R.; Van de Peer, Y.

    2011-01-01

    The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T.

  18. Revised annotation of Plutella xylostella microRNAs and their genome-wide target identification.

    Science.gov (United States)

    Etebari, K; Asgari, S

    2016-12-01

    The diamondback moth, Plutella xylostella, is the most devastating pest of brassica crops worldwide. Although 128 mature microRNAs (miRNAs) have been annotated from this species in miRBase, there is a need to extend and correct the current P. xylostella miRNA repertoire as a result of its recently improved genome assembly and more available small RNA sequence data. We used our new ultra-deep sequence data and bioinformatics to re-annotate the P. xylostella genome for high confidence miRNAs with the correct 5p and 3p arm features. Furthermore, all the P. xylostella annotated genes were also screened to identify potential miRNA binding sites using three target-predicting algorithms. In total, 203 mature miRNAs were annotated, including 33 novel miRNAs. We identified 7691 highly confident binding sites for 160 pxy-miRNAs. The data provided here will facilitate future studies involving functional analyses of P. xylostella miRNAs as a platform to introduce novel approaches for sustainable management of this destructive pest. © 2016 The Royal Entomological Society.

  19. The Genome Sequence of Leishmania (Leishmania) amazonensis: Functional Annotation and Extended Analysis of Gene Models

    Science.gov (United States)

    Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

    2013-01-01

    We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

  20. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  1. Metingear: a development environment for annotating genome-scale metabolic models.

    Science.gov (United States)

    May, John W; James, A Gordon; Steinbeck, Christoph

    2013-09-01

    Genome-scale metabolic models often lack annotations that would allow them to be used for further analysis. Previous efforts have focused on associating metabolites in the model with a cross reference, but this can be problematic if the reference is not freely available, multiple resources are used or the metabolite is added from a literature review. Associating each metabolite with chemical structure provides unambiguous identification of the components and a more detailed view of the metabolism. We have developed an open-source desktop application that simplifies the process of adding database cross references and chemical structures to genome-scale metabolic models. Annotated models can be exported to the Systems Biology Markup Language open interchange format. Source code, binaries, documentation and tutorials are freely available at http://johnmay.github.com/metingear. The application is implemented in Java with bundles available for MS Windows and Macintosh OS X.

  2. Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

    Directory of Open Access Journals (Sweden)

    Guillermo Nourdin-Galindo

    2017-10-01

    Full Text Available Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these

  3. Comparative genomics reveals insights into avian genome evolution and adaptation

    Science.gov (United States)

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  4. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    Science.gov (United States)

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the

  5. Genome sequencing and annotation of Amycolatopsis vancoresmycina strain DSM 44592T

    Directory of Open Access Journals (Sweden)

    Navjot Kaur

    2014-12-01

    Full Text Available We report the 9.0-Mb draft genome of Amycolatopsis vancoresmycina strain DSM 44592T, isolated from Indian soil sample; produces antibiotic vancoresmycin. Draft genome of strain DSM44592T consists of 9,037,069 bp with a G+C content of 71.79% and 8340 predicted protein coding genes and 57 RNAs. RAST annotation indicates that strains Streptomyces sp. AA4 (score 521, Saccharomonospora viridis DSM 43017 (score 400 and Actinosynnema mirum DSM 43827 (score 372 are the closest neighbors of the strain DSM 44592T.

  6. Considerations for creating and annotating the budding yeast Genome Map at SGD: a progress report.

    Science.gov (United States)

    Chan, Esther T; Cherry, J Michael

    2012-01-01

    The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled for example, global analyses of transcription profiling and assembly of maps of transcription factor occupancy and higher order chromatin organization, at nucleotide level resolution. With this growing influx of published genome-scale data, come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org. DATABASE URL: http://www.yeastgenome.org.

  7. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    Science.gov (United States)

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  8. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning.

    Science.gov (United States)

    Dozmorov, Mikhail G

    2017-10-15

    One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. mikhail.dozmorov@vcuhealth.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  9. MIPS: analysis and annotation of proteins from whole genomes in 2005.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

    2006-01-01

    The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).

  10. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    Science.gov (United States)

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  11. Re-annotation of the genome sequence of Helicobacter pylori 26695

    Directory of Open Access Journals (Sweden)

    Resende Tiago

    2013-12-01

    Full Text Available Helicobacter pylori is a pathogenic bacterium that colonizes the human epithelia, causing duodenal and gastric ulcers, and gastric cancer. The genome of H. pylori 26695 has been previously sequenced and annotated. In addition, two genome-scale metabolic models have been developed. In order to maintain accurate and relevant information on coding sequences (CDS and to retrieve new information, the assignment of new functions to Helicobacter pylori 26695s genes was performed in this work. The use of software tools, on-line databases and an annotation pipeline for inspecting each gene allowed the attribution of validated EC numbers and TC numbers to metabolic genes encoding enzymes and transport proteins, respectively. 1212 genes encoding proteins were identified in this annotation, being 712 metabolic genes and 500 non-metabolic, while 191 new functions were assignment to the CDS of this bacterium. This information provides relevant biological information for the scientific community dealing with this organism and can be used as the basis for a new metabolic model reconstruction.

  12. Single Amplified Genomes as Source for Novel Extremozymes: Annotation, Expression and Functional Assessment

    KAUST Repository

    Grötzinger, Stefan

    2017-12-01

    Enzymes, as nature’s catalysts, show remarkable abilities that can revolutionize the chemical, biotechnological, bioremediation, agricultural and pharmaceutical industries. However, the narrow range of stability of the majority of described biocatalysts limits their use for many applications. To overcome these restrictions, extremozymes derived from microorganisms thriving under harsh conditions can be used. Extremophiles living in high salinity are especially interesting as they operate at low water activity, which is similar to conditions used in standard chemical applications. Because only about 0.1 % of all microorganisms can be cultured, the traditional way of culture-based enzyme function determination needs to be overcome. The rise of high-throughput next-generation-sequencing technologies allows for deep insight into nature’s variety. Single amplified genomes (SAGs) specifically allow for whole genome assemblies from small sample volumes with low cell yields, as are typical for extreme environments. Although these technologies have been available for years, the expected boost in biotechnology has held off. One of the main reasons is the lack of reliable functional annotation of the genomic data, which is caused by the low amount (0.15 %) of experimentally described genes. Here, we present a novel annotation algorithm, designed to annotate the enzymatic function of genomes from microorganisms with low homologies to described microorganisms. The algorithm was established on SAGs from the extreme environment of selected hypersaline Red Sea brine pools with 4.3 M salinity and temperatures up to 68°C. Additionally, a novel consensus pattern for the identification of γ-carbonic anhydrases was created and applied in the algorithm. To verify the annotation, selected genes were expressed in the hypersaline expression system Halobacterium salinarum. This expression system was established and optimized in a continuously stirred tank reactor, leading to

  13. Discovery and annotation of small proteins using genomics, proteomics and computational approaches

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Xiaohan; Tschaplinski, Timothy J.; Hurst, Gregory B.; Jawdy, Sara; Abraham, Paul E.; Lankford, Patricia K.; Adams, Rachel M.; Shah, Manesh B.; Hettich, Robert L.; Lindquist, Erika; Kalluri, Udaya C.; Gunter, Lee E.; Pennacchio, Christa; Tuskan, Gerald A.

    2011-03-02

    Small proteins (10 200 amino acids aa in length) encoded by short open reading frames (sORF) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained 2.6 million expressed sequence tag (EST) reads from Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10 200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) codingpotential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. In the high-confidence sORF candidate set, known protein domains were identified in 1282 genes (higher-confidence sORF candidate set), out of which 611 genes, designated as highest-confidence candidate sORF set, were supported by proteomics data. Of the 611 highest-confidence candidate sORF genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that there are potential sORF candidates to be annotated in sequenced genomes, but also presents an efficient strategy for discovery of sORFs in species with no genome annotation yet available.

  14. Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

    Science.gov (United States)

    Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

    2014-08-01

    Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.

  15. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on a same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.

  16. Data on genome sequencing, analysis and annotation of a pathogenic Bacillus cereus 062011msu

    Directory of Open Access Journals (Sweden)

    Rashmi Rathy

    2018-04-01

    Full Text Available Bacillus species 062011 msu is a harmful pathogenic strain responsible for causing abscessation in sheep and goat population studied by Mariappan et al. (2012 [1]. The organism specifically targets the female sheep and goat population and results in the reduction of milk and meat production. In the present study, we have performed the whole genome sequencing of the pathogenic isolate using the Ion Torrent sequencing platform and generated 458,944 raw reads with an average length of 198.2 bp. The genome sequence was assembled, annotated and analysed for the genetic islands, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the pathogen. Simultaneously the 16S rRNA sequencing study and genome sequence comparison data confirmed that the strain belongs to the species Bacillus cereus and exhibits 99% sequence homo;logy with the genomes of B. cereus ATCC 10987 and B. cereus FRI-35. Hence, we have renamed the organism as Bacillus cereus 062011msu. The Whole Genome Shotgun (WGS project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036(SAMN07629099. Keywords: Bacillus cereus, Genome sequencing, Abscessation, Virulence factors

  17. High-density rhesus macaque oligonucleotide microarray design using early-stage rhesus genome sequence information and human genome annotations

    Directory of Open Access Journals (Sweden)

    Magness Charles L

    2007-01-01

    a closely related species. Conclusion The number of different genes represented on microarrays for unfinished genomes can be greatly increased by matching known gene transcript annotations from a closely related species with sequence data from the unfinished genome. Signal intensity on both EST- and genome-derived arrays was highly correlated with probe distance from the 3' UTR, information often missing from ESTs yet present in early-stage genome projects.

  18. PeakAnalyzer: Genome-wide annotation of chromatin binding and modification loci

    Directory of Open Access Journals (Sweden)

    Tammoja Kairi

    2010-08-01

    Full Text Available Abstract Background Functional genomic studies involving high-throughput sequencing and tiling array applications, such as ChIP-seq and ChIP-chip, generate large numbers of experimentally-derived signal peaks across the genome under study. In analyzing these loci to determine their potential regulatory functions, areas of signal enrichment must be considered relative to proximal genes and regulatory elements annotated throughout the target genome Regions of chromatin association by transcriptional regulators should be distinguished as individual binding sites in order to enhance downstream analyses, such as the identification of known and novel consensus motifs. Results PeakAnalyzer is a set of high-performance utilities for the automated processing of experimentally-derived peak regions and annotation of genomic loci. The programs can accurately subdivide multimodal regions of signal enrichment into distinct subpeaks corresponding to binding sites or chromatin modifications, retrieve genomic sequences encompassing the computed subpeak summits, and identify positional features of interest such as intersection with exon/intron gene components, proximity to up- or downstream transcriptional start sites and cis-regulatory elements. The software can be configured to run either as a pipeline component for high-throughput analyses, or as a cross-platform desktop application with an intuitive user interface. Conclusions PeakAnalyzer comprises a number of utilities essential for ChIP-seq and ChIP-chip data analysis. High-performance implementations are provided for Unix pipeline integration along with a GUI version for interactive use. Source code in C++ and Java is provided, as are native binaries for Linux, Mac OS X and Windows systems.

  19. High-throughput proteogenomics of Ruegeria pomeroyi: seeding a better genomic annotation for the whole marine Roseobacter clade

    Directory of Open Access Journals (Sweden)

    Christie-Oleza Joseph A

    2012-02-01

    Full Text Available Abstract Background The structural and functional annotation of genomes is now heavily based on data obtained using automated pipeline systems. The key for an accurate structural annotation consists of blending similarities between closely related genomes with biochemical evidence of the genome interpretation. In this work we applied high-throughput proteogenomics to Ruegeria pomeroyi, a member of the Roseobacter clade, an abundant group of marine bacteria, as a seed for the annotation of the whole clade. Results A large dataset of peptides from R. pomeroyi was obtained after searching over 1.1 million MS/MS spectra against a six-frame translated genome database. We identified 2006 polypeptides, of which thirty-four were encoded by open reading frames (ORFs that had not previously been annotated. From the pool of 'one-hit-wonders', i.e. those ORFs specified by only one peptide detected by tandem mass spectrometry, we could confirm the probable existence of five additional new genes after proving that the corresponding RNAs were transcribed. We also identified the most-N-terminal peptide of 486 polypeptides, of which sixty-four had originally been wrongly annotated. Conclusions By extending these re-annotations to the other thirty-six Roseobacter isolates sequenced to date (twenty different genera, we propose the correction of the assigned start codons of 1082 homologous genes in the clade. In addition, we also report the presence of novel genes within operons encoding determinants of the important tricarboxylic acid cycle, a feature that seems to be characteristic of some Roseobacter genomes. The detection of their corresponding products in large amounts raises the question of their function. Their discoveries point to a possible theory for protein evolution that will rely on high expression of orphans in bacteria: their putative poor efficiency could be counterbalanced by a higher level of expression. Our proteogenomic analysis will increase

  20. G2S: A web-service for annotating genomic variants on 3D protein structures.

    Science.gov (United States)

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-01-27

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that support programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design conception and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Updated genome assembly and annotation of Paenibacillus larvae, the agent of American foulbrood disease of honey bees

    Directory of Open Access Journals (Sweden)

    de Graaf Dirk C

    2011-09-01

    Full Text Available Abstract Background As scientists continue to pursue various 'omics-based research, there is a need for high quality data for the most fundamental 'omics of all: genomics. The bacterium Paenibacillus larvae is the causative agent of the honey bee disease American foulbrood. If untreated, it can lead to the demise of an entire hive; the highly social nature of bees also leads to easy disease spread, between both individuals and colonies. Biologists have studied this organism since the early 1900s, and a century later, the molecular mechanism of infection remains elusive. Transcriptomics and proteomics, because of their ability to analyze multiple genes and proteins in a high-throughput manner, may be very helpful to its study. However, the power of these methodologies is severely limited without a complete genome; we undertake to address that deficiency here. Results We used the Illumina GAIIx platform and conventional Sanger sequencing to generate a 182-fold sequence coverage of the P. larvae genome, and assembled the data using ABySS into a total of 388 contigs spanning 4.5 Mbp. Comparative genomics analysis against fully-sequenced soil bacteria P. JDR2 and P. vortex showed that regions of poor conservation may contain putative virulence factors. We used GLIMMER to predict 3568 gene models, and named them based on homology revealed by BLAST searches; proteases, hemolytic factors, toxins, and antibiotic resistance enzymes were identified in this way. Finally, mass spectrometry was used to provide experimental evidence that at least 35% of the genes are expressed at the protein level. Conclusions This update on the genome of P. larvae and annotation represents an immense advancement from what we had previously known about this species. We provide here a reliable resource that can be used to elucidate the mechanism of infection, and by extension, more effective methods to control and cure this widespread honey bee disease.

  2. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN

    Science.gov (United States)

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  3. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Science.gov (United States)

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gap in the knowledge especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs), 46 proteins previously annotated as putative and assigned more accurate functions for the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, a comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac

  4. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Directory of Open Access Journals (Sweden)

    Nupoor Chowdhary

    Full Text Available Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2 production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gap in the knowledge especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs. Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs, 46 proteins previously annotated as putative and assigned more accurate functions for the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, a comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach, we strongly

  5. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  6. Functional Annotation of All Salmonid Genomes (FAASG): an international initiative supporting future salmonid research, conservation and aquaculture.

    Science.gov (United States)

    Macqueen, Daniel J; Primmer, Craig R; Houston, Ross D; Nowak, Barbara F; Bernatchez, Louis; Bergseth, Steinar; Davidson, William S; Gallardo-Escárate, Cristian; Goldammer, Tom; Guiguen, Yann; Iturra, Patricia; Kijas, James W; Koop, Ben F; Lien, Sigbjørn; Maass, Alejandro; Martin, Samuel A M; McGinnity, Philip; Montecino, Martin; Naish, Kerry A; Nichols, Krista M; Ólafsson, Kristinn; Omholt, Stig W; Palti, Yniv; Plastow, Graham S; Rexroad, Caird E; Rise, Matthew L; Ritchie, Rachael J; Sandve, Simen R; Schulte, Patricia M; Tello, Alfredo; Vidal, Rodrigo; Vik, Jon Olav; Wargelius, Anna; Yáñez, José Manuel

    2017-06-27

    We describe an emerging initiative - the 'Functional Annotation of All Salmonid Genomes' (FAASG), which will leverage the extensive trait diversity that has evolved since a whole genome duplication event in the salmonid ancestor, to develop an integrative understanding of the functional genomic basis of phenotypic variation. The outcomes of FAASG will have diverse applications, ranging from improved understanding of genome evolution, to improving the efficiency and sustainability of aquaculture production, supporting the future of fundamental and applied research in an iconic fish lineage of major societal importance.

  7. Emerging applications of read profiles towards the functional annotation of the genome

    DEFF Research Database (Denmark)

    Pundhir, Sachin; Poirazi, Panayiota; Gorodkin, Jan

    2015-01-01

    is typically a result of the protocol designed to address specific research questions. The sequencing results in reads, which when mapped to a reference genome often leads to the formation of distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation...... to the research question addressed. Several strategies have been employed at varying levels of abstraction ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g., from direct (non-sequence based) alignments to classification...... of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory elements (CREs) such as enhancers and promoters. We also discuss the biological rationale behind their formation....

  8. Genomic variant annotation workflow for clinical applications [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Thomas Thurnherr

    2016-10-01

    Full Text Available Annotation and interpretation of DNA aberrations identified through next-generation sequencing is becoming an increasingly important task. Even more so in the context of data analysis pipelines for medical applications, where genomic aberrations are associated with phenotypic and clinical features. Here we describe a workflow to identify potential gene targets in aberrated genes or pathways and their corresponding drugs. To this end, we provide the R/Bioconductor package rDGIdb, an R wrapper to query the drug-gene interaction database (DGIdb. DGIdb accumulates drug-gene interaction data from 15 different resources and allows filtering on different levels. The rDGIdb package makes these resources and tools available to R users. Moreover, rDGIdb queries can be automated through incorporation of the rDGIdb package into NGS sequencing pipelines.

  9. Functional annotation of the genome unravels probiotic potential of Bacillus coagulans HS243.

    Science.gov (United States)

    Kapse, N G; Engineer, A S; Gowdaman, V; Wagh, S; Dhakephalkar, P K

    2018-05-30

    Spore forming Bacillus species are widely used as probiotics for human dietary supplements and in animal feeds. However, information on genetic basis of their probiotic action is obscure. Therefore, the present investigation was undertaken to elucidate probiotic traits of B. coagulans HS243 through its genome analysis. Genome mining revealed the presence of an arsenal of marker genes attributed to genuine probiotic traits. In silico analysis of HS243 genome revealed the presence of multi subunit ATPases, ADI pathway genes, chologlycine hydrolase, adhesion proteins for surviving and colonizing harsh gastric transit. HS243 genome harbored vitamin and essential amino acid biosynthetic genes, suggesting the use of HS243 as a nutrient supplement. Bacteriocin producing genes highlighted the disease preventing potential of HS243. Thus, this work established that HS243 possessed the genetic repertoire required for surviving harsh gastric transit and conferring health benefits to the host which were further validated by wet lab evidences. Copyright © 2018. Published by Elsevier Inc.

  10. A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

    Directory of Open Access Journals (Sweden)

    Oliver Stephen G

    2007-04-01

    Full Text Available Abstract Background Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs, which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. Results Here we present a general approach to create domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the Microsporidion Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered as an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well characterized domains, increases overall coverage and detects more domains per sequence with typically higher bitscores than Pfam for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. Conclusion Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam and some of the families represented in the test set are not present in FPfam

  11. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/ katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium

    DEFF Research Database (Denmark)

    Machado, Henrique; Gram, Lone

    2017-01-01

    was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.......Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand...... the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two...

  13. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium

    OpenAIRE

    Henrique Machado; Henrique Machado; Lone Gram

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationship...

  14. Use of Modern Chemical Protein Synthesis and Advanced Fluorescent Assay Techniques to Experimentally Validate the Functional Annotation of Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kent, Stephen [University of Chicago

    2012-07-20

    The objective of this research program was to prototype methods for the chemical synthesis of predicted protein molecules in annotated microbial genomes. High throughput chemical methods were to be used to make large numbers of predicted proteins and protein domains, based on microbial genome sequences. Microscale chemical synthesis methods for the parallel preparation of peptide-thioester building blocks were developed; these peptide segments are used for the parallel chemical synthesis of proteins and protein domains. Ultimately, it is envisaged that these synthetic molecules would be ‘printed’ in spatially addressable arrays. The unique ability of total synthesis to precision label protein molecules with dyes and with chemical or biochemical ‘tags’ can be used to facilitate novel assay technologies adapted from state-of-the art single molecule fluorescence detection techniques. In the future, in conjunction with modern laboratory automation this integrated set of techniques will enable high throughput experimental validation of the functional annotation of microbial genomes.

  15. Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae.

    Science.gov (United States)

    Damte, Dereje; Suh, Joo-Won; Lee, Seung-Jin; Yohannes, Sileshi Belew; Hossain, Md Akil; Park, Seung-Chun

    2013-07-01

    In the present study, a computational comparative and subtractive genomic/proteomic analysis aimed at the identification of putative therapeutic target and vaccine candidate proteins from Kyoto Encyclopedia of Genes and Genomes (KEGG) annotated metabolic pathways of Mycoplasma hyopneumoniae was performed for drug design and vaccine production pipelines against M.hyopneumoniae. The employed comparative genomic and metabolic pathway analysis with a predefined computational systemic workflow extracted a total of 41 annotated metabolic pathways from KEGG among which five were unique to M. hyopneumoniae. A total of 234 proteins were identified to be involved in these metabolic pathways. Although 125 non homologous and predicted essential proteins were found from the total that could serve as potential drug targets and vaccine candidates, additional prioritizing parameters characterize 21 proteins as vaccine candidate while druggability of each of the identified proteins evaluated by the DrugBank database prioritized 42 proteins suitable for drug targets. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  17. Comparative genomics reveals insights into avian genome evolution and adaptation

    DEFF Research Database (Denmark)

    Zhang, Guojie; Li, Cai; Li, Qiye

    2014-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, ...

  18. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

    Science.gov (United States)

    Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

    2013-08-01

    Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

  19. Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis

    Science.gov (United States)

    Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc

    2018-01-01

    ABSTRACT Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946

  20. Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio)

    Science.gov (United States)

    2012-01-01

    Background Common carp (Cyprinus carpio) is thought to have undergone one extra round of genome duplication compared to zebrafish. Transcriptome analysis has been used to study the existence and timing of genome duplication in species for which genome sequences are incomplete. Large-scale transcriptome data for the common carp genome should help reveal the timing of the additional duplication event. Results We have sequenced the transcriptome of common carp using 454 pyrosequencing. After assembling the 454 contigs and the published common carp sequences together, we obtained 49,669 contigs and identified genes using homology searches and an ab initio method. We identified 4,651 orthologous pairs between common carp and zebrafish and found 129,984 paralogous pairs within the common carp. An estimation of the synonymous substitution rate in the orthologous pairs indicated that common carp and zebrafish diverged 120 million years ago (MYA). We identified one round of genome duplication in common carp and estimated that it had occurred 5.6 to 11.3 MYA. In zebrafish, no genome duplication event after speciation was observed, suggesting that, compared to zebrafish, common carp had undergone an additional genome duplication event. We annotated the common carp contigs with Gene Ontology terms and KEGG pathways. Compared with zebrafish gene annotations, we found that a set of biological processes and pathways were enriched in common carp. Conclusions The assembled contigs helped us to estimate the time of the fourth-round of genome duplication in common carp. The resource that we have built as part of this study will help advance functional genomics and genome annotation studies in the future. PMID:22424280

  1. Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling

    Directory of Open Access Journals (Sweden)

    Marinela eCapanu

    2015-05-01

    Full Text Available Identifying the small number of rare causal variants contributing to disease has beena major focus of investigation in recent years, but represents a formidable statisticalchallenge due to the rare frequencies with which these variants are observed. In thiscommentary we draw attention to a formal statistical framework, namely hierarchicalmodeling, to combine functional genomic annotations with sequencing data with theobjective of enhancing our ability to identify rare causal variants. Using simulations weshow that in all configurations studied, the hierarchical modeling approach has superiordiscriminatory ability compared to a recently proposed aggregate measure of deleteriousness,the Combined Annotation-Dependent Depletion (CADD score, supportingour premise that aggregate functional genomic measures can more accurately identifycausal variants when used in conjunction with sequencing data through a hierarchicalmodeling approach

  2. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    Science.gov (United States)

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    Science.gov (United States)

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  4. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  5. The Phaeodactylum genome reveals the evolutionary history of diatom genomes

    Czech Academy of Sciences Publication Activity Database

    Bowler, Ch.; Allen, A. E.; Badger, J. H.; Grimwood, J.; Jabbari, K.; Kuo, A.; Maheswari, U.; Martens, C.; Maumus, F.; Otillar, R. P.; Rayko, E.; Salamov, A.; Vandepoele, K.; Beszteri, B.; Gruber, A.; Heijde, M.; Katinka, M.; Mock, T.; Valentin, K.; Verret, F.; Berges, J. A.; Brownlee, C.; Cadoret, J.-P.; Chiovitti, A.; Choi, Ch. J.; Coesel, S.; De Martino, A.; Detter, J. Ch.; Durkin, C.; Falciatore, A.; Fournet, J.; Haruta, M.; Huysman, M. J. J.; Jenkins, B. D.; Jiroutová, Kateřina; Jorgensen, R. E.; Joubert, Y.; Kaplan, A.; Kröger, N.; Kroth, P. G.; La Roche, J.; Lindquist, E.; Lommer, M.; Martin–Jézéquel, V.; Lopez, P. J.; Lucas, S.; Mangogna, M.; McGinnis, K.; Medlin, L. K.; Montsant, A.; Oudot–Le Secq, M.-P.; Napoli, C.; Oborník, Miroslav; Schnitzler Parker, M.; Petit, J.-L.; Porcel, B. M.; Poulsen, N.; Robison, M.; Rychlewski, L.; Rynearson, T. A.; Schmutz, J.; Shapiro, H.; Siaut, M.; Stanley, M.; Sussman, M. R.; Taylor, A. R.; Vardi, A.; von Dassow, P.; Vyverman, W.; Willis, A.; Wyrwicz, L. S.; Rokhsar, D. S.; Weissenbach, J.; Armbrust, E. V.; Green, B. R.; Van de Peer, Y.; Grigoriev, I. V.

    2008-01-01

    Roč. 456, 13-11-2008 (2008), s. 239-244 ISSN 0028-0836 Institutional research plan: CEZ:AV0Z60220518 Keywords : Phaeodactylum * genome * evolution * diatom Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 31.434, year: 2008

  6. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

    Science.gov (United States)

    Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

    2012-09-01

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.

  7. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  8. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  9. The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae

    Directory of Open Access Journals (Sweden)

    David B. Neale

    2017-09-01

    Full Text Available A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb. Franco (Coastal Douglas-fir is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp. Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.

  10. Homology-based annotation of non-coding RNAs in the genomes of Schistosoma mansoni and Schistosoma japonicum

    Directory of Open Access Journals (Sweden)

    Santana Clara

    2009-10-01

    Full Text Available Abstract Background Schistosomes are trematode parasites of the phylum Platyhelminthes. They are considered the most important of the human helminth parasites in terms of morbidity and mortality. Draft genome sequences are now available for Schistosoma mansoni and Schistosoma japonicum. Non-coding RNA (ncRNA plays a crucial role in gene expression regulation, cellular function and defense, homeostasis, and pathogenesis. The genome-wide annotation of ncRNAs is a non-trivial task unless well-annotated genomes of closely related species are already available. Results A homology search for structured ncRNA in the genome of S. mansoni resulted in 23 types of ncRNAs with conserved primary and secondary structure. Among these, we identified rRNA, snRNA, SL RNA, SRP, tRNAs and RNase P, and also possibly MRP and 7SK RNAs. In addition, we confirmed five miRNAs that have recently been reported in S. japonicum and found two additional homologs of known miRNAs. The tRNA complement of S. mansoni is comparable to that of the free-living planarian Schmidtea mediterranea, although for some amino acids differences of more than a factor of two are observed: Leu, Ser, and His are overrepresented, while Cys, Meth, and Ile are underrepresented in S. mansoni. On the other hand, the number of tRNAs in the genome of S. japonicum is reduced by more than a factor of four. Both schistosomes have a complete set of minor spliceosomal snRNAs. Several ncRNAs that are expected to exist in the S. mansoni genome were not found, among them the telomerase RNA, vault RNAs, and Y RNAs. Conclusion The ncRNA sequences and structures presented here represent the most complete dataset of ncRNA from any lophotrochozoan reported so far. This data set provides an important reference for further analysis of the genomes of schistosomes and indeed eukaryotic genomes at large.

  11. Comparative Genomic Analysis of Clinical and Environmental Vibrio Vulnificus Isolates Revealed Biotype 3 Evolutionary Relationships

    Directory of Open Access Journals (Sweden)

    Yael eKotton

    2015-01-01

    Full Text Available In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59% and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 kbp to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C and environmental (E, all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins were present in all human pathogenic strains (both biotype 3 and non-biotype 3 and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and

  12. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    Science.gov (United States)

    Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

    2015-12-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.

  13. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    International Nuclear Information System (INIS)

    Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

    2015-01-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)

  14. Genes but not genomes reveal bacterial domestication of Lactococcus lactis.

    Directory of Open Access Journals (Sweden)

    Delphine Passerini

    Full Text Available BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE. METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable

  15. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    Science.gov (United States)

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.

  16. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera

    Directory of Open Access Journals (Sweden)

    James J. Lewis

    2016-09-01

    Full Text Available Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.

  17. WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes.

    Science.gov (United States)

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo

    2018-03-16

    Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts which facilitates in simplifying the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSR, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, 267 from core genes of fugu, whereas 105 641 SSR and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.

  18. Molecular cytogenetic and genomic analyses reveal new insights into the origin of the wheat B genome.

    Science.gov (United States)

    Zhang, Wei; Zhang, Mingyi; Zhu, Xianwen; Cao, Yaping; Sun, Qing; Ma, Guojia; Chao, Shiaoman; Yan, Changhui; Xu, Steven S; Cai, Xiwen

    2018-02-01

    This work pinpointed the goatgrass chromosomal segment in the wheat B genome using modern cytogenetic and genomic technologies, and provided novel insights into the origin of the wheat B genome. Wheat is a typical allopolyploid with three homoeologous subgenomes (A, B, and D). The donors of the subgenomes A and D had been identified, but not for the subgenome B. The goatgrass Aegilops speltoides (genome SS) has been controversially considered a possible candidate for the donor of the wheat B genome. However, the relationship of the Ae. speltoides S genome with the wheat B genome remains largely obscure. The present study assessed the homology of the B and S genomes using an integrative cytogenetic and genomic approach, and revealed the contribution of Ae. speltoides to the origin of the wheat B genome. We discovered noticeable homology between wheat chromosome 1B and Ae. speltoides chromosome 1S, but not between other chromosomes in the B and S genomes. An Ae. speltoides-originated segment spanning a genomic region of approximately 10.46 Mb was detected on the long arm of wheat chromosome 1B (1BL). The Ae. speltoides-originated segment on 1BL was found to co-evolve with the rest of the B genome. Evidently, Ae. speltoides had been involved in the origin of the wheat B genome, but should not be considered an exclusive donor of this genome. The wheat B genome might have a polyphyletic origin with multiple ancestors involved, including Ae. speltoides. These novel findings will facilitate genome studies in wheat and other polyploids.

  19. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology.

    Science.gov (United States)

    Gibson, Molly K; Forsberg, Kevin J; Dantas, Gautam

    2015-01-01

    Antibiotic resistance is a dire clinical problem with important ecological dimensions. While antibiotic resistance in human pathogens continues to rise at alarming rates, the impact of environmental resistance on human health is still unclear. To investigate the relationship between human-associated and environmental resistomes, we analyzed functional metagenomic selections for resistance against 18 clinically relevant antibiotics from soil and human gut microbiota as well as a set of multidrug-resistant cultured soil isolates. These analyses were enabled by Resfams, a new curated database of protein families and associated highly precise and accurate profile hidden Markov models, confirmed for antibiotic resistance function and organized by ontology. We demonstrate that the antibiotic resistance functions that give rise to the resistance profiles observed in environmental and human-associated microbial communities significantly differ between ecologies. Antibiotic resistance functions that most discriminate between ecologies provide resistance to β-lactams and tetracyclines, two of the most widely used classes of antibiotics in the clinic and agriculture. We also analyzed the antibiotic resistance gene composition of over 6000 sequenced microbial genomes, revealing significant enrichment of resistance functions by both ecology and phylogeny. Together, our results indicate that environmental and human-associated microbial communities harbor distinct resistance genes, suggesting that antibiotic resistance functions are largely constrained by ecology.

  20. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to little functional annotation associated with these arrays. The Affymetrix GenChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO. However the GO annotation data presented by Affymetrix is incomplete, for example, they do not show references linked to manually annotated functions. In addition, there is no tool that facilitates microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers amount of time in searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM tool to help researchers to quickly retrieve corresponding functional information for their dataset. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using AGOM tool. The disease, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and

  1. The future of transposable element annotation and their classification in the light of functional genomics - what we can learn from the fables of Jean de la Fontaine?

    Science.gov (United States)

    Arensburger, Peter; Piégu, Benoît; Bigot, Yves

    2016-01-01

    Transposable element (TE) science has been significantly influenced by the pioneering ideas of David Finnegan near the end of the last century, as well as by the classification systems that were subsequently developed. Today, whole genome TE annotation is mostly done using tools that were developed to aid gene annotation rather than to specifically study TEs. We argue that further progress in the TE field is impeded both by current TE classification schemes and by a failure to recognize that TE biology is fundamentally different from that of multicellular organisms. Novel genome wide TE annotation methods are helping to redefine our understanding of TE sequence origins and evolution. We briefly discuss some of these new methods as well as ideas for possible alternative classification schemes. Our hope is to encourage the formation of a society to organize a larger debate on these questions and to promote the adoption of standards for annotation and an improved TE classification.

  2. Genome-wide analysis reveals the extent of EAV-HP integration in domestic chicken.

    Science.gov (United States)

    Wragg, David; Mason, Andrew S; Yu, Le; Kuo, Richard; Lawal, Raman A; Desta, Takele Taye; Mwacharo, Joram M; Cho, Chang-Yeon; Kemp, Steve; Burt, David W; Hanotte, Olivier

    2015-10-14

    EAV-HP is an ancient retrovirus pre-dating Gallus speciation, which continues to circulate in modern chicken populations, and led to the emergence of avian leukosis virus subgroup J causing significant economic losses to the poultry industry. We mapped EAV-HP integration sites in Ethiopian village chickens, a Silkie, Taiwan Country chicken, red junglefowl Gallus gallus and several inbred experimental lines using whole-genome sequence data. An average of 75.22 ± 9.52 integration sites per bird were identified, which collectively group into 279 intervals of which 5 % are common to 90 % of the genomes analysed and are suggestive of pre-domestication integration events. More than a third of intervals are specific to individual genomes, supporting active circulation of EAV-HP in modern chickens. Interval density is correlated with chromosome length (P < 2.31(-6)), and 27 % of intervals are located within 5 kb of a transcript. Functional annotation clustering of genes reveals enrichment for immune-related functions (P < 0.05). Our results illustrate a non-random distribution of EAV-HP in the genome, emphasising the importance it may have played in the adaptation of the species, and provide a platform from which to extend investigations on the co-evolutionary significance of endogenous retroviral genera with their hosts.

  3. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

    Directory of Open Access Journals (Sweden)

    Qiongshi Lu

    2017-07-01

    Full Text Available Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD. Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.

  4. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

    Science.gov (United States)

    Lu, Qiongshi; Powles, Ryan L; Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-07-01

    Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.

  5. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease

    Science.gov (United States)

    Abdallah, Sarah; Ou, Derek; Wang, Qian; Hu, Yiming; Lu, Yisi; Liu, Wei; Li, Boyang; Mukherjee, Shubhabrata; Crane, Paul K.; Zhao, Hongyu

    2017-01-01

    Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer’s disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson’s disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline. PMID:28742084

  6. Comprehensive transcriptome and improved genome annotation of Bacillus licheniformis WX-02.

    Science.gov (United States)

    Guo, Jing; Cheng, Gang; Gou, Xiang-Yong; Xing, Feng; Li, Sen; Han, Yi-Chao; Wang, Long; Song, Jia-Ming; Shu, Cheng-Cheng; Chen, Shou-Wen; Chen, Ling-Ling

    2015-08-19

    The updated genome of Bacillus licheniformis WX-02 comprises a circular chromosome of 4286821 base-pairs containing 4512 protein-coding genes. We applied strand-specific RNA-sequencing to explore the transcriptome profiles of B. licheniformis WX-02 under normal and high-salt conditions (NaCl 6%). We identified 2381 co-expressed gene pairs constituting 871 operon structures. In addition, 1169 antisense transcripts and 90 small RNAs were detected. Systematic comparison of differentially expressed genes under different conditions revealed that genes involved in multiple functions were significantly repressed in long-term high salt adaptation process. Genes related to promotion of glutamic acid synthesis were activated by 6% NaCl, potentially explaining the high yield of γ-PGA under salt condition. This study will be useful for the optimization of crucial metabolic activities in this bacterium. Copyright © 2015. Published by Elsevier B.V.

  7. Draft genome sequence and annotation of Lactobacillus acetotolerans BM-LA14527, a beer-spoilage bacteria.

    Science.gov (United States)

    Liu, Junyan; Li, Lin; Peters, Brian M; Li, Bing; Deng, Yang; Xu, Zhenbo; Shirtliff, Mark E

    2016-09-01

    Lactobacillus acetotolerans is a hard-to-culture beer-spoilage bacterium capable of entering into the viable putative nonculturable (VPNC) state. As part of an initial strategy to investigate the phenotypic behavior of L. acetotolerans, draft genome sequencing was performed. Results demonstrated a total of 1824 predicted annotated genes, with several potential VPNC- and beer-spoilage-associated genes identified. Importantly, this is the first genome sequence of L. acetotolerans as beer-spoilage bacteria and it may aid in further analysis of L. acetotolerans and other beer-spoilage bacteria, with direct implications for food safety control in the beer brewing industry. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Roz Laing

    Full Text Available The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, makes their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid

  9. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants. Copyright © 2015 Jun et al.

  10. Annotation Of Novel And Conserved MicroRNA Genes In The Build 10 Sus scrofa Reference Genome And Determination Of Their Expression Levels In Ten Different Tissues

    DEFF Research Database (Denmark)

    Thomsen, Bo; Nielsen, Mathilde; Hedegaard, Jakob

    The DNA template used in the pig genome sequencing project was provided by a Duroc pig named TJ Tabasco. In an effort to annotate microRNA (miRNA) genes in the reference genome we have conducted deep sequencing to determine the miRNA transcriptomes in ten different tissues isolated from Pinky......, a genetically identical clone of TJ Tabasco. The purpose was to generate miRNA sequences that are highly homologous to the reference genome sequence, which along with computational prediction will improve confidence in the genomic annotation of miRNA genes. Based on homology searches of the sequence data...... against miRBase, we identified more than 600 conserved known miRNA/miRNA*, which is a significant increase relative to the 211 porcine miRNA/miRNA* deposited in the current version of miRBase. Furthermore, the genome-wide transcript profiles provided important information on the relative abundance...

  11. Integrated genomics of Mucorales reveals novel therapeutic targets

    Science.gov (United States)

    Mucormycosis is a life-threatening infection caused by Mucorales fungi. We sequenced 30 fungal genomes and performed transcriptomics with three representative Rhizopus and Mucor strains with human airway epithelial cells during fungal invasion to reveal key host and fungal determinants contributing ...

  12. Camelid genomes reveal evolution and adaptation to desert environments.

    Science.gov (United States)

    Wu, Huiguang; Guang, Xuanmin; Al-Fageeh, Mohamed B; Cao, Junwei; Pan, Shengkai; Zhou, Huanmin; Zhang, Li; Abutarboush, Mohammed H; Xing, Yanping; Xie, Zhiyuan; Alshanqeeti, Ali S; Zhang, Yanru; Yao, Qiulin; Al-Shomrani, Badr M; Zhang, Dong; Li, Jiang; Manee, Manee M; Yang, Zili; Yang, Linfeng; Liu, Yiyi; Zhang, Jilin; Altammami, Musaad A; Wang, Shenyuan; Yu, Lili; Zhang, Wenbin; Liu, Sanyang; Ba, La; Liu, Chunxia; Yang, Xukui; Meng, Fanhua; Wang, Shaowei; Li, Lu; Li, Erli; Li, Xueqiong; Wu, Kaifeng; Zhang, Shu; Wang, Junyi; Yin, Ye; Yang, Huanming; Al-Swailem, Abdulaziz M; Wang, Jun

    2014-10-21

    Bactrian camel (Camelus bactrianus), dromedary (Camelus dromedarius) and alpaca (Vicugna pacos) are economically important livestock. Although the Bactrian camel and dromedary are large, typically arid-desert-adapted mammals, alpacas are adapted to plateaus. Here we present high-quality genome sequences of these three species. Our analysis reveals the demographic history of these species since the Tortonian Stage of the Miocene and uncovers a striking correlation between large fluctuations in population size and geological time boundaries. Comparative genomic analysis reveals complex features related to desert adaptations, including fat and water metabolism, stress responses to heat, aridity, intense ultraviolet radiation and choking dust. Transcriptomic analysis of Bactrian camels further reveals unique osmoregulation, osmoprotection and compensatory mechanisms for water reservation underpinned by high blood glucose levels. We hypothesize that these physiological mechanisms represent kidney evolutionary adaptations to the desert environment. This study advances our understanding of camelid evolution and the adaptation of camels to arid-desert environments.

  13. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes

    Science.gov (United States)

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-...

  14. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes. CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1 the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2 the high

  15. TU-CD-BRB-07: Identification of Associations Between Radiologist-Annotated Imaging Features and Genomic Alterations in Breast Invasive Carcinoma, a TCGA Phenotype Research Group Study

    Energy Technology Data Exchange (ETDEWEB)

    Rao, A; Net, J [University of Miami, Miami, Florida (United States); Brandt, K [Mayo Clinic, Rochester, Minnesota (United States); Huang, E [National Cancer Institute, NIH, Bethesda, MD (United States); Freymann, J; Kirby, J [Leidos Biomedical Research Inc., Frederick, MD (United States); Burnside, E [University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin (United States); Morris, E; Sutton, E [Memorial Sloan Kettering Cancer Center, New York, NY (United States); Bonaccio, E [Roswell Park Cancer Institute, Buffalo, NY (United States); Giger, M; Jaffe, C [Univ Chicago, Chicago, IL (United States); Ganott, M; Zuley, M [University of Pittsburgh Medical Center - Magee Womens Hospital, Pittsburgh, Pennsylvania (United States); Le-Petross, H [MD Anderson Cancer Center, Houston, TX (United States); Dogan, B [UT MDACC, Houston, TX (United States); Whitman, G [UTMDACC, Houston, TX (United States)

    2015-06-15

    , and HGF/MET/RANBP9. Linear nonmass enhancement was associated with PIK3R1 and AKT activity. Conclusion: MRI-genomic association analysis revealed that several BRCA-associated gene features were associated with radiologist-annotated image features.

  16. Genome Sequencing Reveals the Potential of Achromobacter sp. HZ01 for Bioremediation

    Directory of Open Access Journals (Sweden)

    Yue-Hui Hong

    2017-08-01

    Full Text Available Petroleum pollution is a severe environmental issue. Comprehensively revealing the genetic backgrounds of hydrocarbon-degrading microorganisms contributes to developing effective methods for bioremediation of crude oil-polluted environments. Marine bacterium Achromobacter sp. HZ01 is capable of degrading hydrocarbons and producing biosurfactants. In this study, the draft genome (5.5 Mbp of strain HZ01 has been obtained by Illumina sequencing, containing 5,162 predicted genes. Genome annotation shows that “amino acid metabolism” is the most abundant metabolic pathway. Strain HZ01 is not capable of using some common carbohydrates as the sole carbon sources, which is due to that it contains few genes associated with carbohydrate transport and lacks some important enzymes related to glycometabolism. It contains abundant proteins directly related to petroleum hydrocarbon degradation. AlkB hydroxylase and its homologs were not identified. It harbors a complete enzyme system of terminal oxidation pathway for n-alkane degradation, which may be initiated by cytochrome P450. The enzymes involved in the catechol pathway are relatively complete for the degradation of aromatic compounds. This bacterium lacks several essential enzymes for methane oxidation, and Baeyer-Villiger monooxygenase involved in the subterminal oxidation pathway and cycloalkane degradation was not identified. These results suggest that strain HZ01 degrades n-alkanes via the terminal oxidation pathway, degrades aromatic compounds primarily via the catechol pathway and cannot perform methane oxidation or cycloalkane degradation. Additionally, strain HZ01 possesses abundant genes related to the metabolism of secondary metabolites, including some genes involved in biosurfactant (such as glycolipids and lipopeptides synthesis. The genome analysis also reveals its genetic basis for nitrogen metabolism, antibiotic resistance, regulatory responses to environmental changes, cell motility

  17. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans

    DEFF Research Database (Denmark)

    Raghavan, Maanasa; Skoglund, Pontus; Graf, Kelly E.

    2014-01-01

    ,000-year-old individual (MA-1), from Mal'ta in south-central Siberia, to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic......The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians, there is no consensus with regard to which specific Old World populations they are closest to. Here we sequence the draft genome of an approximately 24...... that the region was continuously occupied by humans throughout the Last Glacial Maximum. Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans....

  18. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes

    Science.gov (United States)

    Liu, Shengyi; Liu, Yumei; Yang, Xinhua; Tong, Chaobo; Edwards, David; Parkin, Isobel A. P.; Zhao, Meixia; Ma, Jianxin; Yu, Jingyin; Huang, Shunmou; Wang, Xiyin; Wang, Junyi; Lu, Kun; Fang, Zhiyuan; Bancroft, Ian; Yang, Tae-Jin; Hu, Qiong; Wang, Xinfa; Yue, Zhen; Li, Haojie; Yang, Linfeng; Wu, Jian; Zhou, Qing; Wang, Wanxin; King, Graham J; Pires, J. Chris; Lu, Changxin; Wu, Zhangyan; Sampath, Perumal; Wang, Zhuo; Guo, Hui; Pan, Shengkai; Yang, Limei; Min, Jiumeng; Zhang, Dong; Jin, Dianchuan; Li, Wanshun; Belcram, Harry; Tu, Jinxing; Guan, Mei; Qi, Cunkou; Du, Dezhi; Li, Jiana; Jiang, Liangcai; Batley, Jacqueline; Sharpe, Andrew G; Park, Beom-Seok; Ruperao, Pradeep; Cheng, Feng; Waminal, Nomar Espinosa; Huang, Yin; Dong, Caihua; Wang, Li; Li, Jingping; Hu, Zhiyong; Zhuang, Mu; Huang, Yi; Huang, Junyan; Shi, Jiaqin; Mei, Desheng; Liu, Jing; Lee, Tae-Ho; Wang, Jinpeng; Jin, Huizhe; Li, Zaiyun; Li, Xun; Zhang, Jiefu; Xiao, Lu; Zhou, Yongming; Liu, Zhongsong; Liu, Xuequn; Qin, Rui; Tang, Xu; Liu, Wenbin; Wang, Yupeng; Zhang, Yangyong; Lee, Jonghoon; Kim, Hyun Hee; Denoeud, France; Xu, Xun; Liang, Xinming; Hua, Wei; Wang, Xiaowu; Wang, Jun; Chalhoub, Boulos; Paterson, Andrew H

    2014-01-01

    Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus. PMID:24852848

  19. The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation

    Directory of Open Access Journals (Sweden)

    King Nichole L

    2009-02-01

    Full Text Available Abstract Background Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. Results In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s in which it was observed. Conclusion PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1 reduction of the complexity inherently associated with performing targeted proteomic studies, (2 designing and accelerating shotgun proteomics experiments, (3 confirming or questioning gene models, and (4 adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data it is not required that the user is a proteomics expert.

  20. IW-Scoring: an Integrative Weighted Scoring framework for annotating and prioritizing genetic variations in the noncoding genome.

    Science.gov (United States)

    Wang, Jun; Dayem Ullah, Abu Z; Chelala, Claude

    2018-01-30

    The vast majority of germline and somatic variations occur in the noncoding part of the genome, only a small fraction of which are believed to be functional. From the tens of thousands of noncoding variations detectable in each genome, identifying and prioritizing driver candidates with putative functional significance is challenging. To address this, we implemented IW-Scoring, a new Integrative Weighted Scoring model to annotate and prioritise functionally relevant noncoding variations. We evaluate 11 scoring methods, and apply an unsupervised spectral approach for subsequent selective integration into two linear weighted functional scoring schemas for known and novel variations. IW-Scoring produces stable high-quality performance as the best predictors for three independent data sets. We demonstrate the robustness of IW-Scoring in identifying recurrent functional mutations in the TERT promoter, as well as disease SNPs in proximity to consensus motifs and with gene regulatory effects. Using follicular lymphoma as a paradigmatic cancer model, we apply IW-Scoring to locate 11 recurrently mutated noncoding regions in 14 follicular lymphoma genomes, and validate 9 of these regions in an extension cohort, including the promoter and enhancer regions of PAX5. Overall, IW-Scoring demonstrates greater versatility in identifying trait- and disease-associated noncoding variants. Scores from IW-Scoring as well as other methods are freely available from http://www.snp-nexus.org/IW-Scoring/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Draft whole genome sequence of groundnut stem rot fungus Athelia rolfsii revealing genetic architect of its pathogenicity and virulence.

    Science.gov (United States)

    Iquebal, M A; Tomar, Rukam S; Parakhia, M V; Singla, Deepak; Jaiswal, Sarika; Rathod, V M; Padhiyar, S M; Kumar, Neeraj; Rai, Anil; Kumar, Dinesh

    2017-07-13

    Groundnut (Arachis hypogaea L.) is an important oil seed crop having major biotic constraint in production due to stem rot disease caused by fungus, Athelia rolfsii causing 25-80% loss in productivity. As chemical and biological combating strategies of this fungus are not very effective, thus genome sequencing can reveal virulence and pathogenicity related genes for better understanding of the host-parasite interaction. We report draft assembly of Athelia rolfsii genome of ~73 Mb having 8919 contigs. Annotation analysis revealed 16830 genes which are involved in fungicide resistance, virulence and pathogenicity along with putative effector and lethal genes. Secretome analysis revealed CAZY genes representing 1085 enzymatic genes, glycoside hydrolases, carbohydrate esterases, carbohydrate-binding modules, auxillary activities, glycosyl transferases and polysaccharide lyases. Repeat analysis revealed 11171 SSRs, LTR, GYPSY and COPIA elements. Comparative analysis with other existing ascomycotina genome predicted conserved domain family of WD40, CYP450, Pkinase and ABC transporter revealing insight of evolution of pathogenicity and virulence. This study would help in understanding pathogenicity and virulence at molecular level and development of new combating strategies. Such approach is imperative in endeavour of genome based solution in stem rot disease management leading to better productivity of groundnut crop in tropical region of world.

  2. Ribosome Profiling Reveals Pervasive Translation Outside of Annotated Protein-Coding Genes

    Directory of Open Access Journals (Sweden)

    Nicholas T. Ingolia

    2014-09-01

    Full Text Available Ribosome profiling suggests that ribosomes occupy many regions of the transcriptome thought to be noncoding, including 5′ UTRs and long noncoding RNAs (lncRNAs. Apparent ribosome footprints outside of protein-coding regions raise the possibility of artifacts unrelated to translation, particularly when they occupy multiple, overlapping open reading frames (ORFs. Here, we show hallmarks of translation in these footprints: copurification with the large ribosomal subunit, response to drugs targeting elongation, trinucleotide periodicity, and initiation at early AUGs. We develop a metric for distinguishing between 80S footprints and nonribosomal sources using footprint size distributions, which validates the vast majority of footprints outside of coding regions. We present evidence for polypeptide production beyond annotated genes, including the induction of immune responses following human cytomegalovirus (HCMV infection. Translation is pervasive on cytosolic transcripts outside of conserved reading frames, and direct detection of this expanded universe of translated products enables efforts at understanding how cells manage and exploit its consequences.

  3. Revealing complex function, process and pathway interactions with high-throughput expression and biological annotation data.

    Science.gov (United States)

    Singh, Nitesh Kumar; Ernst, Mathias; Liebscher, Volkmar; Fuellen, Georg; Taher, Leila

    2016-10-20

    The biological relationships both between and within the functions, processes and pathways that operate within complex biological systems are only poorly characterized, making the interpretation of large scale gene expression datasets extremely challenging. Here, we present an approach that integrates gene expression and biological annotation data to identify and describe the interactions between biological functions, processes and pathways that govern a phenotype of interest. The product is a global, interconnected network, not of genes but of functions, processes and pathways, that represents the biological relationships within the system. We validated our approach on two high-throughput expression datasets describing organismal and organ development. Our findings are well supported by the available literature, confirming that developmental processes and apoptosis play key roles in cell differentiation. Furthermore, our results suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. Moreover, we provide evidence that supports the relevance of cell spatial organization in the developing liver for proper liver function. Our strategy can be viewed as an abstraction that is useful to interpret high-throughput data and devise further experiments.

  4. Prokaryotic Phylogenies Inferred from Whole-Genome Sequence and Annotation Data

    Directory of Open Access Journals (Sweden)

    Wei Du

    2013-01-01

    Full Text Available Phylogenetic trees are used to represent the evolutionary relationship among various groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple genomic information is proposed. The method is called CGCPhy and based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes involving potential HGT events are eliminated, since such genes are considered to be the highly conserved genes across different species and the genes located on fragments with abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved applicative accuracies on different species sets in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy and has a low standard deviation on different datasets, so it has an applicative potential for phylogenetic analysis.

  5. Genome sequencing and annotation of Acinetobacter gerneri strain MTCC 9824T

    Directory of Open Access Journals (Sweden)

    Nitin Kumar Singh

    2014-12-01

    Full Text Available The genus Acinetobacter consists of 31 validly published species ubiquitously distributed in nature and primarily associated with nosocomial infection. We report the 4.4 Mb genome of Acinetobacter gerneri strain MTCC 9824T. The genome has a G + C content of 38.0% and includes 3 rRNA genes (5S, 23S16S and 64 aminoacyl-tRNA synthetase genes.

  6. Genome sequencing and annotation of Acinetobacter gyllenbergii strain MTCC 11365T

    Directory of Open Access Journals (Sweden)

    Nitin Kumar Singh

    2014-12-01

    Full Text Available The genus Acinetobacter consists of 31 validly published species ubiquitously distributed in nature and primarily associated with nosocomial infection. We report 4.3 Mb genome of the Acinetobacter gyllenbergii strain MTCC 11365T. The draft genome of A. gyllenbergii has a G + C content of 41.0% and includes 3 rRNA genes (5S, 23S, 16S and 67 aminoacyl-tRNA synthetase genes.

  7. Fish the ChIPs: a pipeline for automated genomic annotation of ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Minucci Saverio

    2011-10-01

    Full Text Available Abstract Background High-throughput sequencing is generating massive amounts of data at a pace that largely exceeds the throughput of data analysis routines. Here we introduce Fish the ChIPs (FC, a computational pipeline aimed at a broad public of users and designed to perform complete ChIP-Seq data analysis of an unlimited number of samples, thus increasing throughput, reproducibility and saving time. Results Starting from short read sequences, FC performs the following steps: 1 quality controls, 2 alignment to a reference genome, 3 peak calling, 4 genomic annotation, 5 generation of raw signal tracks for visualization on the UCSC and IGV genome browsers. FC exploits some of the fastest and most effective tools today available. Installation on a Mac platform requires very basic computational skills while configuration and usage are supported by a user-friendly graphic user interface. Alternatively, FC can be compiled from the source code on any Unix machine and then run with the possibility of customizing each single parameter through a simple configuration text file that can be generated using a dedicated user-friendly web-form. Considering the execution time, FC can be run on a desktop machine, even though the use of a computer cluster is recommended for analyses of large batches of data. FC is perfectly suited to work with data coming from Illumina Solexa Genome Analyzers or ABI SOLiD and its usage can potentially be extended to any sequencing platform. Conclusions Compared to existing tools, FC has two main advantages that make it suitable for a broad range of users. First of all, it can be installed and run by wet biologists on a Mac machine. Besides it can handle an unlimited number of samples, being convenient for large analyses. In this context, computational biologists can increase reproducibility of their ChIP-Seq data analyses while saving time for downstream analyses. Reviewers This article was reviewed by Gavin Huttley, George

  8. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.

    Science.gov (United States)

    Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M

    2014-01-30

    RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.

  9. Comprehensive Genomic Profiling of Esthesioneuroblastoma Reveals Additional Treatment Options.

    Science.gov (United States)

    Gay, Laurie M; Kim, Sungeun; Fedorchak, Kyle; Kundranda, Madappa; Odia, Yazmin; Nangia, Chaitali; Battiste, James; Colon-Otero, Gerardo; Powell, Steven; Russell, Jeffery; Elvin, Julia A; Vergilio, Jo-Anne; Suh, James; Ali, Siraj M; Stephens, Philip J; Miller, Vincent A; Ross, Jeffrey S

    2017-07-01

    Esthesioneuroblastoma (ENB), also known as olfactory neuroblastoma, is a rare malignant neoplasm of the olfactory mucosa. Despite surgical resection combined with radiotherapy and adjuvant chemotherapy, ENB often relapses with rapid progression. Current multimodality, nontargeted therapy for relapsed ENB is of limited clinical benefit. We queried whether comprehensive genomic profiling (CGP) of relapsed or refractory ENB can uncover genomic alterations (GA) that could identify potential targeted therapies for these patients. CGP was performed on formalin-fixed, paraffin-embedded sections from 41 consecutive clinical cases of ENBs using a hybrid-capture, adaptor ligation based next-generation sequencing assay to a mean coverage depth of 593X. The results were analyzed for base substitutions, insertions and deletions, select rearrangements, and copy number changes (amplifications and homozygous deletions). Clinically relevant GA (CRGA) were defined as GA linked to drugs on the market or under evaluation in clinical trials. A total of 28 ENBs harbored GA, with a mean of 1.5 GA per sample. Approximately half of the ENBs (21, 51%) featured at least one CRGA, with an average of 1 CRGA per sample. The most commonly altered gene was TP53 (17%), with GA in PIK3CA , NF1 , CDKN2A , and CDKN2C occurring in 7% of samples. We report comprehensive genomic profiles for 41 ENB tumors. CGP revealed potential new therapeutic targets, including targetable GA in the mTOR, CDK and growth factor signaling pathways, highlighting the clinical value of genomic profiling in ENB. Comprehensive genomic profiling of 41 relapsed or refractory ENBs reveals recurrent alterations or classes of mutation, including amplification of tyrosine kinases encoded on chromosome 5q and mutations affecting genes in the mTOR/PI3K pathway. Approximately half of the ENBs (21, 51%) featured at least one clinically relevant genomic alteration (CRGA), with an average of 1 CRGA per sample. The most commonly altered

  10. Genome-wide association study and annotating candidate gene networks affecting age at first calving in Nellore cattle.

    Science.gov (United States)

    Mota, R R; Guimarães, S E F; Fortes, M R S; Hayes, B; Silva, F F; Verardo, L L; Kelly, M J; de Campos, C F; Guimarães, J D; Wenceslau, R R; Penitente-Filho, J M; Garcia, J F; Moore, S

    2017-12-01

    We performed a genome-wide mapping for the age at first calving (AFC) with the goal of annotating candidate genes that regulate fertility in Nellore cattle. Phenotypic data from 762 cows and 777k SNP genotypes from 2,992 bulls and cows were used. Single nucleotide polymorphism (SNP) effects based on the single-step GBLUP methodology were blocked into adjacent windows of 1 Megabase (Mb) to explain the genetic variance. SNP windows explaining more than 0.40% of the AFC genetic variance were identified on chromosomes 2, 8, 9, 14, 16 and 17. From these windows, we identified 123 coding protein genes that were used to build gene networks. From the association study and derived gene networks, putative candidate genes (e.g., PAPPA, PREP, FER1L6, TPR, NMNAT1, ACAD10, PCMTD1, CRH, OPKR1, NPBWR1 and NCOA2) and transcription factors (TF) (STAT1, STAT3, RELA, E2F1 and EGR1) were strongly associated with female fertility (e.g., negative regulation of luteinizing hormone secretion, folliculogenesis and establishment of uterine receptivity). Evidence suggests that AFC inheritance is complex and controlled by multiple loci across the genome. As several windows explaining higher proportion of the genetic variance were identified on chromosome 14, further studies investigating the interaction across haplotypes to better understand the molecular architecture behind AFC in Nellore cattle should be undertaken. © 2017 Blackwell Verlag GmbH.

  11. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus.

    Science.gov (United States)

    Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie

    2017-07-01

    The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna and Flora Seventeenth meeting of the Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering for polymerase chain reaction-duplicated and low-quality reads, ∼137 Gb of clean data (×70 depth) were obtained for genome assembly. We yielded a draft genome assembly with a total length of 2.24 Gb and an N50 scaffold size of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provides an essential tool for conservation efforts. It also provides a valuable resource for future study of squamate evolution. © The Authors 2017. Published by Oxford University Press.

  12. Whole genome sequences and annotation of Micrococcus luteus SUBG006, a novel phytopathogen of mango.

    Science.gov (United States)

    Rakhashiya, Purvi M; Patel, Pooja P; Thaker, Vrinda S

    2015-12-01

    Actinobaceria, Micrococcus luteus SUBG006 was isolated from infected leaves of Mangifera indica L. vr. Nylon in Rajkot, (22.30°N, 70.78°E), Gujarat, India. The genome size is 3.86 Mb with G + C content of 69.80% and contains 112 rRNA sequences (5S, 16S and 23S). The whole genome sequencing has been deposited in DDBJ/EMBL/GenBank under the accession number JOKP00000000.

  13. Genome sequencing and annotation of Acinetobacter guillouiae strain MSP 4-18

    Directory of Open Access Journals (Sweden)

    Nitin Kumar Singh

    2014-12-01

    Full Text Available The genus Acinetobacter consists of 31 validly published species ubiquitously distributed in nature and primarily associated with nosocomial infection. We report the 4.8 Mb genome of Acinetobacter guillouiae MSP 4-18, isolated from a mangrove soil sample from Parangipettai (11°30′N, 79°47′E, Tamil Nadu, India. The draft genome of A. guillouiae MSP 4-18 has a G + C content of 38.0% and includes 3 rRNA genes (5S, 23S, 16S and 69 aminoacyl-tRNA synthetase genes.

  14. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    Science.gov (United States)

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

    Science.gov (United States)

    Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

    2013-11-01

    Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.

  16. Completed sequence and corrected annotation of the genome of maize Iranian mosaic virus.

    Science.gov (United States)

    Ghorbani, Abozar; Izadpanah, Keramatollah; Dietzgen, Ralf G

    2018-03-01

    Maize Iranian mosaic virus (MIMV) is a negative-sense single-stranded RNA virus that is classified in the genus Nucleorhabdovirus, family Rhabdoviridae. The MIMV genome contains six open reading frames (ORFs) that encode in 3΄ to 5΄ order the nucleocapsid protein (N), phosphoprotein (P), putative movement protein (P3), matrix protein (M), glycoprotein (G) and RNA-dependent RNA polymerase (L). In this study, we determined the first complete genome sequence of MIMV using Illumina RNA-Seq and 3'/5' RACE. MIMV genome ('Fars' isolate) is 12,426 nucleotides in length. Unexpectedly, the predicted N gene ORF of this isolate and of four other Iranian isolates is 143 nucleotides shorter than that of the MIMV coding-complete reference isolate 'Shiraz 1' (Genbank NC_011542), possibly due to a minor error in the previous sequence. Genetic variability among the N, P, P3 and G ORFs of Iranian MIMV isolates was limited, but highest in the G gene ORF. Phylogenetic analysis of complete nucleorhabdovirus genomes demonstrated a close evolutionary relationship between MIMV, maize mosaic virus and taro vein chlorosis virus.

  17. Annotation of loci from genome-wide association studies using tissue-specific quantitative interaction proteomics

    NARCIS (Netherlands)

    Lundby, Alicia; Rossin, Elizabeth J.; Steffensen, Annette B.; Acha, Moshe Ray; Newton-Cheh, Christopher; Pfeufer, Arne; Lyneh, Stacey N.; Olesen, Soren-Peter; Brunak, Soren; Ellinor, Patrick T.; Jukema, J. Wouter; Trompet, Stella; Ford, Ian; Macfarlane, Peter W.; Krijthe, Bouwe P.; Hofman, Albert; Uitterlinden, Andre G.; Stricker, Bruno H.; Nathoe, Hendrik M.; Spiering, Wilko; Daly, Mark J.; Asselbergs, Ikea W.; van der Harst, Pim; Milan, David J.; de Bakker, Paul I. W.; Lage, Kasper; Olsen, Jesper V.

    Genome-wide association studies (GWAS) have identified thousands of loci associated with complex traits, but it is challenging to pinpoint causal genes in these loci and to exploit subtle association signals. We used tissue-specific quantitative interaction proteomics to map a network of five genes

  18. Functional annotation of rare gene aberration drivers of pancreatic cancer | Office of Cancer Genomics

    Science.gov (United States)

    As we enter the era of precision medicine, characterization of cancer genomes will directly influence therapeutic decisions in the clinic. Here we describe a platform enabling functionalization of rare gene mutations through their high-throughput construction, molecular barcoding and delivery to cancer models for in vivo tumour driver screens. We apply these technologies to identify oncogenic drivers of pancreatic ductal adenocarcinoma (PDAC).

  19. Mapping and annotating obesity-related genes in pig and human genomes.

    Science.gov (United States)

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are then necessary to further dissect the genetics of obesity. Pig has emerged as one of the most attractive models, because of the similarity with humans in the mechanisms regulating the fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on STRING protein interaction network. Conclusions. The collected set consists of 361 obesity related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only for 3 human genes there is no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity related genes are mostly involved in regulation and signaling processes/pathways and relevant connection emerges between obesity-related genes and diseases such as cancer and infectious diseases.

  20. Comparative genomics reveals diversity among xanthomonads infecting tomato and pepper

    LENUS (Irish Health Repository)

    Potnis, Neha

    2011-03-11

    Abstract Background Bacterial spot of tomato and pepper is caused by four Xanthomonas species and is a major plant disease in warm humid climates. The four species are distinct from each other based on physiological and molecular characteristics. The genome sequence of strain 85-10, a member of one of the species, Xanthomonas euvesicatoria (Xcv) has been previously reported. To determine the relationship of the four species at the genome level and to investigate the molecular basis of their virulence and differing host ranges, draft genomic sequences of members of the other three species were determined and compared to strain 85-10. Results We sequenced the genomes of X. vesicatoria (Xv) strain 1111 (ATCC 35937), X. perforans (Xp) strain 91-118 and X. gardneri (Xg) strain 101 (ATCC 19865). The genomes were compared with each other and with the previously sequenced Xcv strain 85-10. In addition, the molecular features were predicted that may be required for pathogenicity including the type III secretion apparatus, type III effectors, other secretion systems, quorum sensing systems, adhesins, extracellular polysaccharide, and lipopolysaccharide determinants. Several novel type III effectors from Xg strain 101 and Xv strain 1111 genomes were computationally identified and their translocation was validated using a reporter gene assay. A homolog to Ax21, the elicitor of XA21-mediated resistance in rice, and a functional Ax21 sulfation system were identified in Xcv. Genes encoding proteins with functions mediated by type II and type IV secretion systems have also been compared, including enzymes involved in cell wall deconstruction, as contributors to pathogenicity. Conclusions Comparative genomic analyses revealed considerable diversity among bacterial spot pathogens, providing new insights into differences and similarities that may explain the diverse nature of these strains. Genes specific to pepper pathogens, such as the O-antigen of the lipopolysaccharide cluster

  1. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins.  Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...

  2. Single-cell genomics reveals pyrrolysine-encoding potential in members of uncultivated archaeal candidate division MSBL1

    KAUST Repository

    Guan, Yue; Haroon, Mohamed; Alam, Intikhab; Ferry, James G.; Stingl, Ulrich

    2017-01-01

    Pyrrolysine (Pyl), the 22nd canonical amino acid, is only decoded and synthesized by a limited number of organisms in the domains Archaea and Bacteria. Pyl is encoded by the amber codon UAG, typically a stop codon. To date, all known Pyl-decoding archaea are able to carry out methylotrophic methanogenesis. The functionality of methylamine methyltransferases, an important component of corrinoid-dependent methyltransfer reactions, depends on the presence of Pyl. Here, we present a putative pyl gene cluster obtained from single-cell genomes of the archaeal Mediterranean Sea Brine Lakes group 1 (MSBL1) from the Red Sea. Functional annotation of the MSBL1 single cell amplified genomes (SAGs) also revealed a complete corrinoid-dependent methyl-transfer pathway suggesting that members of MSBL1 may possibly be capable of synthesizing Pyl and metabolizing methylated amines. This article is protected by copyright. All rights reserved.

  3. Single-cell genomics reveals pyrrolysine-encoding potential in members of uncultivated archaeal candidate division MSBL1

    KAUST Repository

    Guan, Yue

    2017-05-11

    Pyrrolysine (Pyl), the 22nd canonical amino acid, is only decoded and synthesized by a limited number of organisms in the domains Archaea and Bacteria. Pyl is encoded by the amber codon UAG, typically a stop codon. To date, all known Pyl-decoding archaea are able to carry out methylotrophic methanogenesis. The functionality of methylamine methyltransferases, an important component of corrinoid-dependent methyltransfer reactions, depends on the presence of Pyl. Here, we present a putative pyl gene cluster obtained from single-cell genomes of the archaeal Mediterranean Sea Brine Lakes group 1 (MSBL1) from the Red Sea. Functional annotation of the MSBL1 single cell amplified genomes (SAGs) also revealed a complete corrinoid-dependent methyl-transfer pathway suggesting that members of MSBL1 may possibly be capable of synthesizing Pyl and metabolizing methylated amines. This article is protected by copyright. All rights reserved.

  4. Annotation of loci from genome-wide association studies using tissue-specific quantitative interaction proteomics

    DEFF Research Database (Denmark)

    Lundby, Alicia; Rossin, Elizabeth J.; Steffensen, Annette B.

    2014-01-01

    Genome-wide association studies (GWAS) have identified thousands of loci associated with complex traits, but it is challenging to pinpoint causal genes in these loci and to exploit subtle association signals. We used tissue-specific quantitative interaction proteomics to map a network of five genes...... involved in the Mendelian disorder long QT syndrome (LOTS). We integrated the LOTS network with GWAS loci from the corresponding common complex trait, QT-interval variation, to identify candidate genes that were subsequently confirmed in Xenopus laevis oocytes and zebrafish. We used the LOTS protein...... network to filter weak GWAS signals by identifying single-nucleotide polymorphisms (SNPs) in proximity to genes in the network supported by strong proteomic evidence. Three SNPs passing this filter reached genome-wide significance after replication genotyping. Overall, we present a general strategy...

  5. Genome, Functional Gene Annotation, and Nuclear Transformation of the Heterokont Oleaginous Alga Nannochloropsis oceanica CCMP1779

    Science.gov (United States)

    2012-11-15

    development of such an algal model system for basic discovery, we sequenced the genome and two sets of transcriptomes of N. oceanica CCMP1779, assembled...CCMP1779 has a gene encoding a highly conserved violax- anthin de-epoxidase ( VDE ) protein like that found in plants (Table S9). In Arabidopsis, VDE is...HLA3 or LCI1 were present. This result suggests that CCMP1779 might have a plastid Ci transport system similar to that of Chlamydomonas, but a distinct

  6. Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics

    DEFF Research Database (Denmark)

    Khurana, Ekta; Fu, Yao; Colonna, Vincenza

    2013-01-01

    Identifying Important Identifiers Each of us has millions of sequence variations in our genomes. Signatures of purifying or negative selection should help identify which of those variations is functionally important. Khurana et al. (1235587) used sequence polymorphisms from 1092 humans across 14...... sites tended to occur in network hub promoters. Many recurrent somatic cancer variants occurred in noncoding regulatory regions and thus might indicate mutations that drive cancer....

  7. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Li Jun; van der Does, H. C.; Borkovich, Katherine A.; Coleman, Jeffrey J.; Daboussi, Marie-Jose; Di Pietro, Antonio; Dufresne, Marie; Freitag, Michael; Grabherr, Manfred; Henrissat, Bernard; Houterman, Petra M.; Kang, Seogchan; Shim, Won-Bo; Wolochuk, Charles; Xie, Xiaohui; Xu, Jin Rong; Antoniw, John; Baker, Scott E.; Bluhm, Burton H.; Breakspear, Andrew; Brown, Daren W.; Butchko, Robert A.; Chapman, Sinead; Coulson, Richard; Coutinho, Pedro M.; Danchin, Etienne G.; Diener, Andrew; Gale, Liane R.; Gardiner, Donald; Goff, Steven; Hammond-Kossack, Kim; Hilburn, Karen; Hua-Van, Aurelie; Jonkers, Wilfried; Kazan, Kemal; Kodira, Chinnappa D.; Koehrsen, Michael; Kumar, Lokesh; Lee, Yong Hwan; Li, Liande; Manners, John M.; Miranda-Saavedra, Diego; Mukherjee, Mala; Park, Gyungsoon; Park, Jongsun; Park, Sook Young; Proctor, Robert H.; Regev, Aviv; Ruiz-Roldan, M. C.; Sain, Divya; Sakthikumar, Sharadha; Sykes, Sean; Schwartz, David C.; Turgeon, Barbara G.; Wapinski, Ilan; Yoder, Olen; Young, Sarah; Zeng, Qiandong; Zhou, Shiguo; Galagan, James; Cuomo, Christina A.; Kistler, H. Corby; Rep, Martijn

    2010-03-18

    Fusarium species are among the most important phytopathogenic and toxigenic fungi, having significant impact on crop production and animal health. Distinctively, members of the F. oxysporum species complex exhibit wide host range but discontinuously distributed host specificity, reflecting remarkable genetic adaptability. To understand the molecular underpinnings of diverse phenotypic traits and their evolution in Fusarium, we compared the genomes of three economically important and phylogenetically related, yet phenotypically diverse plant-pathogenic species, F. graminearum, F. verticillioides and F. oxysporum f. sp. lycopersici. Our analysis revealed greatly expanded lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes, accounting for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity. Experimentally, we demonstrate for the first time the transfer of two LS chromosomes between strains of F. oxysporum, resulting in the conversion of a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in the F. oxysporum species complex, putting the evolution of fungal pathogenicity into a new perspective.

  8. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Directory of Open Access Journals (Sweden)

    Toub Omid

    2010-10-01

    Full Text Available Abstract Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS were predicted by in silico analysis of the grapevine (Vitis vinifera genome assembly 1. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information

  9. Functional genomics tools applied to plant metabolism: a survey on plant respiration, its connections and the annotation of complex gene functions

    Directory of Open Access Journals (Sweden)

    Wagner L. Araújo

    2012-09-01

    Full Text Available The application of post-genomic techniques in plant respiration studies has greatly improved our ability to assign functions to gene products. In addition it has also revealed previously unappreciated interactions between distal elements of metabolism. Such results have reinforced the need to consider plant respiratory metabolism as part of a complex network and making sense of such interactions will ultimately require the construction of predictive and mechanistic models. Transcriptomics, proteomics, metabolomics and the quantification of metabolic flux will be of great value in creating such models both by facilitating the annotation of complex gene function, determining their structure and by furnishing the quantitative data required to test them. In this review we highlight how these experimental approaches have contributed to our current understanding of plant respiratory metabolism and its interplay with associated process (e.g. photosynthesis, photorespiration and nitrogen metabolism. We also discuss how data from these techniques may be integrated, with the ultimate aim of identifying mechanisms that control and regulate plant respiration and discovering novel gene functions with potential biotechnological implications.

  10. Identification of transcriptional signals in Encephalitozoon cuniculi widespread among Microsporidia phylum: support for accurate structural genome annotation

    Directory of Open Access Journals (Sweden)

    Wincker Patrick

    2009-12-01

    , 5'UTRs being strongly reduced, these signals can be used to ensure the accurate prediction of translation initiation codons for microsporidian genes and to improve microsporidian genome annotation.

  11. “Controlled, cross-species dataset for exploring biases in genome annotation and modification profiles”

    Directory of Open Access Journals (Sweden)

    Alison McAfee

    2015-12-01

    Full Text Available Since the sequencing of the honey bee genome, proteomics by mass spectrometry has become increasingly popular for biological analyses of this insect; but we have observed that the number of honey bee protein identifications is consistently low compared to other organisms [1]. In this dataset, we use nanoelectrospray ionization-coupled liquid chromatography–tandem mass spectrometry (nLC–MS/MS to systematically investigate the root cause of low honey bee proteome coverage. To this end, we present here data from three key experiments: a controlled, cross-species analyses of samples from Apis mellifera, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Mus musculus and Homo sapiens; a proteomic analysis of an individual honey bee whose genome was also sequenced; and a cross-tissue honey bee proteome comparison. The cross-species dataset was interrogated to determine relative proteome coverages between species, and the other two datasets were used to search for polymorphic sequences and to compare protein cleavage profiles, respectively.

  12. Genome-wide annotation of the soybean WRKY family and functional characterization of genes involved in response to Phakopsora pachyrhizi infection.

    Science.gov (United States)

    Bencke-Malato, Marta; Cabreira, Caroline; Wiebke-Strohm, Beatriz; Bücker-Neto, Lauro; Mancini, Estefania; Osorio, Marina B; Homrich, Milena S; Turchetto-Zolet, Andreia Carina; De Carvalho, Mayra C C G; Stolf, Renata; Weber, Ricardo L M; Westergaard, Gastón; Castagnaro, Atílio P; Abdelnoor, Ricardo V; Marcelino-Guimarães, Francismar C; Margis-Pinheiro, Márcia; Bodanese-Zanettini, Maria Helena

    2014-09-10

    Many previous studies have shown that soybean WRKY transcription factors are involved in the plant response to biotic and abiotic stresses. Phakopsora pachyrhizi is the causal agent of Asian Soybean Rust, one of the most important soybean diseases. There are evidences that WRKYs are involved in the resistance of some soybean genotypes against that fungus. The number of WRKY genes already annotated in soybean genome was underrepresented. In the present study, a genome-wide annotation of the soybean WRKY family was carried out and members involved in the response to P. pachyrhizi were identified. As a result of a soybean genomic databases search, 182 WRKY-encoding genes were annotated and 33 putative pseudogenes identified. Genes involved in the response to P. pachyrhizi infection were identified using superSAGE, RNA-Seq of microdissected lesions and microarray experiments. Seventy-five genes were differentially expressed during fungal infection. The expression of eight WRKY genes was validated by RT-qPCR. The expression of these genes in a resistant genotype was earlier and/or stronger compared with a susceptible genotype in response to P. pachyrhizi infection. Soybean somatic embryos were transformed in order to overexpress or silence WRKY genes. Embryos overexpressing a WRKY gene were obtained, but they were unable to convert into plants. When infected with P. pachyrhizi, the leaves of the silenced transgenic line showed a higher number of lesions than the wild-type plants. The present study reports a genome-wide annotation of soybean WRKY family. The participation of some members in response to P. pachyrhizi infection was demonstrated. The results contribute to the elucidation of gene function and suggest the manipulation of WRKYs as a strategy to increase fungal resistance in soybean plants.

  13. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+".

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  14. The MycoBrowser portal: a comprehensive and manually annotated resource for mycobacterial genomes.

    Science.gov (United States)

    Kapopoulou, Adamandia; Lew, Jocelyne M; Cole, Stewart T

    2011-01-01

    In this paper, we present the MycoBrowser portal (http://mycobrowser.epfl.ch/), a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. A central component of MycoBrowser is TubercuList (http://tuberculist.epfl.ch), which has recently benefited from a new data management system and web interface. These improvements were extended to all MycoBrowser databases. We provide an overview of the functionalities available and the different ways of interrogating the data then discuss how both the new information and the latest features are helping the mycobacterial research communities. Copyright © 2010 Elsevier Ltd. All rights reserved.

  15. Single-Cell (Meta-Genomics of a Dimorphic Candidatus Thiomargarita nelsonii Reveals Genomic Plasticity

    Directory of Open Access Journals (Sweden)

    Beverly E. Flood

    2016-05-01

    Full Text Available The genus Thiomargarita includes the world’s largest bacteria. But as uncultured organisms, their physiology, metabolism, and basis for their gigantism are not well understood. Thus a genomics approach, applied to a single Candidatus Thiomargarita nelsonii cell was employed to explore the genetic potential of one of these enigmatic giant bacteria. The Thiomargarita cell was obtained from an assemblage of budding Ca. T. nelsonii attached to a provannid gastropod shell from Hydrate Ridge, a methane seep offshore of Oregon, USA. Here we present a manually curated genome of Bud S10 resulting from a hybrid assembly of long Pacific Biosciences and short Illumina sequencing reads. With respect to inorganic carbon fixation and sulfur oxidation pathways, the Ca. T. nelsonii Hydrate Ridge Bud S10 genome was similar to marine sister taxa within the family Beggiatoaceae. However, the Bud S10 genome contains genes suggestive of the genetic potential for lithotrophic growth on arsenite and perhaps hydrogen. The genome also revealed that Bud S10 likely respires nitrate via two pathways: a complete denitrification pathway and a dissimilatory nitrate reduction to ammonia pathway. Both pathways have been predicted, but not previously fully elucidated, in the genomes of other large, vacuolated, sulfur-oxidizing bacteria.Surprisingly, the genome also had a high number of unusual features for a bacterium to include the largest number of metacaspases and introns ever reported in a bacterium. Also present, are a large number of other mobile genetic elements, such as insertion sequence transposable elements and miniature inverted-repeat transposable elements (MITEs. In some cases, mobile genetic elements disrupted key genes in metabolic pathways. For example, a MITE interrupts hupL, which encodes the large subunit of the hydrogenase in hydrogen oxidation. Moreover, we detected a group I intron in one of the most critical genes in the sulfur oxidation pathway, dsr

  16. Annotation of the Asian citrus psyllid genome reveals a reduced innate immune system

    Science.gov (United States)

    Citrus production worldwide is currently facing significant losses due to citrus greening disease, also known as huanglongbing. The citrus greening bacteria, Candidatus Liberibacter asiaticus (CLas), is a persistent propagative pathogen transmitted by the Asian citrus psyllid, Diaphorina citri Kuway...

  17. Mitochondrial Disease Sequence Data Resource (MSeqDR): A global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities

    NARCIS (Netherlands)

    M.J. Falk (Marni J.); L. Shen (Lishuang); M. Gonzalez (Michael); J. Leipzig (Jeremy); M.T. Lott (Marie T.); A.P.M. Stassen (Alphons P.M.); M.A. Diroma (Maria Angela); D. Navarro-Gomez (Daniel); P. Yeske (Philip); R. Bai (Renkui); R.G. Boles (Richard G.); V. Brilhante (Virginia); D. Ralph (David); J.T. DaRe (Jeana T.); R. Shelton (Robert); S.F. Terry (Sharon); Z. Zhang (Zhe); W.C. Copeland (William C.); M. van Oven (Mannis); H. Prokisch (Holger); D.C. Wallace; M. Attimonelli (Marcella); D. Krotoski (Danuta); S. Zuchner (Stephan); X. Gai (Xiaowu); S. Bale (Sherri); J. Bedoyan (Jirair); D.M. Behar (Doron); P. Bonnen (Penelope); L. Brooks (Lisa); C. Calabrese (Claudia); S. Calvo (Sarah); P.F. Chinnery (Patrick); J. Christodoulou (John); D. Church (Deanna); R. Clima (Rosanna); B.H. Cohen (Bruce H.); R.G.H. Cotton (Richard); I.F.M. de Coo (René); O. Derbenevoa (Olga); J.T. den Dunnen (Johan); D. Dimmock (David); G. Enns (Gregory); G. Gasparre (Giuseppe); A. Goldstein (Amy); I. Gonzalez (Iris); K. Gwinn (Katrina); S. Hahn (Sihoun); R.H. Haas (Richard H.); H. Hakonarson (Hakon); M. Hirano (Michio); D. Kerr (Douglas); D. Li (Dong); M. Lvova (Maria); F. Macrae (Finley); D. Maglott (Donna); E. McCormick (Elizabeth); G. Mitchell (Grant); V.K. Mootha (Vamsi K.); Y. Okazaki (Yasushi); A. Pujol (Aurora); M. Parisi (Melissa); J.C. Perin (Juan Carlos); E.A. Pierce (Eric A.); V. Procaccio (Vincent); S. Rahman (Shamima); H. Reddi (Honey); H. Rehm (Heidi); E. Riggs (Erin); R.J.T. Rodenburg (Richard); Y. Rubinstein (Yaffa); R. Saneto (Russell); M. Santorsola (Mariangela); C. Scharfe (Curt); C. Sheldon (Claire); E.A. Shoubridge (Eric); D. Simone (Domenico); B. Smeets (Bert); J.A.M. Smeitink (Jan); C. Stanley (Christine); A. Suomalainen (Anu); M.A. Tarnopolsky (Mark); I. Thiffault (Isabelle); D.R. Thorburn (David R.); J.V. Hove (Johan Van); L. Wolfe (Lynne); L.-J. Wong (Lee-Jun)

    2015-01-01

    textabstractSuccess rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires

  18. Comparative Genomic Analysis Reveals Ecological Differentiation in the Genus Carnobacterium.

    Science.gov (United States)

    Iskandar, Christelle F; Borges, Frédéric; Taminiau, Bernard; Daube, Georges; Zagorec, Monique; Remenant, Benoît; Leisner, Jørgen J; Hansen, Martin A; Sørensen, Søren J; Mangavel, Cécile; Cailliez-Grimal, Catherine; Revol-Junelles, Anne-Marie

    2017-01-01

    Lactic acid bacteria (LAB) differ in their ability to colonize food and animal-associated habitats: while some species are specialized and colonize a limited number of habitats, other are generalist and are able to colonize multiple animal-linked habitats. In the current study, Carnobacterium was used as a model genus to elucidate the genetic basis of these colonization differences. Analyses of 16S rRNA gene meta-barcoding data showed that C. maltaromaticum followed by C. divergens are the most prevalent species in foods derived from animals (meat, fish, dairy products), and in the gut. According to phylogenetic analyses, these two animal-adapted species belong to one of two deeply branched lineages. The second lineage contains species isolated from habitats where contact with animal is rare. Genome analyses revealed that members of the animal-adapted lineage harbor a larger secretome than members of the other lineage. The predicted cell-surface proteome is highly diversified in C. maltaromaticum and C. divergens with genes involved in adaptation to the animal milieu such as those encoding biopolymer hydrolytic enzymes, a heme uptake system, and biopolymer-binding adhesins. These species also exhibit genes for gut adaptation and respiration. In contrast, Carnobacterium species belonging to the second lineage encode a poorly diversified cell-surface proteome, lack genes for gut adaptation and are unable to respire. These results shed light on the important genomics traits required for adaptation to animal-linked habitats in generalist Carnobacterium .

  19. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  20. Algal genomes reveal evolutionary mosaicism and the fate of nucleomorphs

    Energy Technology Data Exchange (ETDEWEB)

    Curtis, Bruce A.; Tanifuji, Goro; Burki, Fabien; Gruber, Ansgar; Irimia, Manuuel; Maruyama, Shinichiro; Arias, Maria C.; Ball, Steven G.; Gile, Gillian H.; Hirakawa, Yoshihisa; Hopkins, Julia F.; Kuo, Alan; Rensing, Stefan A.; Schmutz, Jeremy; Symeonidi, Aikaterini; Elias, Marek; Eveleigh, Robert J. M.; Herman, Emily K.; Klute, Mary J.; Nakayama, Takuro; Obornik, Miroslav; Reyes-Prieto, Adrian; Armbrust, E. Virginia; Aves, Stephen J.; Beiko, Robert G.; Coutinho, Pedro; Dacks, Joel B.; Durnford, Dion G.; Fast, Naomi M.; Green, Beverley R.; Grisdale, Cameron J.; Hempel, Franziska; Henrissat, Bernard; Hoppner, Marc P.; Ishida, Ken-Ichiro; Kim, Eunsoo; Koreny, Ludek; Kroth, Peter G.; Liu, Yuan; Malik, Shehre-Banoo; Maier, Uwe G.; McRose, Darcy; Mock, Thomas; Neilson, Jonathan A. D.; Onodera, Naoko T.; Poole, Anthony M.; Pritham, Ellen J.; Richards, Thomas A.; Rocap, Gabrielle; Roy, Scott W.; Sarai, Chihiro; Schaack, Sarah; Shirato, Shu; Slamovits, Claudio H.; Spencer, Davie F.; Suzuki, Shigekatsu; Worden, Alexandra Z.; Zauner, Stefan; Barry, Kerrie; Bell, Callum; Bharti, Arvind K.; Crow, John A.; Grimwood, Jane; Kramer, Robin; Lindquist, Erika; Lucas, Susan; Salamov, Asaf; McFadden, Geoffrey I.; Lane, Christopher E.; Keeling, Patrick J.; Gray, Michael W.; Grigoriev, Igor V.; Archibald, John M.

    2012-08-10

    Cryptophyte and chlorarachniophyte algae are transitional forms in the widespread secondary endosymbiotic acquisition of photosynthesis by engulfment of eukaryotic algae. Unlike most secondary plastid-bearing algae, miniaturized versions of the endosymbiont nuclei (nucleomorphs) persist in cryptophytes and chlorarachniophytes. To determine why, and to address other fundamental questions about eukaryote eukaryote endosymbiosis, we sequenced the nuclear genomes of the cryptophyte Guillardia theta and the chlorarachniophyte Bigelowiella natans. Both genomes have 21,000 protein genes and are intron rich, and B. natans exhibits unprecedented alternative splicing for a single-celled organism. Phylogenomic analyses and subcellular targeting predictions reveal extensive genetic and biochemical mosaicism, with both host- and endosymbiont-derived genes servicing the mitochondrion, the host cell cytosol, the plastid and the remnant endosymbiont cytosol of both algae. Mitochondrion-to-nucleus gene transfer still occurs in both organisms but plastid-to-nucleus and nucleomorph-to-nucleus transfers do not, which explains why a small residue of essential genes remains locked in each nucleomorph.

  1. Genetic fine-mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci

    Science.gov (United States)

    Mahajan, Anubha; Locke, Adam; Rayner, N William; Robertson, Neil; Scott, Robert A; Prokopenko, Inga; Scott, Laura J; Green, Todd; Sparso, Thomas; Thuillier, Dorothee; Yengo, Loic; Grallert, Harald; Wahl, Simone; Frånberg, Mattias; Strawbridge, Rona J; Kestler, Hans; Chheda, Himanshu; Eisele, Lewin; Gustafsson, Stefan; Steinthorsdottir, Valgerdur; Thorleifsson, Gudmar; Qi, Lu; Karssen, Lennart C; van Leeuwen, Elisabeth M; Willems, Sara M; Li, Man; Chen, Han; Fuchsberger, Christian; Kwan, Phoenix; Ma, Clement; Linderman, Michael; Lu, Yingchang; Thomsen, Soren K; Rundle, Jana K; Beer, Nicola L; van de Bunt, Martijn; Chalisey, Anil; Kang, Hyun Min; Voight, Benjamin F; Abecasis, Goncalo R; Almgren, Peter; Baldassarre, Damiano; Balkau, Beverley; Benediktsson, Rafn; Blüher, Matthias; Boeing, Heiner; Bonnycastle, Lori L; Borringer, Erwin P; Burtt, Noël P; Carey, Jason; Charpentier, Guillaume; Chines, Peter S; Cornelis, Marilyn C; Couper, David J; Crenshaw, Andrew T; van Dam, Rob M; Doney, Alex SF; Dorkhan, Mozhgan; Edkins, Sarah; Eriksson, Johan G; Esko, Tonu; Eury, Elodie; Fadista, João; Flannick, Jason; Fontanillas, Pierre; Fox, Caroline; Franks, Paul W; Gertow, Karl; Gieger, Christian; Gigante, Bruna; Gottesman, Omri; Grant, George B; Grarup, Niels; Groves, Christopher J; Hassinen, Maija; Have, Christian T; Herder, Christian; Holmen, Oddgeir L; Hreidarsson, Astradur B; Humphries, Steve E; Hunter, David J; Jackson, Anne U; Jonsson, Anna; Jørgensen, Marit E; Jørgensen, Torben; Kerrison, Nicola D; Kinnunen, Leena; Klopp, Norman; Kong, Augustine; Kovacs, Peter; Kraft, Peter; Kravic, Jasmina; Langford, Cordelia; Leander, Karin; Liang, Liming; Lichtner, Peter; Lindgren, Cecilia M; Lindholm, Eero; Linneberg, Allan; Liu, Ching-Ti; Lobbens, Stéphane; Luan, Jian’an; Lyssenko, Valeriya; Männistö, Satu; McLeod, Olga; Meyer, Julia; Mihailov, Evelin; Mirza, Ghazala; Mühleisen, Thomas W; Müller-Nurasyid, Martina; Navarro, Carmen; Nöthen, Markus M; Oskolkov, Nikolay N; Owen, Katharine R; Palli, Domenico; Pechlivanis, Sonali; Perry, John RB; Platou, Carl GP; Roden, Michael; Ruderfer, Douglas; Rybin, Denis; van der Schouw, Yvonne T; Sennblad, Bengt; Sigurðsson, Gunnar; Stančáková, Alena; Steinbach, Gerald; Storm, Petter; Strauch, Konstantin; Stringham, Heather M; Sun, Qi; Thorand, Barbara; Tikkanen, Emmi; Tonjes, Anke; Trakalo, Joseph; Tremoli, Elena; Tuomi, Tiinamaija; Wennauer, Roman; Wood, Andrew R; Zeggini, Eleftheria; Dunham, Ian; Birney, Ewan; Pasquali, Lorenzo; Ferrer, Jorge; Loos, Ruth JF; Dupuis, Josée; Florez, Jose C; Boerwinkle, Eric; Pankow, James S; van Duijn, Cornelia; Sijbrands, Eric; Meigs, James B; Hu, Frank B; Thorsteinsdottir, Unnur; Stefansson, Kari; Lakka, Timo A; Rauramaa, Rainer; Stumvoll, Michael; Pedersen, Nancy L; Lind, Lars; Keinanen-Kiukaanniemi, Sirkka M; Korpi-Hyövälti, Eeva; Saaristo, Timo E; Saltevo, Juha; Kuusisto, Johanna; Laakso, Markku; Metspalu, Andres; Erbel, Raimund; Jöckel, Karl-Heinz; Moebus, Susanne; Ripatti, Samuli; Salomaa, Veikko; Ingelsson, Erik; Boehm, Bernhard O; Bergman, Richard N; Collins, Francis S; Mohlke, Karen L; Koistinen, Heikki; Tuomilehto, Jaakko; Hveem, Kristian; Njølstad, Inger; Deloukas, Panagiotis; Donnelly, Peter J; Frayling, Timothy M; Hattersley, Andrew T; de Faire, Ulf; Hamsten, Anders; Illig, Thomas; Peters, Annette; Cauchi, Stephane; Sladek, Rob; Froguel, Philippe; Hansen, Torben; Pedersen, Oluf; Morris, Andrew D; Palmer, Collin NA; Kathiresan, Sekar; Melander, Olle; Nilsson, Peter M; Groop, Leif C; Barroso, Inês; Langenberg, Claudia; Wareham, Nicholas J; O’Callaghan, Christopher A; Gloyn, Anna L; Altshuler, David; Boehnke, Michael; Teslovich, Tanya M; McCarthy, Mark I; Morris, Andrew P

    2015-01-01

    We performed fine-mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in/near KCNQ1. “Credible sets” of variants most likely to drive each distinct signal mapped predominantly to non-coding sequence, implying that T2D association is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine-mapping implicated rs10830963 as driving T2D association. We confirmed that this T2D-risk allele increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D-risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease. PMID:26551672

  2. The house spider genome reveals an ancient whole-genome duplication during arachnid evolution.

    Science.gov (United States)

    Schwager, Evelyn E; Sharma, Prashant P; Clarke, Thomas; Leite, Daniel J; Wierschin, Torsten; Pechmann, Matthias; Akiyama-Oda, Yasuko; Esposito, Lauren; Bechsgaard, Jesper; Bilde, Trine; Buffry, Alexandra D; Chao, Hsu; Dinh, Huyen; Doddapaneni, HarshaVardhan; Dugan, Shannon; Eibner, Cornelius; Extavour, Cassandra G; Funch, Peter; Garb, Jessica; Gonzalez, Luis B; Gonzalez, Vanessa L; Griffiths-Jones, Sam; Han, Yi; Hayashi, Cheryl; Hilbrant, Maarten; Hughes, Daniel S T; Janssen, Ralf; Lee, Sandra L; Maeso, Ignacio; Murali, Shwetha C; Muzny, Donna M; Nunes da Fonseca, Rodrigo; Paese, Christian L B; Qu, Jiaxin; Ronshaugen, Matthew; Schomburg, Christoph; Schönauer, Anna; Stollewerk, Angelika; Torres-Oliva, Montserrat; Turetzek, Natascha; Vanthournout, Bram; Werren, John H; Wolff, Carsten; Worley, Kim C; Bucher, Gregor; Gibbs, Richard A; Coddington, Jonathan; Oda, Hiroki; Stanke, Mario; Ayoub, Nadia A; Prpic, Nikola-Michael; Flot, Jean-François; Posnien, Nico; Richards, Stephen; McGregor, Alistair P

    2017-07-31

    The duplication of genes can occur through various mechanisms and is thought to make a major contribution to the evolutionary diversification of organisms. There is increasing evidence for a large-scale duplication of genes in some chelicerate lineages including two rounds of whole genome duplication (WGD) in horseshoe crabs. To investigate this further, we sequenced and analyzed the genome of the common house spider Parasteatoda tepidariorum. We found pervasive duplication of both coding and non-coding genes in this spider, including two clusters of Hox genes. Analysis of synteny conservation across the P. tepidariorum genome suggests that there has been an ancient WGD in spiders. Comparison with the genomes of other chelicerates, including that of the newly sequenced bark scorpion Centruroides sculpturatus, suggests that this event occurred in the common ancestor of spiders and scorpions, and is probably independent of the WGDs in horseshoe crabs. Furthermore, characterization of the sequence and expression of the Hox paralogs in P. tepidariorum suggests that many have been subject to neo-functionalization and/or sub-functionalization since their duplication. Our results reveal that spiders and scorpions are likely the descendants of a polyploid ancestor that lived more than 450 MYA. Given the extensive morphological diversity and ecological adaptations found among these animals, rivaling those of vertebrates, our study of the ancient WGD event in Arachnopulmonata provides a new comparative platform to explore common and divergent evolutionary outcomes of polyploidization events across eukaryotes.

  3. Mitochondrial genomes reveal recombination in the presumed asexual Fusarium oxysporum species complex.

    Science.gov (United States)

    Brankovics, Balázs; van Dam, Peter; Rep, Martijn; de Hoog, G Sybren; J van der Lee, Theo A; Waalwijk, Cees; van Diepeningen, Anne D

    2017-09-18

    The Fusarium oxysporum species complex (FOSC) contains several phylogenetic lineages. Phylogenetic studies identified two to three major clades within the FOSC. The mitochondrial sequences are highly informative phylogenetic markers, but have been mostly neglected due to technical difficulties. A total of 61 complete mitogenomes of FOSC strains were de novo assembled and annotated. Length variations and intron patterns support the separation of three phylogenetic species. The variable region of the mitogenome that is typical for the genus Fusarium shows two new variants in the FOSC. The variant typical for Fusarium is found in members of all three clades, while variant 2 is found in clades 2 and 3 and variant 3 only in clade 2. The extended set of loci analyzed using a new implementation of the genealogical concordance species recognition method support the identification of three phylogenetic species within the FOSC. Comparative analysis of the mitogenomes in the FOSC revealed ongoing mitochondrial recombination within, but not between phylogenetic species. The recombination indicates the presence of a parasexual cycle in F. oxysporum. The obstacles hindering the usage of the mitogenomes are resolved by using next generation sequencing and selective genome assemblers, such as GRAbB. Complete mitogenome sequences offer a stable basis and reference point for phylogenetic and population genetic studies.

  4. Genome sequence of Thermofilum pendens reveals an exceptional loss of biosynthetic pathways without genome reduction

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Anderson, Iain; Rodriguez, Jason; Susanti, Dwi; Porat, Iris; Reich, Claudia; Ulrich, Luke E.; Elkins, James G.; Mavromatis, Kostas; Lykidis, Athanasios; Kim, Edwin; Thompson, Linda S.; Nolan, Matt; Land, Miriam; Copeland, Alex; Lapidus, Alla; Lucas, Susan; Detter, Chris; Zhulin, Igor B.; Olsen, Gary J.; Whitman, William; Mukhopadhyay, Biswarup; Bristow, James; Kyrpides, Nikos

    2008-01-01

    We report the complete genome of Thermofilum pendens, a deep-branching, hyperthermophilic member of the order Thermoproteales within the archaeal kingdom Crenarchaeota. T. pendens is a sulfur-dependent, anaerobic heterotroph isolated from a solfatara in Iceland. It is an extracellular commensal, requiring an extract of Thermoproteus tenax for growth, and the genome sequence reveals that biosynthetic pathways for purines, most amino acids, and most cofactors are absent. In fact T. pendens has fewer biosynthetic enzymes than obligate intracellular parasites, although it does not display other features common among obligate parasites and thus does not appear to be in the process of becoming a parasite. It appears that T. pendens has adapted to life in an environment rich in nutrients. T. pendens was known to utilize peptides as an energy source, but the genome reveals substantial ability to grow on carbohydrates. T. pendens is the first crenarchaeote and only the second archaeon found to have a transporter of the phosphotransferase system. In addition to fermentation, T. pendens may gain energy from sulfur reduction with hydrogen and formate as electron donors. It may also be capable of sulfur-independent growth on formate with formate hydrogenlyase. Additional novel features are the presence of a monomethylamine:corrinoid methyltransferase, the first time this enzyme has been found outside of Methanosarcinales, and a presenilin-related protein. Predicted highly expressed proteins do not include housekeeping genes, and instead include ABC transporters for carbohydrates and peptides, and CRISPR-associated proteins.

  5. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles

    Directory of Open Access Journals (Sweden)

    Boussaha Mekki

    2006-05-01

    Full Text Available Abstract Background The Fragile Histidine Triad gene (FHIT is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of human common fragile sites, where FHIT lays. Vesical tumors affect also cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. Results The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split in ten exons like in man, with exons 5 to 9 coding for a 149 amino acids protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced, that originate from an alternative usage of three variants of exon 4, revealing a size very close to the major human FHIT cDNAs. Conclusion A comparative genomic approach allowed to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will allow to study the role of FHIT in bovine cancerogenesis

  6. Comparative genomic mapping of the bovine Fragile Histidine Triad (FHIT) tumour suppressor gene: characterization of a 2 Mb BAC contig covering the locus, complete annotation of the gene, analysis of cDNA and of physiological expression profiles.

    Science.gov (United States)

    Uboldi, Cristina; Guidi, Elena; Roperto, Sante; Russo, Valeria; Roperto, Franco; Di Meo, Giulia Pia; Iannuzzi, Leopoldo; Floriot, Sandrine; Boussaha, Mekki; Eggen, André; Ferretti, Luca

    2006-05-23

    The Fragile Histidine Triad gene (FHIT) is an oncosuppressor implicated in many human cancers, including vesical tumors. FHIT is frequently hit by deletions caused by fragility at FRA3B, the most active of human common fragile sites, where FHIT lays. Vesical tumors affect also cattle, including animals grazing in the wild on bracken fern; compounds released by the fern are known to induce chromosome fragility and may trigger cancer with the interplay of latent Papilloma virus. The bovine FHIT was characterized by assembling a contig of 78 BACs. Sequence tags were designed on human exons and introns and used directly to select bovine BACs, or compared with sequence data in the bovine genome database or in the trace archive of the bovine genome sequencing project, and adapted before use. FHIT is split in ten exons like in man, with exons 5 to 9 coding for a 149 amino acids protein. VISTA global alignments between bovine genomic contigs retrieved from the bovine genome database and the human FHIT region were performed. Conservation was extremely high over a 2 Mb region spanning the whole FHIT locus, including the size of introns. Thus, the bovine FHIT covers about 1.6 Mb compared to 1.5 Mb in man. Expression was analyzed by RT-PCR and Northern blot, and was found to be ubiquitous. Four cDNA isoforms were isolated and sequenced, that originate from an alternative usage of three variants of exon 4, revealing a size very close to the major human FHIT cDNAs. A comparative genomic approach allowed to assemble a contig of 78 BACs and to completely annotate a 1.6 Mb region spanning the bovine FHIT gene. The findings confirmed the very high level of conservation between human and bovine genomes and the importance of comparative mapping to speed the annotation process of the recently sequenced bovine genome. The detailed knowledge of the genomic FHIT region will allow to study the role of FHIT in bovine cancerogenesis, especially of vesical papillomavirus-associated cancers of

  7. Evidence-based annotation of the malaria parasite's genome using comparative expression profiling.

    Directory of Open Access Journals (Sweden)

    Yingyao Zhou

    2008-02-01

    Full Text Available A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.

  8. Genome sequence of M6, a diploid inbred clone of the high-glycoalkaloid-producing tuber-bearing potato species Solanum chacoense, reveals residual heterozygosity.

    Science.gov (United States)

    Leisner, Courtney P; Hamilton, John P; Crisovan, Emily; Manrique-Carpintero, Norma C; Marand, Alexandre P; Newton, Linsey; Pham, Gina M; Jiang, Jiming; Douches, David S; Jansky, Shelley H; Buell, C Robin

    2018-05-01

    Cultivated potato (Solanum tuberosum L.) is a highly heterozygous autotetraploid that presents challenges in genome analyses and breeding. Wild potato species serve as a resource for the introgression of important agronomic traits into cultivated potato. One key species is Solanum chacoense and the diploid, inbred clone M6, which is self-compatible and has desirable tuber market quality and disease resistance traits. Sequencing and assembly of the genome of the M6 clone of S. chacoense generated an assembly of 825 767 562 bp in 8260 scaffolds with an N50 scaffold size of 713 602 bp. Pseudomolecule construction anchored 508 Mb of the genome assembly into 12 chromosomes. Genome annotation yielded 49 124 high-confidence gene models representing 37 740 genes. Comparative analyses of the M6 genome with six other Solanaceae species revealed a core set of 158 367 Solanaceae genes and 1897 genes unique to three potato species. Analysis of single nucleotide polymorphisms across the M6 genome revealed enhanced residual heterozygosity on chromosomes 4, 8 and 9 relative to the other chromosomes. Access to the M6 genome provides a resource for identification of key genes for important agronomic traits and aids in genome-enabled development of inbred diploid potatoes with the potential to accelerate potato breeding. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.

  9. A parts list for fungal cellulosomes revealed by comparative genomics

    Energy Technology Data Exchange (ETDEWEB)

    Haitjema, Charles H.; Gilmore, Sean P.; Henske, John K.; Solomon, Kevin V.; de Groot, Randall; Kuo, Alan; Mondo, Stephen J.; Salamov, Asaf A.; LaButti, Kurt; Zhao, Zhiying; Chiniquy, Jennifer; Barry, Kerrie; Brewer, Heather M.; Purvine, Samuel O.; Wright, Aaron T.; Hainaut, Matthieu; Boxma, Brigitte; van Alen, Theo; Hackstein, Johannes H. P.; Henrissat, Bernard; Baker, Scott E.; Grigoriev, Igor V.; O' Malley, Michelle A.

    2017-05-26

    Cellulosomes are large, multi-protein complexes that tether plant biomass degrading enzymes together for improved hydrolysis1. These complexes were first described in anaerobic bacteria where species specific dockerin domains mediate assembly of enzymes onto complementary cohesin motifs interspersed within non-catalytic protein scaffolds1. The versatile protein assembly mechanism conferred by the bacterial cohesin-dockerin interaction is now a standard design principle for synthetic protein-scale pathways2,3. For decades, analogous structures have been reported in the early branching anaerobic fungi, which are known to assemble by sequence divergent non-catalytic dockerin domains (NCDD)4. However, the enzyme components, modular assembly mechanism, and functional role of fungal cellulosomes remain unknown5,6. Here, we describe the comprehensive set of proteins critical to fungal cellulosome assembly, including novel, conserved scaffolding proteins unique to the Neocallimastigomycota. High quality genomes of the anaerobic fungi Anaeromyces robustus, Neocallimastix californiae and Piromyces finnis were assembled with long-read, single molecule technology to overcome their repeat-richness and extremely low GC content. Genomic analysis coupled with proteomic validation revealed an average 320 NCDD-containing proteins per fungal strain that were overwhelmingly carbohydrate active enzymes (CAZymes), with 95 large fungal scaffoldins identified across 4 genera that contain a conserved amino acid sequence repeat that binds to NCDDs. Fungal dockerin and scaffoldin domains have no similarity to their bacterial counterparts, yet several catalytic domains originated via horizontal gene transfer with gut bacteria. Though many catalytic domains are shared with bacteria, the biocatalytic activity of anaerobic fungi is expanded by the inclusion of GH3, GH6, and GH45 enzymes in the enzyme complexes. Collectively, these findings suggest that the fungal cellulosome is an evolutionarily

  10. Diverse circovirus-like genome architectures revealed by environmental metagenomics.

    Science.gov (United States)

    Rosario, Karyna; Duffy, Siobain; Breitbart, Mya

    2009-10-01

    Single-stranded DNA (ssDNA) viruses with circular genomes are the smallest viruses known to infect eukaryotes. The present study identified 10 novel genomes similar to ssDNA circoviruses through data-mining of public viral metagenomes. The metagenomic libraries included samples from reclaimed water and three different marine environments (Chesapeake Bay, British Columbia coastal waters and Sargasso Sea). All the genomes have similarities to the replication (Rep) protein of circoviruses; however, only half have genomic features consistent with known circoviruses. Some of the genomes exhibit a mixture of genomic features associated with different families of ssDNA viruses (i.e. circoviruses, geminiviruses and parvoviruses). Unique genome architectures and phylogenetic analysis of the Rep protein suggest that these viruses belong to novel genera and/or families. Investigating the complex community of ssDNA viruses in the environment can lead to the discovery of divergent species and help elucidate evolutionary links between ssDNA viruses.

  11. The Vigna Genome Server, 'VigGS': A Genomic Knowledge Base of the Genus Vigna Based on High-Quality, Annotated Genome Sequence of the Azuki Bean, Vigna angularis (Willd.) Ohwi & Ohashi.

    Science.gov (United States)

    Sakai, Hiroaki; Naito, Ken; Takahashi, Yu; Sato, Toshiyuki; Yamamoto, Toshiya; Muto, Isamu; Itoh, Takeshi; Tomooka, Norihiko

    2016-01-01

    The genus Vigna includes legume crops such as cowpea, mungbean and azuki bean, as well as >100 wild species. A number of the wild species are highly tolerant to severe environmental conditions including high-salinity, acid or alkaline soil; drought; flooding; and pests and diseases. These features of the genus Vigna make it a good target for investigation of genetic diversity in adaptation to stressful environments; however, a lack of genomic information has hindered such research in this genus. Here, we present a genome database of the genus Vigna, Vigna Genome Server ('VigGS', http://viggs.dna.affrc.go.jp), based on the recently sequenced azuki bean genome, which incorporates annotated exon-intron structures, along with evidence for transcripts and proteins, visualized in GBrowse. VigGS also facilitates user construction of multiple alignments between azuki bean genes and those of six related dicot species. In addition, the database displays sequence polymorphisms between azuki bean and its wild relatives and enables users to design primer sequences targeting any variant site. VigGS offers a simple keyword search in addition to sequence similarity searches using BLAST and BLAT. To incorporate up to date genomic information, VigGS automatically receives newly deposited mRNA sequences of pre-set species from the public database once a week. Users can refer to not only gene structures mapped on the azuki bean genome on GBrowse but also relevant literature of the genes. VigGS will contribute to genomic research into plant biotic and abiotic stresses and to the future development of new stress-tolerant crops. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Full Text Available Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  13. Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

    Science.gov (United States)

    Georgi, Benjamin; Craig, David; Kember, Rachel L.; Liu, Wencheng; Lindquist, Ingrid; Nasser, Sara; Brown, Christopher; Egeland, Janice A.; Paul, Steven M.; Bućan, Maja

    2014-01-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders. PMID:24625924

  14. ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metaranscripomic Data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Brown, Joseph M.; Colby, Sean M.; Overall, Christopher C.; Lee, Joon-Yong; Zucker, Jeremy D.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-03-02

    ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.

  15. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA; Vos, M. de; Louw, GE; Merwe, RG van der; Dippenaar, A.; Streicher, EM; Abdallah, AM; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; Helden, PD van; Warren, RM; Pain, Arnab

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug

  16. Symbiodinium genomes reveal adaptive evolution of functions related to symbiosis

    KAUST Repository

    Liu, Huanle; Stephens, Timothy G.; Gonzá lez-Pech, Raú l; Beltran, Victor H.; Lapeyre, Bruno; Bongaerts, Pim; Cooke, Ira; Bourne, David G.; Forê t, Sylvain; Miller, David John; van Oppen, Madeleine J. H.; Voolstra, Christian R.; Ragan, Mark A.; Chan, Cheong Xin

    2017-01-01

    Symbiosis between dinoflagellates of the genus Symbiodinium and reef-building corals forms the trophic foundation of the world's coral reef ecosystems. Here we present the first draft genome of Symbiodinium goreaui (Clade C, type C1: 1.03 Gbp), one of the most ubiquitous endosymbionts associated with corals, and an improved draft genome of Symbiodinium kawagutii (Clade F, strain CS-156: 1.05 Gbp), previously sequenced as strain CCMP2468, to further elucidate genomic signatures of this symbiosis. Comparative analysis of four available Symbiodinium genomes against other dinoflagellate genomes led to the identification of 2460 nuclear gene families that show evidence of positive selection, including genes involved in photosynthesis, transmembrane ion transport, synthesis and modification of amino acids and glycoproteins, and stress response. Further, we identified extensive sets of genes for meiosis and response to light stress. These draft genomes provide a foundational resource for advancing our understanding Symbiodinium biology and the coral-algal symbiosis.

  17. Symbiodinium genomes reveal adaptive evolution of functions related to symbiosis

    KAUST Repository

    Liu, Huanle

    2017-10-06

    Symbiosis between dinoflagellates of the genus Symbiodinium and reef-building corals forms the trophic foundation of the world\\'s coral reef ecosystems. Here we present the first draft genome of Symbiodinium goreaui (Clade C, type C1: 1.03 Gbp), one of the most ubiquitous endosymbionts associated with corals, and an improved draft genome of Symbiodinium kawagutii (Clade F, strain CS-156: 1.05 Gbp), previously sequenced as strain CCMP2468, to further elucidate genomic signatures of this symbiosis. Comparative analysis of four available Symbiodinium genomes against other dinoflagellate genomes led to the identification of 2460 nuclear gene families that show evidence of positive selection, including genes involved in photosynthesis, transmembrane ion transport, synthesis and modification of amino acids and glycoproteins, and stress response. Further, we identified extensive sets of genes for meiosis and response to light stress. These draft genomes provide a foundational resource for advancing our understanding Symbiodinium biology and the coral-algal symbiosis.

  18. MSeqDR mvTool: A mitochondrial DNA Web and API resource for comprehensive variant annotation, universal nomenclature collation, and reference genome conversion.

    Science.gov (United States)

    Shen, Lishuang; Attimonelli, Marcella; Bai, Renkui; Lott, Marie T; Wallace, Douglas C; Falk, Marni J; Gai, Xiaowu

    2018-06-01

    Accurate mitochondrial DNA (mtDNA) variant annotation is essential for the clinical diagnosis of diverse human diseases. Substantial challenges to this process include the inconsistency in mtDNA nomenclatures, the existence of multiple reference genomes, and a lack of reference population frequency data. Clinicians need a simple bioinformatics tool that is user-friendly, and bioinformaticians need a powerful informatics resource for programmatic usage. Here, we report the development and functionality of the MSeqDR mtDNA Variant Tool set (mvTool), a one-stop mtDNA variant annotation and analysis Web service. mvTool is built upon the MSeqDR infrastructure (https://mseqdr.org), with contributions of expert curated data from MITOMAP (https://www.mitomap.org) and HmtDB (https://www.hmtdb.uniba.it/hmdb). mvTool supports all mtDNA nomenclatures, converts variants to standard rCRS- and HGVS-based nomenclatures, and annotates novel mtDNA variants. Besides generic annotations from dbNSFP and Variant Effect Predictor (VEP), mvTool provides allele frequencies in more than 47,000 germline mitogenomes, and disease and pathogenicity classifications from MSeqDR, Mitomap, HmtDB and ClinVar (Landrum et al., 2013). mvTools also provides mtDNA somatic variants annotations. "mvTool API" is implemented for programmatic access using inputs in VCF, HGVS, or classical mtDNA variant nomenclatures. The results are reported as hyperlinked html tables, JSON, Excel, and VCF formats. MSeqDR mvTool is freely accessible at https://mseqdr.org/mvtool.php. © 2018 Wiley Periodicals, Inc.

  19. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.

  20. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

    Science.gov (United States)

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-05-19

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.

  1. The Capsaspora genome reveals a complex unicellular prehistory of animals.

    Science.gov (United States)

    Suga, Hiroshi; Chen, Zehua; de Mendoza, Alex; Sebé-Pedrós, Arnau; Brown, Matthew W; Kramer, Eric; Carr, Martin; Kerner, Pierre; Vervoort, Michel; Sánchez-Pons, Núria; Torruella, Guifré; Derelle, Romain; Manning, Gerard; Lang, B Franz; Russ, Carsten; Haas, Brian J; Roger, Andrew J; Nusbaum, Chad; Ruiz-Trillo, Iñaki

    2013-01-01

    To reconstruct the evolutionary origin of multicellular animals from their unicellular ancestors, the genome sequences of diverse unicellular relatives are essential. However, only the genome of the choanoflagellate Monosiga brevicollis has been reported to date. Here we completely sequence the genome of the filasterean Capsaspora owczarzaki, the closest known unicellular relative of metazoans besides choanoflagellates. Analyses of this genome alter our understanding of the molecular complexity of metazoans' unicellular ancestors showing that they had a richer repertoire of proteins involved in cell adhesion and transcriptional regulation than previously inferred only with the choanoflagellate genome. Some of these proteins were secondarily lost in choanoflagellates. In contrast, most intercellular signalling systems controlling development evolved later concomitant with the emergence of the first metazoans. We propose that the acquisition of these metazoan-specific developmental systems and the co-option of pre-existing genes drove the evolutionary transition from unicellular protists to metazoans.

  2. The Genome Sequence of Methanohalophilus mahii SLPT Reveals Differences in the Energy Metabolism among Members of the Methanosarcinaceae Inhabiting Freshwater and Saline Environments

    Directory of Open Access Journals (Sweden)

    Stefan Spring

    2010-01-01

    Full Text Available Methanohalophilus mahii is the type species of the genus Methanohalophilus, which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLPT was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.

  3. The Genome Sequence of Methanohalophilus mahii SLPT Reveals Differences in the Energy Metabolism among Members of the Methanosarcinaceae Inhabiting Freshwater and Saline Environments

    Energy Technology Data Exchange (ETDEWEB)

    Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Scheuner, Carmen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Lykidis, A [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Methanohalophilus mahii is the type species of the genus Methanohalophilus, which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLPT was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.

  4. The Genome Sequence of Methanohalophilus mahii SLPT Reveals Differences in the Energy Metabolism among Members of the Methanosarcinaceae Inhabiting Freshwater and Saline Environments

    Energy Technology Data Exchange (ETDEWEB)

    Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Scheuner, Carmen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California; Lucas, Susan [Joint Genome Institute, Walnut Creek, California; Glavina Del Rio, Tijana [Joint Genome Institute, Walnut Creek, California; Tice, Hope [Joint Genome Institute, Walnut Creek, California; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California; Chen, Feng [Joint Genome Institute, Walnut Creek, California; Nolan, Matt [Joint Genome Institute, Walnut Creek, California; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Pitluck, Samuel [ORNL; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Lykidis, A [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [Joint Genome Institute, Walnut Creek, California; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia D [ORNL; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Detter, J. Chris [Joint Genome Institute, Walnut Creek, California; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [ORNL; Bristow, James [Joint Genome Institute, Walnut Creek, California; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpidis, Nikos C [ORNL; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-12-01

    Methanohalophilus mahii is the type species of the genus Methanohalophilus, which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLPT was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.

  5. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  6. Nannochloropsis genomes reveal evolution of microalgal oleaginous traits.

    Directory of Open Access Journals (Sweden)

    Dongmei Wang

    2014-01-01

    Full Text Available Oleaginous microalgae are promising feedstock for biofuels, yet the genetic diversity, origin and evolution of oleaginous traits remain largely unknown. Here we present a detailed phylogenomic analysis of five oleaginous Nannochloropsis species (a total of six strains and one time-series transcriptome dataset for triacylglycerol (TAG synthesis on one representative strain. Despite small genome sizes, high coding potential and relative paucity of mobile elements, the genomes feature small cores of ca. 2,700 protein-coding genes and a large pan-genome of >38,000 genes. The six genomes share key oleaginous traits, such as the enrichment of selected lipid biosynthesis genes and certain glycoside hydrolase genes that potentially shift carbon flux from chrysolaminaran to TAG synthesis. The eleven type II diacylglycerol acyltransferase genes (DGAT-2 in every strain, each expressed during TAG synthesis, likely originated from three ancient genomes, including the secondary endosymbiosis host and the engulfed green and red algae. Horizontal gene transfers were inferred in most lipid synthesis nodes with expanded gene doses and many glycoside hydrolase genes. Thus multiple genome pooling and horizontal genetic exchange, together with selective inheritance of lipid synthesis genes and species-specific gene loss, have led to the enormous genetic apparatus for oleaginousness and the wide genomic divergence among present-day Nannochloropsis. These findings have important implications in the screening and genetic engineering of microalgae for biofuels.

  7. Comparative genome analysis of Prevotella intermedia strain isolated from infected root canal reveals features related to pathogenicity and adaptation.

    Science.gov (United States)

    Ruan, Yunfeng; Shen, Lu; Zou, Yan; Qi, Zhengnan; Yin, Jun; Jiang, Jie; Guo, Liang; He, Lin; Chen, Zijiang; Tang, Zisheng; Qin, Shengying

    2015-02-25

    Many species of the genus Prevotella are pathogens that cause oral diseases. Prevotella intermedia is known to cause various oral disorders e.g. periodontal disease, periapical periodontitis and noma as well as colonize in the respiratory tract and be associated with cystic fibrosis and chronic bronchitis. It is of clinical significance to identify the main drive of its various adaptation and pathogenicity. In order to explore the intra-species genetic differences among strains of Prevotella intermedia of different niches, we isolated a strain Prevotella intermedia ZT from the infected root canal of a Chinese patient with periapical periodontitis and gained a draft genome sequence. We annotated the genome and compared it with the genomes of other taxa in the genus Prevotella. The raw data set, consisting of approximately 65X-coverage reads, was trimmed and assembled into contigs from which 2165 ORFs were predicted. The comparison of the Prevotella intermedia ZT genome sequence with the published genome sequence of Prevotella intermedia 17 and Prevotella intermedia ATCC25611 revealed that ~14% of the genes were strain-specific. The Preveotella intermedia strains share a set of conserved genes contributing to its adaptation and pathogenic and possess strain-specific genes especially those involved in adhesion and secreting bacteriocin. The Prevotella intermedia ZT shares similar gene content with other taxa of genus Prevotella. The genomes of the genus Prevotella is highly dynamic with relative conserved parts: on average, about half of the genes in one Prevotella genome were not included in another genome of the different Prevotella species. The degree of conservation varied with different pathways: the ability of amino acid biosynthesis varied greatly with species but the pathway of cell wall components biosynthesis were nearly constant. Phylogenetic tree shows that the taxa from different niches are scarcely distributed among clades. Prevotella intermedia ZT

  8. Complete genome sequence of Brachyspira intermedia reveals unique genomic features in Brachyspira species and phage-mediated horizontal gene transfer

    Science.gov (United States)

    2011-01-01

    Background Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042

  9. Comparative sequence analysis of Solanum and Arabidopsis in a hot spot for pathogen resistance on potato chromosome V reveals a patchwork of conserved and rapidly evolving genome segments

    Directory of Open Access Journals (Sweden)

    Bruggmann Rémy

    2007-05-01

    Full Text Available Abstract Background Quantitative phenotypic variation of agronomic characters in crop plants is controlled by environmental and genetic factors (quantitative trait loci = QTL. To understand the molecular basis of such QTL, the identification of the underlying genes is of primary interest and DNA sequence analysis of the genomic regions harboring QTL is a prerequisite for that. QTL mapping in potato (Solanum tuberosum has identified a region on chromosome V tagged by DNA markers GP21 and GP179, which contains a number of important QTL, among others QTL for resistance to late blight caused by the oomycete Phytophthora infestans and to root cyst nematodes. Results To obtain genomic sequence for the targeted region on chromosome V, two local BAC (bacterial artificial chromosome contigs were constructed and sequenced, which corresponded to parts of the homologous chromosomes of the diploid, heterozygous genotype P6/210. Two contiguous sequences of 417,445 and 202,781 base pairs were assembled and annotated. Gene-by-gene co-linearity was disrupted by non-allelic insertions of retrotransposon elements, stretches of diverged intergenic sequences, differences in gene content and gene order. The latter was caused by inversion of a 70 kbp genomic fragment. These features were also found in comparison to orthologous sequence contigs from three homeologous chromosomes of Solanum demissum, a wild tuber bearing species. Functional annotation of the sequence identified 48 putative open reading frames (ORF in one contig and 22 in the other, with an average of one ORF every 9 kbp. Ten ORFs were classified as resistance-gene-like, 11 as F-box-containing genes, 13 as transposable elements and three as transcription factors. Comparing potato to Arabidopsis thaliana annotated proteins revealed five micro-syntenic blocks of three to seven ORFs with A. thaliana chromosomes 1, 3 and 5. Conclusion Comparative sequence analysis revealed highly conserved collinear regions

  10. Case of 7p22.1 Microduplication Detected by Whole Genome Microarray (REVEAL) in Workup of Child Diagnosed with Autism

    OpenAIRE

    Goitia, Veronica; Oquendo, Marcial; Stratton, Robert

    2015-01-01

    Introduction. More than 60 cases of 7p22 duplications and deletions have been reported with over 16 of them occurring without concomitant chromosomal abnormalities. Patient and Methods. We report a 29-month-old male diagnosed with autism. Whole genome chromosome SNP microarray (REVEAL) demonstrated a 1.3?Mb interstitial duplication of 7p22.1 ->p22.1 arr 7p22.1 (5,436,367?6,762,394), the second smallest interstitial 7p duplication reported to date. This interval included 14 OMIM annotated gene...

  11. Annotated ESTs from various tissues of the brown planthopper Nilaparvata lugens: a genomic resource for studying agricultural pests.

    Science.gov (United States)

    Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei

    2008-03-03

    The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pests of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest.

  12. Annotated ESTs from various tissues of the brown planthopper Nilaparvata lugens: A genomic resource for studying agricultural pests

    Directory of Open Access Journals (Sweden)

    Zhang Qiang

    2008-03-01

    Full Text Available Abstract Background The brown planthopper (BPH, Nilaparvata lugens (Hemiptera, Delphacidae, is a serious insect pests of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. Results More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. Conclusion The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest.

  13. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    Science.gov (United States)

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  14. Comparative Genomics of Methanopyrus sp. SNP6 and KOL6 Revealing Genomic Regions of Plasticity Implicated in Extremely Thermophilic Profiles

    Directory of Open Access Journals (Sweden)

    Zhiliang Yu

    2017-07-01

    Full Text Available Methanopyrus spp. are usually isolated from harsh niches, such as high osmotic pressure and extreme temperature. However, the molecular mechanisms for their environmental adaption are poorly understood. Archaeal species is commonly considered as primitive organism. The evolutional placement of archaea is a fundamental and intriguing scientific question. We sequenced the genomes of Methanopyrus strains SNP6 and KOL6 isolated from the Atlantic and Iceland, respectively. Comparative genomic analysis revealed genetic diversity and instability implicated in niche adaption, including a number of transporter- and integrase/transposase-related genes. Pan-genome analysis also defined the gene pool of Methanopyrus spp., in addition of ~120-Kb genomic region of plasticity impacting cognate genomic architecture. We believe that Methanopyrus genomics could facilitate efficient investigation/recognition of archaeal phylogenetic diverse patterns, as well as improve understanding of biological roles and significance of these versatile microbes.

  15. A draft de novo genome assembly for the northern bobwhite (Colinus virginianus reveals evidence for a rapid decline in effective population size beginning in the Late Pleistocene.

    Directory of Open Access Journals (Sweden)

    Yvette A Halley

    Full Text Available Wild populations of northern bobwhites (Colinus virginianus; hereafter bobwhite have declined across nearly all of their U.S. range, and despite their importance as an experimental wildlife model for ecotoxicology studies, no bobwhite draft genome assembly currently exists. Herein, we present a bobwhite draft de novo genome assembly with annotation, comparative analyses including genome-wide analyses of divergence with the chicken (Gallus gallus and zebra finch (Taeniopygia guttata genomes, and coalescent modeling to reconstruct the demographic history of the bobwhite for comparison to other birds currently in decline (i.e., scarlet macaw; Ara macao. More than 90% of the assembled bobwhite genome was captured within 14,000 unique genes and proteins. Bobwhite analyses of divergence with the chicken and zebra finch genomes revealed many extremely conserved gene sequences, and evidence for lineage-specific divergence of noncoding regions. Coalescent models for reconstructing the demographic history of the bobwhite and the scarlet macaw provided evidence for population bottlenecks which were temporally coincident with human colonization of the New World, the late Pleistocene collapse of the megafauna, and the last glacial maximum. Demographic trends predicted for the bobwhite and the scarlet macaw also were concordant with how opposing natural selection strategies (i.e., skewness in the r-/K-selection continuum would be expected to shape genome diversity and the effective population sizes in these species, which is directly relevant to future conservation efforts.

  16. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Full Text Available Abstract Background Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions,, the genome has been sequenced by whole genome shotgun sequence (WGS. Here the aim was to use BAC end sequences (BES derived from three minimum tile paths (MTP to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs were single nucleotide polymorphisms (SNPs; 89% and single nucleotide indels (SNIs 10%. Larger indels were rare but present (1%. Simulations that had predicted fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion The use of HSVs like SNPs and SNIs to characterize BACs wil improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  17. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that facilitate to accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.

  18. Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome

    DEFF Research Database (Denmark)

    Lewis, Nathan E; Liu, Xin; Li, Yuxiang

    2013-01-01

    stymied by the lack of a unifying genomic resource for CHO cells. Here we report a 2.4-Gb draft genome sequence of a female Chinese hamster, Cricetulus griseus, harboring 24,044 genes. We also resequenced and analyzed the genomes of six CHO cell lines from the CHO-K1, DG44 and CHO-S lineages...

  19. De novo assembly and functional annotation of Myrciaria dubia fruit transcriptome reveals multiple metabolic pathways for L-ascorbic acid biosynthesis.

    Science.gov (United States)

    Castro, Juan C; Maddox, J Dylan; Cobos, Marianela; Requena, David; Zimic, Mirko; Bombarely, Aureliano; Imán, Sixto A; Cerdeira, Luis A; Medina, Andersson E

    2015-11-24

    Myrciaria dubia is an Amazonian fruit shrub that produces numerous bioactive phytochemicals, but is best known by its high L-ascorbic acid (AsA) content in fruits. Pronounced variation in AsA content has been observed both within and among individuals, but the genetic factors responsible for this variation are largely unknown. The goals of this research, therefore, were to assemble, characterize, and annotate the fruit transcriptome of M. dubia in order to reconstruct metabolic pathways and determine if multiple pathways contribute to AsA biosynthesis. In total 24,551,882 high-quality sequence reads were de novo assembled into 70,048 unigenes (mean length = 1150 bp, N50 = 1775 bp). Assembled sequences were annotated using BLASTX against public databases such as TAIR, GR-protein, FB, MGI, RGD, ZFIN, SGN, WB, TIGR_CMR, and JCVI-CMR with 75.2 % of unigenes having annotations. Of the three core GO annotation categories, biological processes comprised 53.6 % of the total assigned annotations, whereas cellular components and molecular functions comprised 23.3 and 23.1 %, respectively. Based on the KEGG pathway assignment of the functionally annotated transcripts, five metabolic pathways for AsA biosynthesis were identified: animal-like pathway, myo-inositol pathway, L-gulose pathway, D-mannose/L-galactose pathway, and uronic acid pathway. All transcripts coding enzymes involved in the ascorbate-glutathione cycle were also identified. Finally, we used the assembly to identified 6314 genic microsatellites and 23,481 high quality SNPs. This study describes the first next-generation sequencing effort and transcriptome annotation of a non-model Amazonian plant that is relevant for AsA production and other bioactive phytochemicals. Genes encoding key enzymes were successfully identified and metabolic pathways involved in biosynthesis of AsA, anthocyanins, and other metabolic pathways have been reconstructed. The identification of these genes and pathways is in agreement with

  20. Culture independent genomic comparisons reveal environmental adaptations for Altiarchaeales

    Directory of Open Access Journals (Sweden)

    Jordan T Bird

    2016-08-01

    Full Text Available The recently proposed candidatus order Altiarchaeales remains an uncultured archaeal lineage composed of genetically diverse, globally widespread organisms frequently observed in anoxic subsurface environments. In spite of 15 years of studies on the psychrophilic biofilm-producing Candidatus (Ca. Altiarchaeum hamiconexum and its close relatives, very little is known about the phylogenetic and functional diversity of the widespread free-living marine members of this taxon. From methanogenic sediments in the White Oak River Estuary, NC, we sequenced a single cell amplified genome (SAG, WOR_SCG_SM1, and used it to identify and refine two high-quality genomes from metagenomes, WOR_79 and WOR_86-2, from the same site in a different year. These three genomic reconstructions form a monophyletic group which also includes three previously published genomes from metagenomes from terrestrial springs and a SAG from Sakinaw Lake in a group previously designated as pMC2A384. A synapomorphic mutation in the Altiarchaeales tRNA synthetase β subunit, pheT, causes the protein to be encoded as two subunits at distant loci. Consistent with the terrestrial spring clades, our estuarine genomes contain a near-complete autotrophic metabolism, H2 or CO as potential electron donors, a reductive acetyl-CoA pathway for carbon fixation, and methylotroph-like NADP(H-dependent dehydrogenase. Phylogenies based on 16S rRNA genes and concatenated conserved proteins identify two distinct sub-clades of Altiarchaeales, Alti-1 populated by organisms from actively flowing springs, and Alti-2 which is more widespread, diverse, and not associated with visible mats. The core Alti-1 genome supports Alti-1 as adapted for the stream environment, with lipopolysaccharide production capacity, extracellular hami structures. The core Alti-2 genome members of this clade are free-living, with distinct mechanisms for energy maintenance, motility, osmoregulation, and sulfur redox reactions. These

  1. COGNATE: comparative gene annotation characterizer.

    Science.gov (United States)

    Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver

    2017-07-17

    The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy to use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on github ( https

  2. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Science.gov (United States)

    Bargsten, Joachim W; Folta, Adam; Mlynárová, Ludmila; Nap, Jan-Peter

    2013-01-01

    As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  3. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Directory of Open Access Journals (Sweden)

    Joachim W Bargsten

    Full Text Available As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes. The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  4. Genomic Variants Revealed by Invariably Missing Genotypes in Nelore Cattle.

    Directory of Open Access Journals (Sweden)

    Joaquim Manoel da Silva

    Full Text Available High density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, in spite of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some discarded data consists of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues but can also be explained by the presence of genomic variations that are in the vicinity of the assayed SNP and that prevent genotyping probes from annealing. The latter case may contain relevant information because these missing genotypes might be used to identify population-specific genomic variants. In order to assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires were used to verify the presence of genomic variations within their flanking regions in 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production.

  5. Chimpanzee genomic diversity reveals ancient admixture with bonobos

    DEFF Research Database (Denmark)

    de Manuel, Marc; Kuhlwilm, Martin; Frandsen, Peter

    2016-01-01

    Our closest living relatives, chimpanzees and bonobos, have a complex demographic history. We analyzed the high-coverage whole genomes of 75 wild-born chimpanzees and bonobos from 10 countries in Africa. We found that chimpanzee population substructure makes genetic information a good predictor...

  6. Genomic Perturbations Reveal Distinct Regulatory Networks in Intrahepatic Cholangiocarcinoma

    DEFF Research Database (Denmark)

    Nepal, Chirag; O'Rourke, Colm J; Oliveira, Douglas Vnp

    2018-01-01

    Intrahepatic cholangiocarcinoma (iCCA) remains a highly heterogeneous malignancy that has eluded effective patient stratification to date. The extent to which such heterogeneity can be influenced by individual driver mutations remains to be evaluated. Here, we analyzed genomic (whole-exome sequen...

  7. Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa

    KAUST Repository

    Gallego Llorente, M.; Jones, E. R.; Eriksson, Anders; Siska, V.; Arthur, K. W.; Arthur, J. W.; Curtis, M. C.; Stock, J. T.; Coltorti, M.; Pieruccini, P.; Stretton, S.; Brock, F.; Higham, T.; Park, Y.; Hofreiter, M.; Bradley, D. G.; Bhak, J.; Pinhasi, R.; Manica, A.

    2015-01-01

    Characterizing genetic diversity in Africa is a crucial step for most analyses reconstructing the evolutionary history of anatomically modern humans. However, historic migrations from Eurasia into Africa have affected many contemporary populations, confounding inferences. Here, we present a 12.5×coverage ancient genome of an Ethiopian male ("Mota") who lived approximately 4500 years ago. We use this genome to demonstrate that the Eurasian backflow into Africa came from a population closely related to Early Neolithic farmers, who had colonized Europe 4000 years earlier. The extent of this backflow was much greater than previously reported, reaching all the way to Central, West, and Southern Africa, affecting even populations such as Yoruba and Mbuti, previously thought to be relatively unadmixed, who harbor 6 to 7% Eurasian ancestry.

  8. Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa

    KAUST Repository

    Gallego Llorente, M.

    2015-10-09

    Characterizing genetic diversity in Africa is a crucial step for most analyses reconstructing the evolutionary history of anatomically modern humans. However, historic migrations from Eurasia into Africa have affected many contemporary populations, confounding inferences. Here, we present a 12.5×coverage ancient genome of an Ethiopian male ("Mota") who lived approximately 4500 years ago. We use this genome to demonstrate that the Eurasian backflow into Africa came from a population closely related to Early Neolithic farmers, who had colonized Europe 4000 years earlier. The extent of this backflow was much greater than previously reported, reaching all the way to Central, West, and Southern Africa, affecting even populations such as Yoruba and Mbuti, previously thought to be relatively unadmixed, who harbor 6 to 7% Eurasian ancestry.

  9. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    OpenAIRE

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Abstract Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Res...

  10. Algal genomes reveal evolutionary mosaicism and the fate of nucleomorphs

    Czech Academy of Sciences Publication Activity Database

    Oborník, Miroslav; Kořený, Luděk

    2012-01-01

    Roč. 492, č. 7427 (2012), s. 59-65 ISSN 0028-0836 Institutional support: RVO:60077344 Keywords : GENE-TRANSFER * BIGELOWIELLA-NATANS * EUKARYOTIC GENOMES * GUILLARDIA-THETA * NUCLEUS * CHLORARACHNIOPHYTE * PROTEINS * SEQUENCE * ORIGIN * CRYPTOPHYTES Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 38.597, year: 2012 http://www.nature.com/nature/journal/v492/n7427/full/nature11681.html

  11. Upper Palaeolithic genomes reveal deep roots of modern Eurasians

    KAUST Repository

    Jones, Eppie R.

    2015-11-16

    We extend the scope of European palaeogenomics by sequencing the genomes of Late Upper Palaeolithic (13,300 years old, 1.4-fold coverage) and Mesolithic (9,700 years old, 15.4-fold) males from western Georgia in the Caucasus and a Late Upper Palaeolithic (13,700 years old, 9.5-fold) male from Switzerland. While we detect Late Palaeolithic–Mesolithic genomic continuity in both regions, we find that Caucasus hunter-gatherers (CHG) belong to a distinct ancient clade that split from western hunter-gatherers ~45 kya, shortly after the expansion of anatomically modern humans into Europe and from the ancestors of Neolithic farmers ~25 kya, around the Last Glacial Maximum. CHG genomes significantly contributed to the Yamnaya steppe herders who migrated into Europe ~3,000 BC, supporting a formative Caucasus influence on this important Early Bronze age culture. CHG left their imprint on modern populations from the Caucasus and also central and south Asia possibly marking the arrival of Indo-Aryan languages.

  12. Upper Palaeolithic genomes reveal deep roots of modern Eurasians

    KAUST Repository

    Jones, Eppie R.; Gonzalez-Fortes, Gloria; Connell, Sarah; Siska, Veronika; Eriksson, Anders; Martiniano, Rui; McLaughlin, Russell L.; Gallego Llorente, Marcos; Cassidy, Lara M.; Gamba, Cristina; Meshveliani, Tengiz; Bar-Yosef, Ofer; Mü ller, Werner; Belfer-Cohen, Anna; Matskevich, Zinovi; Jakeli, Nino; Higham, Thomas F. G.; Currat, Mathias; Lordkipanidze, David; Hofreiter, Michael; Manica, Andrea; Pinhasi, Ron; Bradley, Daniel G.

    2015-01-01

    We extend the scope of European palaeogenomics by sequencing the genomes of Late Upper Palaeolithic (13,300 years old, 1.4-fold coverage) and Mesolithic (9,700 years old, 15.4-fold) males from western Georgia in the Caucasus and a Late Upper Palaeolithic (13,700 years old, 9.5-fold) male from Switzerland. While we detect Late Palaeolithic–Mesolithic genomic continuity in both regions, we find that Caucasus hunter-gatherers (CHG) belong to a distinct ancient clade that split from western hunter-gatherers ~45 kya, shortly after the expansion of anatomically modern humans into Europe and from the ancestors of Neolithic farmers ~25 kya, around the Last Glacial Maximum. CHG genomes significantly contributed to the Yamnaya steppe herders who migrated into Europe ~3,000 BC, supporting a formative Caucasus influence on this important Early Bronze age culture. CHG left their imprint on modern populations from the Caucasus and also central and south Asia possibly marking the arrival of Indo-Aryan languages.

  13. Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix contributes to the gene repertoire catalogue of the Pucciniales

    Directory of Open Access Journals (Sweden)

    Marco Aurelio Cristancho

    2014-10-01

    Full Text Available Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333Mb was built based on the 8 isolates; this assembly was used for subsequent analyses.Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8% were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish

  14. Prehistoric genomes reveal the genetic foundation and cost of horse domestication

    DEFF Research Database (Denmark)

    Schubert, Mikkel; Jáónsson, Hákon; Chang, Dan

    2014-01-01

    genetics alone. We therefore sequenced two complete horse genomes, predating domestication by thousands of years, to characterize the genetic footprint of domestication. These ancient genomes reveal predomestic population structure and a significant fraction of genetic variation shared with the domestic...... breeds but absent from Przewalski’s horses. We find positive selection on genes involved in various aspects of locomotion, physiology, and cognition. Finally, we show that modern horse genomes contain an excess of deleterious mutations, likely representing the genetic cost of domestication....

  15. Genome-Wide Annotation and Comparative Analysis of Cytochrome P450 Monooxygenases in Basidiomycete Biotrophic Plant Pathogens.

    Directory of Open Access Journals (Sweden)

    Lehlohonolo Benedict Qhanya

    Full Text Available Fungi are an exceptional source of diverse and novel cytochrome P450 monooxygenases (P450s, heme-thiolate proteins, with catalytic versatility. Agaricomycotina saprophytes have yielded most of the available information on basidiomycete P450s. This resulted in observing similar P450 family types in basidiomycetes with few differences in P450 families among Agaricomycotina saprophytes. The present study demonstrated the presence of unique P450 family patterns in basidiomycete biotrophic plant pathogens that could possibly have originated from the adaptation of these species to different ecological niches (host influence. Systematic analysis of P450s in basidiomycete biotrophic plant pathogens belonging to three different orders, Agaricomycotina (Armillaria mellea, Pucciniomycotina (Melampsora laricis-populina, M. lini, Mixia osmundae and Puccinia graminis and Ustilaginomycotina (Ustilago maydis, Sporisorium reilianum and Tilletiaria anomala, revealed the presence of numerous putative P450s ranging from 267 (A. mellea to 14 (M. osmundae. Analysis of P450 families revealed the presence of 41 new P450 families and 27 new P450 subfamilies in these biotrophic plant pathogens. Order-level comparison of P450 families between biotrophic plant pathogens revealed the presence of unique P450 family patterns in these organisms, possibly reflecting the characteristics of their order. Further comparison of P450 families with basidiomycete non-pathogens confirmed that biotrophic plant pathogens harbour the unique P450 families in their genomes. The CYP63, CYP5037, CYP5136, CYP5137 and CYP5341 P450 families were expanded in A. mellea when compared to other Agaricomycotina saprophytes and the CYP5221 and CYP5233 P450 families in P. graminis and M. laricis-populina. The present study revealed that expansion of these P450 families is due to paralogous evolution of member P450s. The presence of unique P450 families in these organisms serves as evidence of how a host

  16. Wild tobacco genomes reveal the evolution of nicotine biosynthesis.

    Science.gov (United States)

    Xu, Shuqing; Brockmöller, Thomas; Navarro-Quezada, Aura; Kuhl, Heiner; Gase, Klaus; Ling, Zhihao; Zhou, Wenwu; Kreitzer, Christoph; Stanke, Mario; Tang, Haibao; Lyons, Eric; Pandey, Priyanka; Pandey, Shree P; Timmermann, Bernd; Gaquerel, Emmanuel; Baldwin, Ian T

    2017-06-06

    Nicotine, the signature alkaloid of Nicotiana species responsible for the addictive properties of human tobacco smoking, functions as a defensive neurotoxin against attacking herbivores. However, the evolution of the genetic features that contributed to the assembly of the nicotine biosynthetic pathway remains unknown. We sequenced and assembled genomes of two wild tobaccos, Nicotiana attenuata (2.5 Gb) and Nicotiana obtusifolia (1.5 Gb), two ecological models for investigating adaptive traits in nature. We show that after the Solanaceae whole-genome triplication event, a repertoire of rapidly expanding transposable elements (TEs) bloated these Nicotiana genomes, promoted expression divergences among duplicated genes, and contributed to the evolution of herbivory-induced signaling and defenses, including nicotine biosynthesis. The biosynthetic machinery that allows for nicotine synthesis in the roots evolved from the stepwise duplications of two ancient primary metabolic pathways: the polyamine and nicotinamide adenine dinucleotide (NAD) pathways. In contrast to the duplication of the polyamine pathway that is shared among several solanaceous genera producing polyamine-derived tropane alkaloids, we found that lineage-specific duplications within the NAD pathway and the evolution of root-specific expression of the duplicated Solanaceae-specific ethylene response factor that activates the expression of all nicotine biosynthetic genes resulted in the innovative and efficient production of nicotine in the genus Nicotiana Transcription factor binding motifs derived from TEs may have contributed to the coexpression of nicotine biosynthetic pathway genes and coordinated the metabolic flux. Together, these results provide evidence that TEs and gene duplications facilitated the emergence of a key metabolic innovation relevant to plant fitness.

  17. Genomic Comparisons Reveal Microevolutionary Differences in Mycobacterium abscessus Subspecies

    Directory of Open Access Journals (Sweden)

    Joon L. Tan

    2017-10-01

    Full Text Available Mycobacterium abscessus, a rapid-growing non-tuberculous mycobacterium, has been the cause of sporadic and outbreak infections world-wide. The subspecies in M. abscessus complex (M. abscessus, M. massiliense, and M. bolletii are associated with different biologic and pathogenic characteristics and are known to be among the most frequently isolated opportunistic pathogens from clinical material. To date, the evolutionary forces that could have contributed to these biological and clinical differences are still unclear. We compared genome data from 243 M. abscessus strains downloaded from the NCBI ftp Refseq database to understand how the microevolutionary processes of homologous recombination and positive selection influenced the diversification of the M. abscessus complex at the subspecies level. The three subspecies are clearly separated in the Minimum Spanning Tree. Their MUMi-based genomic distances support the separation of M. massiliense and M. bolletii into two subspecies. Maximum Likelihood analysis through dN/dS (the ratio of number of non-synonymous substitutions per non-synonymous site, to the number of synonymous substitutions per synonymous site identified distinct genes in each subspecies that could have been affected by positive selection during evolution. The results of genome-wide alignment based on concatenated locally-collinear blocks suggest that (a recombination has affected the M. abscessus complex more than mutation and positive selection; (b recombination occurred more frequently in M. massiliense than in the other two subspecies; and (c the recombined segments in the three subspecies have come from different intra-species and inter-species origins. The results lead to the identification of possible gene sets that could have been responsible for the subspecies-specific features and suggest independent evolution among the three subspecies, with recombination playing a more significant role than positive selection in the

  18. Genomic Comparisons Reveal Microevolutionary Differences in Mycobacterium abscessus Subspecies

    Science.gov (United States)

    Tan, Joon L.; Ng, Kee P.; Ong, Chia S.; Ngeow, Yun F.

    2017-01-01

    Mycobacterium abscessus, a rapid-growing non-tuberculous mycobacterium, has been the cause of sporadic and outbreak infections world-wide. The subspecies in M. abscessus complex (M. abscessus, M. massiliense, and M. bolletii) are associated with different biologic and pathogenic characteristics and are known to be among the most frequently isolated opportunistic pathogens from clinical material. To date, the evolutionary forces that could have contributed to these biological and clinical differences are still unclear. We compared genome data from 243 M. abscessus strains downloaded from the NCBI ftp Refseq database to understand how the microevolutionary processes of homologous recombination and positive selection influenced the diversification of the M. abscessus complex at the subspecies level. The three subspecies are clearly separated in the Minimum Spanning Tree. Their MUMi-based genomic distances support the separation of M. massiliense and M. bolletii into two subspecies. Maximum Likelihood analysis through dN/dS (the ratio of number of non-synonymous substitutions per non-synonymous site, to the number of synonymous substitutions per synonymous site) identified distinct genes in each subspecies that could have been affected by positive selection during evolution. The results of genome-wide alignment based on concatenated locally-collinear blocks suggest that (a) recombination has affected the M. abscessus complex more than mutation and positive selection; (b) recombination occurred more frequently in M. massiliense than in the other two subspecies; and (c) the recombined segments in the three subspecies have come from different intra-species and inter-species origins. The results lead to the identification of possible gene sets that could have been responsible for the subspecies-specific features and suggest independent evolution among the three subspecies, with recombination playing a more significant role than positive selection in the diversification

  19. Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M

    Directory of Open Access Journals (Sweden)

    Pradeepkiran JA

    2015-03-01

    Full Text Available Jangampalli Adi Pradeepkiran,1* Sri Bhashyam Sainath,2,3* Konidala Kranthi Kumar,1 Matcha Bhaskar1 1Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati, India; 2CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, Porto, Portugal, 3Department of Biotechnology, Vikrama Simhapuri University, Nellore, Andhra Pradesh, India *These authors contributed equally to this work Abstract: Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes to identify putative genes that are nonhomologous to humans and essential for pathogen B. melitensis 16M. The results revealed that among 3 Mb genome size of pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50% to Silicibacter pomeroyi DUF1285 family protein (2RE3. A further homology model of the target was constructed using MODELLER 9.12 and optimized through variable target function method by molecular dynamics optimization with simulating annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the

  20. An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia

    OpenAIRE

    Rasmussen, Morten; Guo, Xiaosen; Wang, Yong; Lohmueller, Kirk E.; Rasmussen, Simon; Albrechtsen, Anders; Skotte, Line; Lindgreen, Stinus; Metspalu, Mait; Jombart, Thibaut; Kivisild, Toomas; Zhai, Weiwei; Eriksson, Anders; Manica, Andrea; Orlando, Ludovic

    2011-01-01

    We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to ...

  1. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-09-01

    Full Text Available Abstract Background Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements

  2. An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Guo, Xiaosen; Wang, Yong

    2011-01-01

    We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Abori......We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show...... that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves...... prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa....

  3. Genomic Characterization of Methanomicrobiales Reveals Three Classes of Methanogens

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain; Ulrich, Luke E.; Lupa, Boguslaw; Susanti, Dwi; Porat, Iris; Hooper, Sean D.; Lykidis, Athanasios; Sieprawska-Lupa, Magdalena; Dharmarajan, Lakshmi; Goltsman, Eugene; Lapidus, Alla; Saunders, Elizabeth; Han, Cliff; Land, Miriam; Lucas, Susan; Mukhopadhyay, Biswarup; Whitman, William B.; Woese, Carl; Bristow, James; Kyrpides, Nikos

    2009-05-01

    Methanomicrobiales is the least studied order of methanogens. While these organisms appear to be more closely related to the Methanosarcinales in ribosomal-based phylogenetic analyses, they are metabolically more similar to Class I methanogens. In order to improve our understanding of this lineage, we have completely sequenced the genomes of two members of this order, Methanocorpusculum labreanum Z and Methanoculleus marisnigri JR1, and compared them with the genome of a third, Methanospirillum hungatei JF-1. Similar to Class I methanogens, Methanomicrobiales use a partial reductive citric acid cycle for 2-oxoglutarate biosynthesis, and they have the Eha energy-converting hydrogenase. In common with Methanosarcinales, Methanomicrobiales possess the Ech hydrogenase and at least some of them may couple formylmethanofuran formation and heterodisulfide reduction to transmembrane ion gradients. Uniquely, M. labreanum and M. hungatei contain hydrogenases similar to the Pyrococcus furiosus Mbh hydrogenase, and all three Methanomicrobiales have anti-sigma factor and anti-anti-sigma factor regulatory proteins not found in other methanogens. Phylogenetic analysis based on seven core proteins of methanogenesis and cofactor biosynthesis places the Methanomicrobiales equidistant from Class I methanogens and Methanosarcinales. Our results indicate that Methanomicrobiales, rather than being similar to Class I methanogens or Methanomicrobiales, share some features of both and have some unique properties. We find that there are three distinct classes of methanogens: the Class I methanogens, the Methanomicrobiales (Class II), and the Methanosarcinales (Class III).

  4. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species

    Science.gov (United States)

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...

  5. Comparison of 26 sphingomonad genomes reveals diverse environmental adaptations and biodegradative capabilities

    DEFF Research Database (Denmark)

    Aylward, Frank O.; McDonald, Bradon R.; Adams, Sandra M.

    2013-01-01

    to the genus Sphingobium. Our pan-genomic analysis of sphingomonads reveals numerous species-specific open reading frames (ORFs) but few signatures of genus-specific cores. The organization and coding potential of the sphingomonad genomes appear to be highly variable, and plasmid-mediated gene transfer...... and chromosome-plasmid recombination, together with prophage- and transposon-mediated rearrangements, appear to play prominent roles in the genome evolution of this group. We find that many of the sphingomonad genomes encode numerous oxygenases and glycoside hydrolases, which are likely responsible...... a basis for understanding the ecological strategies employed by sphingomonads and their role in environmental nutrient cycling....

  6. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

    Science.gov (United States)

    Austin, Christopher M; Tan, Mun Hua; Harrisson, Katherine A; Lee, Yin Peng; Croft, Laurence J; Sunnucks, Paul; Pavlova, Alexandra; Gan, Han Ming

    2017-08-01

    One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of a conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 fish individuals with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding followed by subsequent gap-closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of Murray cod that was subsequently used to annotate the Murray cod genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate this will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and more generally other fish species of the Percichthydae family. © The Authors 2017. Published by Oxford University Press.

  7. An Aboriginal Australian genome reveals separate human dispersals into Asia.

    Science.gov (United States)

    Rasmussen, Morten; Guo, Xiaosen; Wang, Yong; Lohmueller, Kirk E; Rasmussen, Simon; Albrechtsen, Anders; Skotte, Line; Lindgreen, Stinus; Metspalu, Mait; Jombart, Thibaut; Kivisild, Toomas; Zhai, Weiwei; Eriksson, Anders; Manica, Andrea; Orlando, Ludovic; De La Vega, Francisco M; Tridico, Silvana; Metspalu, Ene; Nielsen, Kasper; Ávila-Arcos, María C; Moreno-Mayar, J Víctor; Muller, Craig; Dortch, Joe; Gilbert, M Thomas P; Lund, Ole; Wesolowska, Agata; Karmin, Monika; Weinert, Lucy A; Wang, Bo; Li, Jun; Tai, Shuaishuai; Xiao, Fei; Hanihara, Tsunehiko; van Driem, George; Jha, Aashish R; Ricaut, François-Xavier; de Knijff, Peter; Migliano, Andrea B; Gallego Romero, Irene; Kristiansen, Karsten; Lambert, David M; Brunak, Søren; Forster, Peter; Brinkmann, Bernd; Nehlich, Olaf; Bunce, Michael; Richards, Michael; Gupta, Ramneek; Bustamante, Carlos D; Krogh, Anders; Foley, Robert A; Lahr, Marta M; Balloux, Francois; Sicheritz-Pontén, Thomas; Villems, Richard; Nielsen, Rasmus; Wang, Jun; Willerslev, Eske

    2011-10-07

    We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.

  8. Single-Molecule FISH Reveals Non-selective Packaging of Rift Valley Fever Virus Genome Segments.

    Directory of Open Access Journals (Sweden)

    Paul J Wichgers Schreur

    2016-08-01

    Full Text Available The bunyavirus genome comprises a small (S, medium (M, and large (L RNA segment of negative polarity. Although genome segmentation confers evolutionary advantages by enabling genome reassortment events with related viruses, genome segmentation also complicates genome replication and packaging. Accumulating evidence suggests that genomes of viruses with eight or more genome segments are incorporated into virions by highly selective processes. Remarkably, little is known about the genome packaging process of the tri-segmented bunyaviruses. Here, we evaluated, by single-molecule RNA fluorescence in situ hybridization (FISH, the intracellular spatio-temporal distribution and replication kinetics of the Rift Valley fever virus (RVFV genome and determined the segment composition of mature virions. The results reveal that the RVFV genome segments start to replicate near the site of infection before spreading and replicating throughout the cytoplasm followed by translocation to the virion assembly site at the Golgi network. Despite the average intracellular S, M and L genome segments approached a 1:1:1 ratio, major differences in genome segment ratios were observed among cells. We also observed a significant amount of cells lacking evidence of M-segment replication. Analysis of two-segmented replicons and four-segmented viruses subsequently confirmed the previous notion that Golgi recruitment is mediated by the Gn glycoprotein. The absence of colocalization of the different segments in the cytoplasm and the successful rescue of a tri-segmented variant with a codon shuffled M-segment suggested that inter-segment interactions are unlikely to drive the copackaging of the different segments into a single virion. The latter was confirmed by direct visualization of RNPs inside mature virions which showed that the majority of virions lack one or more genome segments. Altogether, this study suggests that RVFV genome packaging is a non-selective process.

  9. Genomic comparison of invasive and rare non-invasive strains reveals Porphyromonas gingivalis genetic polymorphisms

    Directory of Open Access Journals (Sweden)

    Svetlana Dolgilevich

    2011-03-01

    Full Text Available Porphyromonas gingivalis strains are shown to invade human cells in vitro with different invasion efficiencies, varying by up to three orders of magnitude.We tested the hypothesis that invasion-associated interstrain genomic polymorphisms are present in P. gingivalis and that putative invasion-associated genes can contribute to P. gingivalis invasion.Using an invasive (W83 and the only available non-invasive P. gingivalis strain (AJW4 and whole genome microarrays followed by two separate software tools, we carried out comparative genomic hybridization (CGH analysis.We identified 68 annotated and 51 hypothetical open reading frames (ORFs that are polymorphic between these strains. Among these are surface proteins, lipoproteins, capsular polysaccharide biosynthesis enzymes, regulatory and immunoreactive proteins, integrases, and transposases often with abnormal GC content and clustered on the chromosome. Amplification of selected ORFs was used to validate the approach and the selection. Eleven clinical strains were investigated for the presence of selected ORFs. The putative invasion-associated ORFs were present in 10 of the isolates. The invasion ability of three isogenic mutants, carrying deletions in PG0185, PG0186, and PG0982 was tested. The PG0185 (ragA and PG0186 (ragB mutants had 5.1×103-fold and 3.6×103-fold decreased in vitro invasion ability, respectively.The annotation of divergent ORFs suggests deficiency in multiple genes as a basis for P. gingivalis non-invasive phenotype. Access the supplementary material to this article: Supplement, table (see Supplementary files under Reading Tools online.

  10. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    Science.gov (United States)

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities > or = 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  11. Comparative analysis of the Oenococcus oeni pan genome reveals genetic diversity in industrially-relevant pathways

    Directory of Open Access Journals (Sweden)

    Borneman Anthony R

    2012-08-01

    Full Text Available Abstract Background Oenococcus oeni, a member of the lactic acid bacteria, is one of a limited number of microorganisms that not only survive, but actively proliferate in wine. It is also unusual as, unlike the majority of bacteria present in wine, it is beneficial to wine quality rather than causing spoilage. These benefits are realised primarily through catalysing malolactic fermentation, but also through imparting other positive sensory properties. However, many of these industrially-important secondary attributes have been shown to be strain-dependent and their genetic basis it yet to be determined. Results In order to investigate the scale and scope of genetic variation in O. oeni, we have performed whole-genome sequencing on eleven strains of this bacterium, bringing the total number of strains for which genome sequences are available to fourteen. While any single strain of O. oeni was shown to contain around 1800 protein-coding genes, in-depth comparative annotation based on genomic synteny and protein orthology identified over 2800 orthologous open reading frames that comprise the pan genome of this species, and less than 1200 genes that make up the conserved genomic core present in all of the strains. The expansion of the pan genome relative to the coding potential of individual strains was shown to be due to the varied presence and location of multiple distinct bacteriophage sequences and also in various metabolic functions with potential impacts on the industrial performance of this species, including cell wall exopolysaccharide biosynthesis, sugar transport and utilisation and amino acid biosynthesis. Conclusions By providing a large cohort of sequenced strains, this study provides a broad insight into the genetic variation present within O. oeni. This data is vital to understanding and harnessing the phenotypic variation present in this economically-important species.

  12. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Full Text Available Abstract Background Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and better their use in microbial fuel cells. Results The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70 homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively. Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors

  13. Comprehensive genomic characterization of campylobacter genus reveals some underlying mechanisms for its genomic diversification.

    Directory of Open Access Journals (Sweden)

    Yizhuang Zhou

    Full Text Available Campylobacter species.are phenotypically diverse in many aspects including host habitats and pathogenicities, which demands comprehensive characterization of the entire Campylobacter genus to study their underlying genetic diversification. Up to now, 34 Campylobacter strains have been sequenced and published in public databases, providing good opportunity to systemically analyze their genomic diversities. In this study, we first conducted genomic characterization, which includes genome-wide alignments, pan-genome analysis, and phylogenetic identification, to depict the genetic diversity of Campylobacter genus. Afterward, we improved the tetranucleotide usage pattern-based naïve Bayesian classifier to identify the abnormal composition fragments (ACFs, fragments with significantly different tetranucleotide frequency profiles from its genomic tetranucleotide frequency profiles including horizontal gene transfers (HGTs to explore the mechanisms for the genetic diversity of this organism. Finally, we analyzed the HGTs transferred via bacteriophage transductions. To our knowledge, this study is the first to use single nucleotide polymorphism information to construct liable microevolution phylogeny of 21 Campylobacter jejuni strains. Combined with the phylogeny of all the collected Campylobacter species based on genome-wide core gene information, comprehensive phylogenetic inference of all 34 Campylobacter organisms was determined. It was found that C. jejuni harbors a high fraction of ACFs possibly through intraspecies recombination, whereas other Campylobacter members possess numerous ACFs possibly via intragenus recombination. Furthermore, some Campylobacter strains have undergone significant ancient viral integration during their evolution process. The improved method is a powerful tool for bacterial genomic analysis. Moreover, the findings would provide useful information for future research on Campylobacter genus.

  14. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  15. Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity.

    Science.gov (United States)

    Athanasiadis, Georgios; Cheng, Jade Y; Vilhjálmsson, Bjarni J; Jørgensen, Frank G; Als, Thomas D; Le Hellard, Stephanie; Espeseth, Thomas; Sullivan, Patrick F; Hultman, Christina M; Kjærgaard, Peter C; Schierup, Mikkel H; Mailund, Thomas

    2016-10-01

    Denmark has played a substantial role in the history of Northern Europe. Through a nationwide scientific outreach initiative, we collected genetic and anthropometrical data from ∼800 high school students and used them to elucidate the genetic makeup of the Danish population, as well as to assess polygenic predictions of phenotypic traits in adolescents. We observed remarkable homogeneity across different geographic regions, although we could still detect weak signals of genetic structure reflecting the history of the country. Denmark presented genomic affinity with primarily neighboring countries with overall resemblance of decreasing weight from Britain, Sweden, Norway, Germany, and France. A Polish admixture signal was detected in Zealand and Funen, and our date estimates coincided with historical evidence of Wend settlements in the south of Denmark. We also observed considerably diverse demographic histories among Scandinavian countries, with Denmark having the smallest current effective population size compared to Norway and Sweden. Finally, we found that polygenic prediction of self-reported adolescent height in the population was remarkably accurate (R 2 = 0.639 ± 0.015). The high homogeneity of the Danish population could render population structure a lesser concern for the upcoming large-scale gene-mapping studies in the country. Copyright © 2016 by the Genetics Society of America.

  16. Whole-genome resequencing reveals candidate mutations for pig prolificacy.

    Science.gov (United States)

    Li, Wen-Ting; Zhang, Meng-Meng; Li, Qi-Gang; Tang, Hui; Zhang, Li-Fan; Wang, Ke-Jun; Zhu, Mu-Zhen; Lu, Yun-Feng; Bao, Hai-Gang; Zhang, Yuan-Ming; Li, Qiu-Yan; Wu, Ke-Liang; Wu, Chang-Xin

    2017-12-20

    Changes in pig fertility have occurred as a result of domestication, but are not understood at the level of genetic variation. To identify variations potentially responsible for prolificacy, we sequenced the genomes of the highly prolific Taihu pig breed and four control breeds. Genes involved in embryogenesis and morphogenesis were targeted in the Taihu pig, consistent with the morphological differences observed between the Taihu pig and others during pregnancy. Additionally, excessive functional non-coding mutations have been specifically fixed or nearly fixed in the Taihu pig. We focused attention on an oestrogen response element (ERE) within the first intron of the bone morphogenetic protein receptor type-1B gene ( BMPR1B ) that overlaps with a known quantitative trait locus (QTL) for pig fecundity. Using 242 pigs from 30 different breeds, we confirmed that the genotype of the ERE was nearly fixed in the Taihu pig. ERE function was assessed by luciferase assays, examination of histological sections, chromatin immunoprecipitation, quantitative polymerase chain reactions, and western blots. The results suggest that the ERE may control pig prolificacy via the cis-regulation of BMPR1B expression. This study provides new insight into changes in reproductive performance and highlights the role of non-coding mutations in generating phenotypic diversity between breeds. © 2017 The Author(s).

  17. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity

  18. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  19. Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter

    Science.gov (United States)

    Maumus, Florian; Quesneville, Hadi

    2014-01-01

    Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome and the products of their decay is thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important to help understanding genome evolution as well as host biology. In this study, we have used a combination of tools enabling to show that the repeatome from the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help addressing farther genome evolution as well as transcriptional and epigenetic regulation in this model plant. PMID:24709859

  20. Comparative genomics provides insights into the lifestyle and reveals functional heterogeneity of dark septate endophytic fungi.

    Science.gov (United States)

    Knapp, Dániel G; Németh, Julianna B; Barry, Kerrie; Hainaut, Matthieu; Henrissat, Bernard; Johnson, Jenifer; Kuo, Alan; Lim, Joanne Hui Ping; Lipzen, Anna; Nolan, Matt; Ohm, Robin A; Tamás, László; Grigoriev, Igor V; Spatafora, Joseph W; Nagy, László G; Kovács, Gábor M

    2018-04-20

    Dark septate endophytes (DSE) are a form-group of root endophytic fungi with elusive functions. Here, the genomes of two common DSE of semiarid areas, Cadophora sp. and Periconia macrospinosa were sequenced and analyzed with another 32 ascomycetes of different lifestyles. Cadophora sp. (Helotiales) and P. macrospinosa (Pleosporales) have genomes of 70.46 Mb and 54.99 Mb with 22,766 and 18,750 gene models, respectively. The majority of DSE-specific protein clusters lack functional annotation with no similarity to characterized proteins, implying that they have evolved unique genetic innovations. Both DSE possess an expanded number of carbohydrate active enzymes (CAZymes), including plant cell wall degrading enzymes (PCWDEs). Those were similar in three other DSE, and contributed a signal for the separation of root endophytes in principal component analyses of CAZymes, indicating shared genomic traits of DSE fungi. Number of secreted proteases and lipases, aquaporins, and genes linked to melanin synthesis were also relatively high in our fungi. In spite of certain similarities between our two DSE, we observed low levels of convergence in their gene family evolution. This suggests that, despite originating from the same habitat, these two fungi evolved along different evolutionary trajectories and display considerable functional differences within the endophytic lifestyle.

  1. De novo assembly and annotation of the Asian tiger mosquito (Aedes albopictus) repeatome with dnaPipeTE from raw genomic reads and comparative analysis with the yellow fever mosquito (Aedes aegypti).

    Science.gov (United States)

    Goubert, Clément; Modolo, Laurent; Vieira, Cristina; ValienteMoro, Claire; Mavingui, Patrick; Boulesteix, Matthieu

    2015-03-11

    Repetitive DNA, including transposable elements (TEs), is found throughout eukaryotic genomes. Annotating and assembling the "repeatome" during genome-wide analysis often poses a challenge. To address this problem, we present dnaPipeTE-a new bioinformatics pipeline that uses a sample of raw genomic reads. It produces precise estimates of repeated DNA content and TE consensus sequences, as well as the relative ages of TE families. We shows that dnaPipeTE performs well using very low coverage sequencing in different genomes, losing accuracy only with old TE families. We applied this pipeline to the genome of the Asian tiger mosquito Aedes albopictus, an invasive species of human health interest, for which the genome size is estimated to be over 1 Gbp. Using dnaPipeTE, we showed that this species harbors a large (50% of the genome) and potentially active repeatome with an overall TE class and order composition similar to that of Aedes aegypti, the yellow fever mosquito. However, intraorder dynamics show clear distinctions between the two species, with differences at the TE family level. Our pipeline's ability to manage the repeatome annotation problem will make it helpful for new or ongoing assembly projects, and our results will benefit future genomic studies of A. albopictus. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Comparative Genome Analysis Reveals Divergent Genome Size Evolution in a Carnivorous Plant Genus

    Czech Academy of Sciences Publication Activity Database

    Vu, G.T.H.; Schmutzer, T.; Bull, F.; Cao, H.X.; Fuchs, J.; Tran, T.D.; Jovtchev, G.; Pistrick, K.; Stein, N.; Pečinka, A.; Neumann, Pavel; Novák, Petr; Macas, Jiří; Dear, P.H.; Blattner, F.R.; Scholz, U.; Schubert, I.

    2015-01-01

    Roč. 8, č. 3 (2015) ISSN 1940-3372 R&D Projects: GA ČR GBP501/12/G090 Institutional support: RVO:60077344 Keywords : Genlisea * genome * repetitive sequences Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.509, year: 2015

  3. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA; de Vos, M.; Louw, GE; van der Merwe, RG; Dippenaar, A.; Streicher, EM; Abdallah, A. M.; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; van Helden, PD; Warren, RM; Pain, Arnab

    2015-01-01

    Our study demonstrated true levels of genetic diversity within an M. tuberculosis population and showed that genetic diversity may be re-defined when a selective pressure, such as drug exposure, is imposed on M. tuberculosis populations during the course of infection. This suggests that the genome of M. tuberculosis is more dynamic than previously thought, suggesting preparedness to respond to a changing environment.

  4. Elephantid Genomes Reveal the Molecular Bases of Woolly Mammoth Adaptations to the Arctic

    Directory of Open Access Journals (Sweden)

    Vincent J. Lynch

    2015-07-01

    Full Text Available Woolly mammoths and living elephants are characterized by major phenotypic differences that have allowed them to live in very different environments. To identify the genetic changes that underlie the suite of woolly mammoth adaptations to extreme cold, we sequenced the nuclear genome from three Asian elephants and two woolly mammoths, and we identified and functionally annotated genetic changes unique to woolly mammoths. We found that genes with mammoth-specific amino acid changes are enriched in functions related to circadian biology, skin and hair development and physiology, lipid metabolism, adipose development and physiology, and temperature sensation. Finally, we resurrected and functionally tested the mammoth and ancestral elephant TRPV3 gene, which encodes a temperature-sensitive transient receptor potential (thermoTRP channel involved in thermal sensation and hair growth, and we show that a single mammoth-specific amino acid substitution in an otherwise highly conserved region of the TRPV3 channel strongly affects its temperature sensitivity.

  5. Genome-wide and functional annotation of human E3 ubiquitin ligases identifies MULAN, a mitochondrial E3 that regulates the organelle's dynamics and signaling.

    Directory of Open Access Journals (Sweden)

    Wei Li

    2008-01-01

    Full Text Available Specificity of protein ubiquitylation is conferred by E3 ubiquitin (Ub ligases. We have annotated approximately 617 putative E3s and substrate-recognition subunits of E3 complexes encoded in the human genome. The limited knowledge of the function of members of the large E3 superfamily prompted us to generate genome-wide E3 cDNA and RNAi expression libraries designed for functional screening. An imaging-based screen using these libraries to identify E3s that regulate mitochondrial dynamics uncovered MULAN/FLJ12875, a RING finger protein whose ectopic expression and knockdown both interfered with mitochondrial trafficking and morphology. We found that MULAN is a mitochondrial protein - two transmembrane domains mediate its localization to the organelle's outer membrane. MULAN is oriented such that its E3-active, C-terminal RING finger is exposed to the cytosol, where it has access to other components of the Ub system. Both an intact RING finger and the correct subcellular localization were required for regulation of mitochondrial dynamics, suggesting that MULAN's downstream effectors are proteins that are either integral to, or associated with, mitochondria and that become modified with Ub. Interestingly, MULAN had previously been identified as an activator of NF-kappaB, thus providing a link between mitochondrial dynamics and mitochondria-to-nucleus signaling. These findings suggest the existence of a new, Ub-mediated mechanism responsible for integration of mitochondria into the cellular environment.

  6. Comparative genomics of neuroglobin reveals its early origins.

    Directory of Open Access Journals (Sweden)

    Jasmin Dröge

    Full Text Available Neuroglobin (Ngb is a hexacoordinated globin expressed mainly in the central and peripheral nervous system of vertebrates. Although several hypotheses have been put forward regarding the role of neuroglobin, its definite function remains uncertain. Ngb appears to have a neuro-protective role enhancing cell viability under hypoxia and other types of oxidative stress. Ngb is phylogenetically ancient and has a substitution rate nearly four times lower than that of other vertebrate globins, e.g. hemoglobin. Despite its high sequence conservation among vertebrates Ngb seems to be elusive in invertebrates.We determined candidate orthologs in invertebrates and identified a globin of the placozoan Trichoplax adhaerens that is most likely orthologous to vertebrate Ngb and confirmed the orthologous relationship of the polymeric globin of the sea urchin Strongylocentrotus purpuratus to Ngb. The putative orthologous globin genes are located next to genes orthologous to vertebrate POMT2 similarly to localization of vertebrate Ngb. The shared syntenic position of the globins from Trichoplax, the sea urchin and of vertebrate Ngb strongly suggests that they are orthologous. A search for conserved transcription factor binding sites (TFBSs in the promoter regions of the Ngb genes of different vertebrates via phylogenetic footprinting revealed several TFBSs, which may contribute to the specific expression of Ngb, whereas a comparative analysis with myoglobin revealed several common TFBSs, suggestive of regulatory mechanisms common to globin genes.Identification of the placozoan and echinoderm genes orthologous to vertebrate neuroglobin strongly supports the hypothesis of the early evolutionary origin of this globin, as it shows that neuroglobin was already present in the placozoan-bilaterian last common ancestor. Computational determination of the transcription factor binding sites repertoire provides on the one hand a set of transcriptional factors that are

  7. Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes.

    Science.gov (United States)

    Thybert, David; Roller, Maša; Navarro, Fábio C P; Fiddes, Ian; Streeter, Ian; Feig, Christine; Martin-Galvez, David; Kolmogorov, Mikhail; Janoušek, Václav; Akanni, Wasiu; Aken, Bronwen; Aldridge, Sarah; Chakrapani, Varshith; Chow, William; Clarke, Laura; Cummins, Carla; Doran, Anthony; Dunn, Matthew; Goodstadt, Leo; Howe, Kerstin; Howell, Matthew; Josselin, Ambre-Aurore; Karn, Robert C; Laukaitis, Christina M; Jingtao, Lilue; Martin, Fergal; Muffato, Matthieu; Nachtweide, Stefanie; Quail, Michael A; Sisu, Cristina; Stanke, Mario; Stefflova, Klara; Van Oosterhout, Cock; Veyrunes, Frederic; Ward, Ben; Yang, Fengtang; Yazdanifar, Golbahar; Zadissa, Amonida; Adams, David J; Brazma, Alvis; Gerstein, Mark; Paten, Benedict; Pham, Son; Keane, Thomas M; Odom, Duncan T; Flicek, Paul

    2018-04-01

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli , which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology. © 2018 Thybert et al.; Published by Cold Spring Harbor Laboratory Press.

  8. Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

    Science.gov (United States)

    Thybert, David; Roller, Maša; Navarro, Fábio C.P.; Fiddes, Ian; Streeter, Ian; Feig, Christine; Martin-Galvez, David; Kolmogorov, Mikhail; Janoušek, Václav; Akanni, Wasiu; Aken, Bronwen; Aldridge, Sarah; Chakrapani, Varshith; Chow, William; Clarke, Laura; Cummins, Carla; Doran, Anthony; Dunn, Matthew; Goodstadt, Leo; Howe, Kerstin; Howell, Matthew; Josselin, Ambre-Aurore; Karn, Robert C.; Laukaitis, Christina M.; Jingtao, Lilue; Martin, Fergal; Muffato, Matthieu; Nachtweide, Stefanie; Quail, Michael A.; Sisu, Cristina; Stanke, Mario; Stefflova, Klara; Van Oosterhout, Cock; Veyrunes, Frederic; Ward, Ben; Yang, Fengtang; Yazdanifar, Golbahar; Zadissa, Amonida; Adams, David J.; Brazma, Alvis; Gerstein, Mark; Paten, Benedict; Pham, Son; Keane, Thomas M.; Odom, Duncan T.; Flicek, Paul

    2018-01-01

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology. PMID:29563166

  9. Genetic variation architecture of mitochondrial genome reveals the differentiation in Korean landrace and weedy rice

    OpenAIRE

    Wei Tong; Qiang He; Yong-Jin Park

    2017-01-01

    Mitochondrial genome variations have been detected despite the overall conservation of this gene content, which has been valuable for plant population genetics and evolutionary studies. Here, we describe mitochondrial variation architecture and our performance of a phylogenetic dissection of Korean landrace and weedy rice. A total of 4,717 variations across the mitochondrial genome were identified adjunct with 10 wild rice. Genetic diversity assessment revealed that wild rice has higher nucle...

  10. Comparative genomics analyses revealed two virulent Listeria monocytogenes strains isolated from ready-to-eat food.

    Science.gov (United States)

    Lim, Shu Yong; Yap, Kien-Pong; Thong, Kwai Lin

    2016-01-01

    Listeria monocytogenes is an important foodborne pathogen that causes considerable morbidity in humans with high mortality rates. In this study, we have sequenced the genomes and performed comparative genomics analyses on two strains, LM115 and LM41, isolated from ready-to-eat food in Malaysia. The genome size of LM115 and LM41 was 2,959,041 and 2,963,111 bp, respectively. These two strains shared approximately 90% homologous genes. Comparative genomics and phylogenomic analyses revealed that LM115 and LM41 were more closely related to the reference strains F2365 and EGD-e, respectively. Our virulence profiling indicated a total of 31 virulence genes shared by both analysed strains. These shared genes included those that encode for internalins and L. monocytogenes pathogenicity island 1 (LIPI-1). Both the Malaysian L. monocytogenes strains also harboured several genes associated with stress tolerance to counter the adverse conditions. Seven antibiotic and efflux pump related genes which may confer resistance against lincomycin, erythromycin, fosfomycin, quinolone, tetracycline, and penicillin, and macrolides were identified in the genomes of both strains. Whole genome sequencing and comparative genomics analyses revealed two virulent L. monocytogenes strains isolated from ready-to-eat foods in Malaysia. The identification of strains with pathogenic, persistent, and antibiotic resistant potentials from minimally processed food warrant close attention from both healthcare and food industry.

  11. Nomadic lifestyle of Lactobacillus plantarum revealed by comparative genomics of 54 strains isolated from different habitats.

    Science.gov (United States)

    Martino, Maria Elena; Bayjanov, Jumamurat R; Caffrey, Brian E; Wels, Michiel; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; van Hijum, Sacha A F T; Leulier, François

    2016-12-01

    The ability of bacteria to adapt to diverse environmental conditions is well-known. The process of bacterial adaptation to a niche has been linked to large changes in the genome content, showing that many bacterial genomes reflect the constraints imposed by their habitat. However, some highly versatile bacteria are found in diverse habitats that almost share nothing in common. Lactobacillus plantarum is a lactic acid bacterium that is found in a large variety of habitat. With the aim of unravelling the link between evolution and ecological versatility of L. plantarum, we analysed the genomes of 54 L. plantarum strains isolated from different environments. Comparative genome analysis identified a high level of genomic diversity and plasticity among the strains analysed. Phylogenomic and functional divergence studies coupled with gene-trait matching analyses revealed a mixed distribution of the strains, which was uncoupled from their environmental origin. Our findings revealed the absence of specific genomic signatures marking adaptations of L. plantarum towards the diverse habitats it is associated with. This suggests fundamentally similar trends of genome evolution in L. plantarum, which occur in a manner that is apparently uncoupled from ecological constraint and reflects the nomadic lifestyle of this species. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

  12. Interspecific Comparison and annotation of two complete mitochondrial genome sequences from the plant pathogenic fungus Mycosphaerella graminicola

    Energy Technology Data Exchange (ETDEWEB)

    Millenbaugh, Bonnie A; Pangilinan, Jasmyn L.; Torriani, Stefano F.F.; Goodwin, Stephen B.; Kema, Gert H.J.; McDonald, Bruce A.

    2007-12-07

    The mitochondrial genomes of two isolates of the wheat pathogen Mycosphaerella graminicola were sequenced completely and compared to identify polymorphic regions. This organism is of interest because it is phylogenetically distant from other fungi with sequenced mitochondrial genomes and it has shown discordant patterns of nuclear and mitochondrial diversity. The mitochondrial genome of M. graminicola is a circular molecule of approximately 43,960 bp containing the typical genes coding for 14 proteins related to oxidative phosphorylation, one RNA polymerase, two rRNA genes and a set of 27 tRNAs. The mitochondrial DNA of M. graminicola lacks the gene encoding the putative ribosomal protein (rps5-like), commonly found in fungal mitochondrial genomes. Most of the tRNA genes were clustered with a gene order conserved with many other ascomycetes. A sample of thirty-five additional strains representing the known global mt diversity was partially sequenced to measure overall mitochondrial variability within the species. Little variation was found, confirming previous RFLP-based findings of low mitochondrial diversity. The mitochondrial sequence of M. graminicola is the first reported from the family Mycosphaerellaceae or the order Capnodiales. The sequence also provides a tool to better understand the development of fungicide resistance and the conflicting pattern of high nuclear and low mitochondrial diversity in global populations of this fungus.

  13. Whole genome sequencing and analysis of Campylobacter coli YH502 from retail chicken reveals a plasmid-borne type VI secretion system

    Directory of Open Access Journals (Sweden)

    Sandeep Ghatak

    2017-03-01

    Full Text Available Campylobacter is a major cause of foodborne illnesses worldwide. Campylobacter infections, commonly caused by ingestion of undercooked poultry and meat products, can lead to gastroenteritis and chronic reactive arthritis in humans. Whole genome sequencing (WGS is a powerful technology that provides comprehensive genetic information about bacteria and is increasingly being applied to study foodborne pathogens: e.g., evolution, epidemiology/outbreak investigation, and detection. Herein we report the complete genome sequence of Campylobacter coli strain YH502 isolated from retail chicken in the United States. WGS, de novo assembly, and annotation of the genome revealed a chromosome of 1,718,974 bp and a mega-plasmid (pCOS502 of 125,964 bp. GC content of the genome was 31.2% with 1931 coding sequences and 53 non-coding RNAs. Multiple virulence factors including a plasmid-borne type VI secretion system and antimicrobial resistance genes (beta-lactams, fluoroquinolones, and aminoglycoside were found. The presence of T6SS in a mobile genetic element (plasmid suggests plausible horizontal transfer of these virulence genes to other organisms. The C. coli YH502 genome also harbors CRISPR sequences and associated proteins. Phylogenetic analysis based on average nucleotide identity and single nucleotide polymorphisms identified closely related C. coli genomes available in the NCBI database. Taken together, the analyzed genomic data of this potentially virulent strain of C. coli will facilitate further understanding of this important foodborne pathogen most likely leading to better control strategies. The chromosome and plasmid sequences of C. coli YH502 have been deposited in GenBank under the accession numbers CP018900.1 and CP018901.1, respectively.

  14. Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors

    Directory of Open Access Journals (Sweden)

    Raymond Kiu

    2017-12-01

    Full Text Available Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an “open” pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens-associated exotoxins genes including α-toxin (plc, enterotoxin (cpe, and Perfringolysin O (pfo or pfoA, although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56 of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes (tet and anti-defensins genes (mprF were consistently detected in silico (tet: 75%; mprF: 100%. However, pre-antibiotic era strain genomes did not encode for tet, thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen.

  15. Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors.

    Science.gov (United States)

    Kiu, Raymond; Caim, Shabhonam; Alexander, Sarah; Pachori, Purnima; Hall, Lindsay J

    2017-01-01

    Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an "open" pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens -associated exotoxins genes including α-toxin ( plc ), enterotoxin ( cpe ), and Perfringolysin O ( pfo or pfoA ), although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes ( tet ) and anti-defensins genes ( mprF ) were consistently detected in silico ( tet : 75%; mprF : 100%). However, pre-antibiotic era strain genomes did not encode for tet , thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen.

  16. The red deer Cervus elaphus genome CerEla1.0: sequencing, annotating, genes, and chromosomes.

    Science.gov (United States)

    Bana, Nóra Á; Nyiri, Anna; Nagy, János; Frank, Krisztián; Nagy, Tibor; Stéger, Viktor; Schiller, Mátyás; Lakatos, Péter; Sugár, László; Horn, Péter; Barta, Endre; Orosz, László

    2018-01-02

    We present here the de novo genome assembly CerEla1.0 for the red deer, Cervus elaphus, an emblematic member of the natural megafauna of the Northern Hemisphere. Humans spread the species in the South. Today, the red deer is also a farm-bred animal and is becoming a model animal in biomedical and population studies. Stag DNA was sequenced at 74× coverage by Illumina technology. The ALLPATHS-LG assembly of the reads resulted in 34.7 × 10 3 scaffolds, 26.1 × 10 3 of which were utilized in Cer.Ela1.0. The assembly spans 3.4 Gbp. For building the red deer pseudochromosomes, a pre-established genetic map was used for main anchor points. A nearly complete co-linearity was found between the mapmarker sequences of the deer genetic map and the order and orientation of the orthologous sequences in the syntenic bovine regions. Syntenies were also conserved at the in-scaffold level. The cM distances corresponded to 1.34 Mbp uniformly along the deer genome. Chromosomal rearrangements between deer and cattle were demonstrated. 2.8 × 10 6 SNPs, 365 × 10 3 indels and 19368 protein-coding genes were identified in CerEla1.0, along with positions for centromerons. CerEla1.0 demonstrates the utilization of dual references, i.e., when a target genome (here C. elaphus) already has a pre-established genetic map, and is combined with the well-established whole genome sequence of a closely related species (here Bos taurus). Genome-wide association studies (GWAS) that CerEla1.0 (NCBI, MKHE00000000) could serve for are discussed.

  17. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of

  18. Combining genomic sequencing methods to explore viral diversity and reveal potential virus-host interactions

    Directory of Open Access Journals (Sweden)

    Cheryl-Emiliane Tien Chow

    2015-04-01

    Full Text Available Viral diversity and virus-host interactions in oxygen-starved regions of the ocean, also known as oxygen minimum zones (OMZs, remain relatively unexplored. Microbial community metabolism in OMZs alters nutrient and energy flow through marine food webs, resulting in biological nitrogen loss and greenhouse gas production. Thus, viruses infecting OMZ microbes have the potential to modulate community metabolism with resulting feedback on ecosystem function. Here, we describe viral communities inhabiting oxic surface (10m and oxygen-starved basin (200m waters of Saanich Inlet, a seasonally anoxic fjord on the coast of Vancouver Island, British Columbia using viral metagenomics and complete viral fosmid sequencing on samples collected between April 2007 and April 2010. Of 6459 open reading frames (ORFs predicted across all 34 viral fosmids, 77.6% (n=5010 had no homology to reference viral genomes. These fosmids recruited a higher proportion of viral metagenomic sequences from Saanich Inlet than from nearby northeastern subarctic Pacific Ocean (Line P waters, indicating differences in the viral communities between coastal and open ocean locations. While functional annotations of fosmid ORFs were limited, recruitment to NCBI’s non-redundant ‘nr’ database and publicly available single-cell genomes identified putative viruses infecting marine thaumarchaeal and SUP05 proteobacteria to provide potential host linkages with relevance to coupled biogeochemical cycling processes in OMZ waters. Taken together, these results highlight the power of coupled analyses of multiple sequence data types, such as viral metagenomic and fosmid sequence data with prokaryotic single cell genomes, to chart viral diversity, elucidate genomic and ecological contexts for previously unclassifiable viral sequences, and identify novel host interactions in natural and engineered ecosystems.

  19. Genome-wide identification of Bcl11b gene targets reveals role in brain-derived neurotrophic factor signaling.

    Directory of Open Access Journals (Sweden)

    Bin Tang

    Full Text Available B-cell leukemia/lymphoma 11B (Bcl11b is a transcription factor showing predominant expression in the striatum. To date, there are no known gene targets of Bcl11b in the nervous system. Here, we define targets for Bcl11b in striatal cells by performing chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq in combination with genome-wide expression profiling. Transcriptome-wide analysis revealed that 694 genes were significantly altered in striatal cells over-expressing Bcl11b, including genes showing striatal-enriched expression similar to Bcl11b. ChIP-seq analysis demonstrated that Bcl11b bound a mixture of coding and non-coding sequences that were within 10 kb of the transcription start site of an annotated gene. Integrating all ChIP-seq hits with the microarray expression data, 248 direct targets of Bcl11b were identified. Functional analysis on the integrated gene target list identified several zinc-finger encoding genes as Bcl11b targets, and further revealed a significant association of Bcl11b to brain-derived neurotrophic factor/neurotrophin signaling. Analysis of ChIP-seq binding regions revealed significant consensus DNA binding motifs for Bcl11b. These data implicate Bcl11b as a novel regulator of the BDNF signaling pathway, which is disrupted in many neurological disorders. Specific targeting of the Bcl11b-DNA interaction could represent a novel therapeutic approach to lowering BDNF signaling specifically in striatal cells.

  20. Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition

    Directory of Open Access Journals (Sweden)

    O'Brien Kimberly

    2008-06-01

    Full Text Available Abstract Background The Solanaceae family contains a number of important crop species including potato (Solanum tuberosum which is grown for its underground storage organ known as a tuber. Albeit the 4th most important food crop in the world, other than a collection of ~220,000 Expressed Sequence Tags, limited genomic sequence information is currently available for potato and advances in potato yield and nutrition content would be greatly assisted through access to a complete genome sequence. While morphologically diverse, Solanaceae species such as potato, tomato, pepper, and eggplant share not only genes but also gene order thereby permitting highly informative comparative genomic analyses. Results In this study, we report on analysis 89.9 Mb of potato genomic sequence representing 10.2% of the genome generated through end sequencing of a potato bacterial artificial chromosome (BAC clone library (87 Mb and sequencing of 22 potato BAC clones (2.9 Mb. The GC content of potato is very similar to Solanum lycopersicon (tomato and other dicotyledonous species yet distinct from the monocotyledonous grass species, Oryza sativa. Parallel analyses of repetitive sequences in potato and tomato revealed substantial differences in their abundance, 34.2% in potato versus 46.3% in tomato, which is consistent with the increased genome size per haploid genome of these two Solanum species. Specific classes and types of repetitive sequences were also differentially represented between these two species including a telomeric-related repetitive sequence, ribosomal DNA, and a number of unclassified repetitive sequences. Comparative analyses between tomato and potato at the gene level revealed a high level of conservation of gene content, genic feature, and gene order although discordances in synteny were observed. Conclusion Genomic level analyses of potato and tomato confirm that gene sequence and gene order are conserved between these solanaceous species and that

  1. Constraints on genome dynamics revealed from gene distribution among the Ralstonia solanacearum species.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Full Text Available Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarrays analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum with more than half of the genes being accessory. From the reconstruction of the ancestral genomes compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining precise conditions that favor gene transfer would provide invaluable insights in the fate of bacteria, species delineation and the emergence of successful pathogens.

  2. The Population Genomics of Sunflowers and Genomic Determinants of Protein Evolution Revealed by RNAseq

    Directory of Open Access Journals (Sweden)

    Loren H. Rieseberg

    2012-10-01

    Full Text Available Few studies have investigated the causes of evolutionary rate variation among plant nuclear genes, especially in recently diverged species still capable of hybridizing in the wild. The recent advent of Next Generation Sequencing (NGS permits investigation of genome wide rates of protein evolution and the role of selection in generating and maintaining divergence. Here, we use individual whole-transcriptome sequencing (RNAseq to refine our understanding of the population genomics of wild species of sunflowers (Helianthus spp. and the factors that affect rates of protein evolution. We aligned 35 GB of transcriptome sequencing data and identified 433,257 polymorphic sites (SNPs in a reference transcriptome comprising 16,312 genes. Using SNP markers, we identified strong population clustering largely corresponding to the three species analyzed here (Helianthus annuus, H. petiolaris, H. debilis, with one distinct early generation hybrid. Then, we calculated the proportions of adaptive substitution fixed by selection (alpha and identified gene ontology categories with elevated values of alpha. The “response to biotic stimulus” category had the highest mean alpha across the three interspecific comparisons, implying that natural selection imposed by other organisms plays an important role in driving protein evolution in wild sunflowers. Finally, we examined the relationship between protein evolution (dN/dS ratio and several genomic factors predicted to co-vary with protein evolution (gene expression level, divergence and specificity, genetic divergence [FST], and nucleotide diversity pi. We find that variation in rates of protein divergence was correlated with gene expression level and specificity, consistent with results from a broad range of taxa and timescales. This would in turn imply that these factors govern protein evolution both at a microevolutionary and macroevolutionary timescale. Our results contribute to a general understanding of the

  3. Genomic insights into the Acidobacteria reveal strategies for their success in terrestrial environments

    Science.gov (United States)

    Trojan, Daniela; Roux, Simon; Herbold, Craig; Rattei, Thomas; Woebken, Dagmar

    2018-01-01

    Summary Members of the phylum Acidobacteria are abundant and ubiquitous across soils. We performed a large‐scale comparative genome analysis spanning subdivisions 1, 3, 4, 6, 8 and 23 (n = 24) with the goal to identify features to help explain their prevalence in soils and understand their ecophysiology. Our analysis revealed that bacteriophage integration events along with transposable and mobile elements influenced the structure and plasticity of these genomes. Low‐ and high‐affinity respiratory oxygen reductases were detected in multiple genomes, suggesting the capacity for growing across different oxygen gradients. Among many genomes, the capacity to use a diverse collection of carbohydrates, as well as inorganic and organic nitrogen sources (such as via extracellular peptidases), was detected – both advantageous traits in environments with fluctuating nutrient environments. We also identified multiple soil acidobacteria with the potential to scavenge atmospheric concentrations of H2, now encompassing mesophilic soil strains within the subdivision 1 and 3, in addition to a previously identified thermophilic strain in subdivision 4. This large‐scale acidobacteria genome analysis reveal traits that provide genomic, physiological and metabolic versatility, presumably allowing flexibility and versatility in the challenging and fluctuating soil environment. PMID:29327410

  4. Adaptations to a Subterranean Environment and Longevity Revealed by the Analysis of Mole Rat Genomes

    Directory of Open Access Journals (Sweden)

    Xiaodong Fang

    2014-09-01

    Full Text Available Subterranean mammals spend their lives in dark, unventilated environments that are rich in carbon dioxide and ammonia and low in oxygen. Many of these animals are also long-lived and exhibit reduced aging-associated diseases, such as neurodegenerative disorders and cancer. We sequenced the genome of the Damaraland mole rat (DMR, Fukomys damarensis and improved the genome assembly of the naked mole rat (NMR, Heterocephalus glaber. Comparative genome analyses, along with the transcriptomes of related subterranean rodents, revealed candidate molecular adaptations for subterranean life and longevity, including a divergent insulin peptide, expression of oxygen-carrying globins in the brain, prevention of high CO2-induced pain perception, and enhanced ammonia detoxification. Juxtaposition of the genomes of DMR and other more conventional animals with the genome of NMR revealed several truly exceptional NMR features: unusual thermogenesis, an aberrant melatonin system, pain insensitivity, and unique processing of 28S rRNA. Together, these genomes and transcriptomes extend our understanding of subterranean adaptations, stress resistance, and longevity.

  5. Transcriptional Analysis Allows Genome Reannotation and Reveals that Cryptococcus gattii VGII Undergoes Nutrient Restriction during Infection

    Directory of Open Access Journals (Sweden)

    Patrícia Aline Gröhs Ferrareze

    2017-08-01

    Full Text Available Cryptococcus gattii is a human and animal pathogen that infects healthy hosts and caused the Pacific Northwest outbreak of cryptococcosis. The inhalation of infectious propagules can lead to internalization of cryptococcal cells by alveolar macrophages, a niche in which C. gattii cells can survive and proliferate. Although the nutrient composition of macrophages is relatively unknown, the high induction of amino acid transporter genes inside the phagosome indicates a preference for amino acid uptake instead of synthesis. However, the presence of countable errors in the R265 genome annotation indicates significant inhibition of transcriptomic analysis in this hypervirulent strain. Thus, we analyzed RNA-Seq data from in vivo and in vitro cultures of C. gattii R265 to perform the reannotation of the genome. In addition, based on in vivo transcriptomic data, we identified highly expressed genes and pathways of amino acid metabolism that would enable C. gattii to survive and proliferate in vivo. Importantly, we identified high expression in three APC amino acid transporters as well as the GABA permease. The use of amino acids as carbon and nitrogen sources, releasing ammonium and generating carbohydrate metabolism intermediaries, also explains the high expression of components of several degradative pathways, since glucose starvation is an important host defense mechanism.

  6. Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.; Porwollik, Steffen; Jones, Marcus B.; Yoon, Hyunjin; Payne, Samuel H.; Martin, Jessica L.; Burnet, Meagan C.; Monroe, Matthew E.; Venepally, Pratap; Smith, Richard D.; Peterson, Scott; Heffron, Fred; Mcclelland, Michael; Adkins, Joshua N.

    2011-08-25

    Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.

  7. Annotation of differentially expressed genes in the somatic embryogenesis of musa and their location in the banana genome.

    Science.gov (United States)

    Maldonado-Borges, Josefina Ines; Ku-Cauich, José Roberto; Escobedo-Graciamedrano, Rosa Maria

    2013-01-01

    Analysis of cDNA-AFLP was used to study the genes expressed in zygotic and somatic embryogenesis of Musa acuminata Colla ssp. malaccensis, and a comparison was made between their differential transcribed fragments (TDFs) and the sequenced genome of the double haploid- (DH-) Pahang of the malaccensis subspecies that is available in the network. A total of 253 transcript-derived fragments (TDFs) were detected with apparent size of 100-4000 bp using 5 pairs of AFLP primers, of which 21 were differentially expressed during the different stages of banana embryogenesis; 15 of the sequences have matched DH-Pahang chromosomes, with 7 of them being homologous to gene sequences encoding either known or putative protein domains of higher plants. Four TDF sequences were located in all Musa chromosomes, while the rest were located in one or two chromosomes. Their putative individual function is briefly reviewed based on published information, and the potential roles of these genes in embryo development are discussed. Thus the availability of the genome of Musa and the information of TDFs sequences presented here opens new possibilities for an in-depth study of the molecular and biochemical research of zygotic and somatic embryogenesis of Musa.

  8. The genome of the seagrass Zostera marina reveals angiosperm adaptation to the sea

    NARCIS (Netherlands)

    Olsen, Jeanine; Rouzé, Pierre; Verhelst, Bram; Lin, Yao-Cheng; Bayer, Till; Collen, Jonas; Dattolo, Emanuela; De Paoli, Emanuele; Dittami, Simon; Maumus, Florian; Michel, Gurvan; Kersting, Anna; Lauritano, Chiara; Lohaus, Rolf; Töpel, Mats; Tonon, Thierry; Vanneste, Kevin; Amirebrahimi, Mojgan; Brakel, Janina; Boström, Christoffer; Chovatia, Mansi; Grimwood, Jane; Jenkins, Jerry W; Jueterbock, Alexander; Mraz, Amy; Stam, Wytze T; Tice, Hope; Bornberg-Bauer, Erich; Green, Pamela J; Pearson, Gareth A; Procaccini, Gabriele; Duarte, Carlos M; Schmutz, Jeremy; Reusch, Thorsten B H; Van de Peer, Yves

    2016-01-01

    Seagrasses colonized the sea on at least three independent occasions to form the basis of one of the most productive and widespread coastal ecosystems on the planet. Here we report the genome of Zostera marina (L.), the first, to our knowledge, marine angiosperm to be fully sequenced. This reveals

  9. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    Science.gov (United States)

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position

  10. The Methanosarcina barkeri genome: comparative analysis withMethanosarcina acetivorans and Methanosarcina mazei reveals extensiverearrangement within methanosarcinal genomes

    Energy Technology Data Exchange (ETDEWEB)

    Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.; Bruce,David C.; Gilna, Paul; Han, Cliff S.; Lapidus, Alla; Metcalf, William W.; Saunders, Elizabeth; Tapia, Roxanne; Sowers, Kevin R.

    2006-05-19

    We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.

  11. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

    Science.gov (United States)

    Cingolani, Pablo; Platts, Adrian; Wang, Le Lily; Coon, Melissa; Nguyen, Tung; Wang, Luan; Land, Susan J; Lu, Xiangyi; Ruden, Douglas M

    2012-01-01

    We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.

  12. Within-Host Variations of Human Papillomavirus Reveal APOBEC-Signature Mutagenesis in the Viral Genome.

    Science.gov (United States)

    Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

    2018-03-28

    Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied with the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here we explored within-host genetic diversity of HPV by performing deep sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52 and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC), and were deep-sequenced. After constructing a reference vial genome sequence for each specimen, nucleotide positions showing changes with > 0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with varying numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the tri-nucleotides context encompassing substituted bases revealed that Tp C pN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep sequencing analyses, we show for the first time a comprehensive snapshot of the "within

  13. Whole genome sequencing revealed host adaptation-focused genomic plasticity of pathogenic Leptospira

    Science.gov (United States)

    Xu, Yinghua; Zhu, Yongzhang; Wang, Yuezhu; Chang, Yung-Fu; Zhang, Ying; Jiang, Xiugao; Zhuang, Xuran; Zhu, Yongqiang; Zhang, Jinlong; Zeng, Lingbing; Yang, Minjun; Li, Shijun; Wang, Shengyue; Ye, Qiang; Xin, Xiaofang; Zhao, Guoping; Zheng, Huajun; Guo, Xiaokui; Wang, Junzhi

    2016-01-01

    Leptospirosis, caused by pathogenic Leptospira spp., has recently been recognized as an emerging infectious disease worldwide. Despite its severity and global importance, knowledge about the molecular pathogenesis and virulence evolution of Leptospira spp. remains limited. Here we sequenced and analyzed 102 isolates representing global sources. A high genomic variability were observed among different Leptospira species, which was attributed to massive gene gain and loss events allowing for adaptation to specific niche conditions and changing host environments. Horizontal gene transfer and gene duplication allowed the stepwise acquisition of virulence factors in pathogenic Leptospira evolved from a recent common ancestor. More importantly, the abundant expansion of specific virulence-related protein families, such as metalloproteases-associated paralogs, were exclusively identified in pathogenic species, reflecting the importance of these protein families in the pathogenesis of leptospirosis. Our observations also indicated that positive selection played a crucial role on this bacteria adaptation to hosts. These novel findings may lead to greater understanding of the global diversity and virulence evolution of Leptospira spp. PMID:26833181

  14. Targeted Genome Sequencing Reveals Varicella-Zoster Virus Open Reading Frame 12 Deletion.

    Science.gov (United States)

    Cohrs, Randall J; Lee, Katherine S; Beach, Addilynn; Sanford, Bridget; Baird, Nicholas L; Como, Christina; Graybill, Chiharu; Jones, Dallas; Tekeste, Eden; Ballard, Mitchell; Chen, Xiaomi; Yalacki, David; Frietze, Seth; Jones, Kenneth; Lenac Rovis, Tihana; Jonjić, Stipan; Haas, Jürgen; Gilden, Don

    2017-10-15

    The neurotropic herpesvirus varicella-zoster virus (VZV) establishes a lifelong latent infection in humans following primary infection. The low abundance of VZV nucleic acids in human neurons has hindered an understanding of the mechanisms that regulate viral gene transcription during latency. To overcome this critical barrier, we optimized a targeted capture protocol to enrich VZV DNA and cDNA prior to whole-genome/transcriptome sequence analysis. Since the VZV genome is remarkably stable, it was surprising to detect that VZV32, a VZV laboratory strain with no discernible growth defect in tissue culture, contained a 2,158-bp deletion in open reading frame (ORF) 12. Consequently, ORF 12 and 13 protein expression was abolished and Akt phosphorylation was inhibited. The discovery of the ORF 12 deletion, revealed through targeted genome sequencing analysis, points to the need to authenticate the VZV genome when the virus is propagated in tissue culture. IMPORTANCE Viruses isolated from clinical samples often undergo genetic modifications when cultured in the laboratory. Historically, VZV is among the most genetically stable herpesviruses, a notion supported by more than 60 complete genome sequences from multiple isolates and following multiple in vitro passages. However, application of enrichment protocols to targeted genome sequencing revealed the unexpected deletion of a significant portion of VZV ORF 12 following propagation in cultured human fibroblast cells. While the enrichment protocol did not introduce bias in either the virus genome or transcriptome, the findings indicate the need for authentication of VZV by sequencing when the virus is propagated in tissue culture. Copyright © 2017 American Society for Microbiology.

  15. Chado controller: advanced annotation management with a community annotation system.

    Science.gov (United States)

    Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie

    2012-04-01

    We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr Supplementary data are available at Bioinformatics online.

  16. Functional annotation of rheumatoid arthritis and osteoarthritis associated genes by integrative genome-wide gene expression profiling analysis.

    Directory of Open Access Journals (Sweden)

    Zhan-Chun Li

    Full Text Available BACKGROUND: Rheumatoid arthritis (RA and osteoarthritis (OA are two major types of joint diseases that share multiple common symptoms. However, their pathological mechanism remains largely unknown. The aim of our study is to identify RA and OA related-genes and gain an insight into the underlying genetic basis of these diseases. METHODS: We collected 11 whole genome-wide expression profiling datasets from RA and OA cohorts and performed a meta-analysis to comprehensively investigate their expression signatures. This method can avoid some pitfalls of single dataset analyses. RESULTS AND CONCLUSION: We found that several biological pathways (i.e., the immunity, inflammation and apoptosis related pathways are commonly involved in the development of both RA and OA. Whereas several other pathways (i.e., vasopressin-related pathway, regulation of autophagy, endocytosis, calcium transport and endoplasmic reticulum stress related pathways present significant difference between RA and OA. This study provides novel insights into the molecular mechanisms underlying this disease, thereby aiding the diagnosis and treatment of the disease.

  17. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants

    Energy Technology Data Exchange (ETDEWEB)

    Rensing, Stefan A.; Lang, Daniel; Zimmer, Andreas D.; Terry, Astrid; Salamov, Asaf; Shapiro, Harris; Nishiyama, Tomaoki; Perroud, Pierre-Francois; Lindquist, Erika A.; Kamisugi, Yasuko; Tanahashi, Takako; Sakakibara, Keiko; Fujita, Tomomichi; Oishi, Kazuko; Shin, Tadasu; Kuroki, Yoko; Toyoda, Atsushi; Suzuki, Yutaka; Hashimoto, Shin-ichi; Yamaguchi, Kazuo; Sugano, Sumio; Kohara, Yuji; Fujiyama, Asao; Anterola, Aldwin; Aoki, Setsuyuki; Ashton, Neil; Barbazuk, W. Brad; Barker, Elizabeth; Bennetzen, Jeffrey L.; Blankenship, Robert; Cho, Sung Hyun; Dutcher, Susan K.; Estelle, Mark; Fawcett, Jeffrey A.; Gundlach, Heidrum; Hanada, Kousuke; Melkozernov, Alexander; Murata, Takashi; Nelson, David R.; Pils, Birgit; Prigge, Michael; Reiss, Bernd; Renner, Tanya; Rombauts, Stephane; Rushton, Paul J.; Sanderfoot, Anton; Schween, Gabriele; Shiu, Shin-Han; Stueber, Kurt; Theodoulou, Frederica L.; Tu, Hank; Van de Peer, Yves; Verrier, Paul J.; Waters, Elizabeth; Wood, Andrew; Yang, Lixing; Cove, David; Cuming, Andrew C.; Hasebe, Mitsayasu; Lucas, Susan; Mishler, Brent D.; Reski, Ralf; Grigoriev, Igor V.; Quatrano, Rakph S.; Boore, Jeffrey L.

    2007-09-18

    We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.

  18. Transcriptional profiling in response to terminal drought stress reveals differential responses along the wheat genome

    Directory of Open Access Journals (Sweden)

    Ferrari Francesco

    2009-06-01

    Full Text Available Abstract Background Water stress during grain filling has a marked effect on grain yield, leading to a reduced endosperm cell number and thus sink capacity to accumulate dry matter. The bread wheat cultivar Chinese Spring (CS, a Chinese Spring terminal deletion line (CS_5AL-10 and the durum wheat cultivar Creso were subjected to transcriptional profiling after exposure to mild and severe drought stress at the grain filling stage to find evidences of differential stress responses associated to different wheat genome regions. Results The transcriptome analysis of Creso, CS and its deletion line revealed 8,552 non redundant probe sets with different expression levels, mainly due to the comparisons between the two species. The drought treatments modified the expression of 3,056 probe sets. Besides a set of genes showing a similar drought response in Creso and CS, cluster analysis revealed several drought response features that can be associated to the different genomic structure of Creso, CS and CS_5AL-10. Some drought-related genes were expressed at lower level (or not expressed in Creso (which lacks the D genome or in the CS_5AL-10 deletion line compared to CS. The chromosome location of a set of these genes was confirmed by PCR-based mapping on the D genome (or the 5AL-10 region. Many clusters were characterized by different level of expression in Creso, CS and CS_AL-10, suggesting that the different genome organization of the three genotypes may affect plant adaptation to stress. Clusters with similar expression trend were grouped and functional classified to mine the biological mean of their activation or repression. Genes involved in ABA, proline, glycine-betaine and sorbitol pathways were found up-regulated by drought stress. Furthermore, the enhanced expression of a set of transposons and retrotransposons was detected in CS_5AL-10. Conclusion Bread and durum wheat genotypes were characterized by a different physiological reaction to water

  19. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus.

    Directory of Open Access Journals (Sweden)

    Fagen Li

    Full Text Available Dense genetic maps, along with quantitative trait loci (QTLs detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR, expressed sequence tag (EST derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS, and diversity arrays technology (DArT markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling at 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus and with the E. grandis genome sequence. Fifty-three QTLs for growth (10-56 months of age and wood density (56 months were identified in 22 discrete regions on both maps, in which only one colocalizaiton was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa.

  20. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus

    Science.gov (United States)

    Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling at 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10–56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalizaiton was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa. PMID:26695430

  1. Neolithic and Medieval virus genomes reveal complex evolution of Hepatitis B.

    Science.gov (United States)

    Krause-Kyora, Ben; Susat, Julian; Key, Felix M; Kühnert, Denise; Bosse, Esther; Immel, Alexander; Rinne, Christoph; Kornell, Sabin-Christin; Yepes, Diego; Franzenburg, Sören; Heyne, Henrike O; Meier, Thomas; Lösch, Sandra; Meller, Harald; Friederich, Susanne; Nicklisch, Nicole; Alt, Kurt W; Schreiber, Stefan; Tholey, Andreas; Herbig, Alexander; Nebel, Almut; Krause, Johannes

    2018-05-10

    The hepatitis B virus (HBV) is one of the most widespread human pathogens known today, yet its origin and evolutionary history are still unclear and controversial. Here, we report the analysis of three ancient HBV genomes recovered from human skeletons found at three different archaeological sites in Germany. We reconstructed two Neolithic and one medieval HBV genomes by de novo assembly from shotgun DNA sequencing data. Additionally, we observed HBV-specific peptides using paleo-proteomics. Our results show that HBV circulates in the European population for at least 7000 years. The Neolithic HBV genomes show a high genomic similarity to each other. In a phylogenetic network, they do not group with any human-associated HBV genome and are most closely related to those infecting African non-human primates. These ancient virus forms appear to represent distinct lineages that have no close relatives today and possibly went extinct. Our results reveal the great potential of ancient DNA from human skeletons in order to study the long-time evolution of blood borne viruses. © 2018, Krause-Kyora et al.

  2. Comparative Genomics and Transcriptomics Analyses Reveal Divergent Lifestyle Features of Nematode Endoparasitic Fungus Hirsutella minnesotensis

    Science.gov (United States)

    Lai, Yiling; Liu, Keke; Zhang, Xinyu; Zhang, Xiaoling; Li, Kuan; Wang, Niuniu; Shu, Chi; Wu, Yunpeng; Wang, Chengshu; Bushley, Kathryn E.; Xiang, Meichun; Liu, Xingzhong

    2014-01-01

    Hirsutella minnesotensis [Ophiocordycipitaceae (Hypocreales, Ascomycota)] is a dominant endoparasitic fungus by using conidia that adhere to and penetrate the secondary stage juveniles of soybean cyst nematode. Its genome was de novo sequenced and compared with five entomopathogenic fungi in the Hypocreales and three nematode-trapping fungi in the Orbiliales (Ascomycota). The genome of H. minnesotensis is 51.4 Mb and encodes 12,702 genes enriched with transposable elements up to 32%. Phylogenomic analysis revealed that H. minnesotensis was diverged from entomopathogenic fungi in Hypocreales. Genome of H. minnesotensis is similar to those of entomopathogenic fungi to have fewer genes encoding lectins for adhesion and glycoside hydrolases for cellulose degradation, but is different from those of nematode-trapping fungi to possess more genes for protein degradation, signal transduction, and secondary metabolism. Those results indicate that H. minnesotensis has evolved different mechanism for nematode endoparasitism compared with nematode-trapping fungi. Transcriptomics analyses for the time-scale parasitism revealed the upregulations of lectins, secreted proteases and the genes for biosynthesis of secondary metabolites that could be putatively involved in host surface adhesion, cuticle degradation, and host manipulation. Genome and transcriptome analyses provided comprehensive understanding of the evolution and lifestyle of nematode endoparasitism. PMID:25359922

  3. Imaging mass spectrometry and genome mining reveal highly antifungal virulence factor of mushroom soft rot pathogen.

    Science.gov (United States)

    Graupner, Katharina; Scherlach, Kirstin; Bretschneider, Tom; Lackner, Gerald; Roth, Martin; Gross, Harald; Hertweck, Christian

    2012-12-21

    Caught in the act: imaging mass spectrometry of a button mushroom infected with the soft rot pathogen Janthinobacterium agaricidamnosum in conjunction with genome mining revealed jagaricin as a highly antifungal virulence factor that is not produced under standard cultivation conditions. The structure of jagaricin was rigorously elucidated by a combination of physicochemical analyses, chemical derivatization, and bioinformatics. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Virus Genomes Reveal the Factors that Spread and Sustained the West African Ebola Epidemic

    Science.gov (United States)

    2016-08-09

    Ladner, J. T. et al. Evolution and Spread of Ebola Virus in Liberia , 2014--2015. Cell Host Microbe 18, 659–669 (2015). 15. Lemey, P. et al. Unifying...Virus genomes reveal the factors that spread and sustained the West African Ebola epidemic. Gytis Dudas1,2, Luiz Max Carvalho1, Trevor Bedford2...Charlesville, Liberia ., 19University of Sierra Leone, Freetown, Sierra Leone , 20Center for Systems Biology, Department of Organismic and Evolutionary

  5. Genome-Wide RNAi Ionomics Screen Reveals New Genes and Regulation of Human Trace Element Metabolism

    OpenAIRE

    Malinouski, Mikalai; Hasan, Nesrin M.; Zhang, Yan; Seravalli, Javier; Lin, Jie; Avanesov, Andrei; Lutsenko, Svetlana; Gladyshev, Vadim N.

    2017-01-01

    Trace elements are essential for human metabolism and dysregulation of their homeostasis is associated with numerous disorders. Here we characterize mechanisms that regulate trace elements in human cells by designing and performing a genome-wide high-throughput siRNA/ionomics screen, and examining top hits in cellular and biochemical assays. The screen reveals high stability of the ionomes, especially the zinc ionome, and yields known regulators and novel candidates. We further uncover fundam...

  6. Distinct Biological Potential of Streptococcus gordonii and Streptococcus sanguinis Revealed by Comparative Genome Analysis

    OpenAIRE

    Zheng, Wenning; Tan, Mui Fern; Old, Lesley A.; Paterson, Ian C.; Jakubovics, Nicholas S.; Choo, Siew Woh

    2017-01-01

    Streptococcus gordonii and Streptococcus sanguinis are pioneer colonizers of dental plaque and important agents of bacterial infective endocarditis (IE). To gain a greater understanding of these two closely related species, we performed comparative analyses on 14 new S. gordonii and 5 S. sanguinis strains using various bioinformatics approaches. We revealed S. gordonii and S. sanguinis harbor open pan-genomes and share generally high sequence homology and number of core genes including virule...

  7. Australian wild rice reveals pre-domestication origin of polymorphism deserts in rice genome.

    Directory of Open Access Journals (Sweden)

    Gopala Krishnan S

    Full Text Available BACKGROUND: Rice is a major source of human food with a predominantly Asian production base. Domestication involved selection of traits that are desirable for agriculture and to human consumers. Wild relatives of crop plants are a source of useful variation which is of immense value for crop improvement. Australian wild rices have been isolated from the impacts of domestication in Asia and represents a source of novel diversity for global rice improvement. Oryza rufipogon is a perennial wild progenitor of cultivated rice. Oryza meridionalis is a related annual species in Australia. RESULTS: We have examined the sequence of the genomes of AA genome wild rices from Australia that are close relatives of cultivated rice through whole genome re-sequencing. Assembly of the resequencing data to the O. sativa ssp. japonica cv. Nipponbare shows that Australian wild rices possess 2.5 times more single nucleotide polymorphisms than in the Asian wild rice and cultivated O. sativa ssp. indica. Analysis of the genome of domesticated rice reveals regions of low diversity that show very little variation (polymorphism deserts. Both the perennial and annual wild rice from Australia show a high degree of conservation of sequence with that found in cultivated rice in the same 4.58 Mbp region on chromosome 5, which suggests that some of the 'polymorphism deserts' in this and other parts of the rice genome may have originated prior to domestication due to natural selection. CONCLUSIONS: Analysis of genes in the 'polymorphism deserts' indicates that this selection may have been due to biotic or abiotic stress in the environment of early rice relatives. Despite having closely related sequences in these genome regions, the Australian wild populations represent an invaluable source of diversity supporting rice food security.

  8. Australian wild rice reveals pre-domestication origin of polymorphism deserts in rice genome.

    Science.gov (United States)

    Krishnan S, Gopala; Waters, Daniel L E; Henry, Robert J

    2014-01-01

    Rice is a major source of human food with a predominantly Asian production base. Domestication involved selection of traits that are desirable for agriculture and to human consumers. Wild relatives of crop plants are a source of useful variation which is of immense value for crop improvement. Australian wild rices have been isolated from the impacts of domestication in Asia and represents a source of novel diversity for global rice improvement. Oryza rufipogon is a perennial wild progenitor of cultivated rice. Oryza meridionalis is a related annual species in Australia. We have examined the sequence of the genomes of AA genome wild rices from Australia that are close relatives of cultivated rice through whole genome re-sequencing. Assembly of the resequencing data to the O. sativa ssp. japonica cv. Nipponbare shows that Australian wild rices possess 2.5 times more single nucleotide polymorphisms than in the Asian wild rice and cultivated O. sativa ssp. indica. Analysis of the genome of domesticated rice reveals regions of low diversity that show very little variation (polymorphism deserts). Both the perennial and annual wild rice from Australia show a high degree of conservation of sequence with that found in cultivated rice in the same 4.58 Mbp region on chromosome 5, which suggests that some of the 'polymorphism deserts' in this and other parts of the rice genome may have originated prior to domestication due to natural selection. Analysis of genes in the 'polymorphism deserts' indicates that this selection may have been due to biotic or abiotic stress in the environment of early rice relatives. Despite having closely related sequences in these genome regions, the Australian wild populations represent an invaluable source of diversity supporting rice food security.

  9. Genomic characterisation of Wongabel virus reveals novel genes within the Rhabdoviridae.

    Science.gov (United States)

    Gubala, Aneta J; Proll, David F; Barnard, Ross T; Cowled, Chris J; Crameri, Sandra G; Hyatt, Alex D; Boyle, David B

    2008-06-20

    Viruses belonging to the family Rhabdoviridae infect a variety of different hosts, including insects, vertebrates and plants. Currently, there are approximately 200 ICTV-recognised rhabdoviruses isolated around the world. However, the majority remain poorly characterised and only a fraction have been definitively assigned to genera. The genomic and transcriptional complexity displayed by several of the characterised rhabdoviruses indicates large diversity and complexity within this family. To enable an improved taxonomic understanding of this family, it is necessary to gain further information about the poorly characterised members of this family. Here we present the complete genome sequence and predicted transcription strategy of Wongabel virus (WONV), a previously uncharacterised rhabdovirus isolated from biting midges (Culicoides austropalpalis) collected in northern Queensland, Australia. The 13,196 nucleotide genome of WONV encodes five typical rhabdovirus genes N, P, M, G and L. In addition, the WONV genome contains three genes located between the P and M genes (U1, U2, U3) and two open reading frames overlapping with the N and G genes (U4, U5). These five additional genes and their putative protein products appear to be novel, and their functions are unknown. Predictive analysis of the U5 gene product revealed characteristics typical of viroporins, and indicated structural similarities with the alpha-1 protein (putative viroporin) of viruses in the genus Ephemerovirus. Phylogenetic analyses of the N and G proteins of WONV indicated closest similarity with the avian-associated Flanders virus; however, the genomes of these two viruses are significantly diverged. WONV displays a novel and unique genome structure that has not previously been described for any animal rhabdovirus.

  10. Draft genome of an Aerophobetes bacterium reveals a facultative lifestyle in deep-sea anaerobic sediments

    KAUST Repository

    Wang, Yong

    2016-07-01

    Aerophobetes (or CD12) is a recently defined bacterial phylum, of which the metabolic processes and ecological importance remain unclear. In the present study, we obtained the draft genome of an Aerophobetes bacterium TCS1 from saline sediment near the Thuwal cold seep in the Red Sea using a genome binning method. Analysis of 16S rRNA genes of TCS1 and close relatives revealed wide distribution of Aerophobetes in deep-sea sediments. Phylogenetic relationships showed affinity between Aerophobetes TCS1 and some thermophilic bacterial phyla. The genome of TCS1 (at least 1.27 Mbp) contains a full set of genes encoding core metabolic pathways, including glycolysis and pyruvate fermentation to produce acetyl-CoA and acetate. The identification of cross-membrane sugar transporter genes further indicates its potential ability to consume carbohydrates preserved in the sediment under the microbial mat. Aerophobetes bacterium TCS1 therefore probably carried out saccharolytic and fermentative metabolism. The genes responsible for autotrophic synthesis of acetyl-CoA via the Wood–Ljungdahl pathway were also found in the genome. Phylogenetic study of the essential genes for the Wood–Ljungdahl pathway implied relative independence of Aerophobetes bacterium from the known acetogens and methanogens. Compared with genomes of acetogenic bacteria, Aerophobetes bacterium TCS1 genome lacks the genes involved in nitrogen metabolism, sulfur metabolism, signal transduction and cell motility. The metabolic activities of TCS1 might depend on geochemical conditions such as supplies of CO2, hydrogen and sugars, and therefore the TCS1 might be a facultative bacterium in anaerobic saline sediments near cold seeps. © 2016, Science China Press and Springer-Verlag Berlin Heidelberg.

  11. Genomic profiling of plasmablastic lymphoma using array comparative genomic hybridization (aCGH: revealing significant overlapping genomic lesions with diffuse large B-cell lymphoma

    Directory of Open Access Journals (Sweden)

    Lu Xin-Yan

    2009-11-01

    Full Text Available Abstract Background Plasmablastic lymphoma (PL is a subtype of diffuse large B-cell lymphoma (DLBCL. Studies have suggested that tumors with PL morphology represent a group of neoplasms with clinopathologic characteristics corresponding to different entities including extramedullary plasmablastic tumors associated with plasma cell myeloma (PCM. The goal of the current study was to evaluate the genetic similarities and differences among PL, DLBCL (AIDS-related and non AIDS-related and PCM using array-based comparative genomic hybridization. Results Examination of genomic data in PL revealed that the most frequent segmental gain (> 40% include: 1p36.11-1p36.33, 1p34.1-1p36.13, 1q21.1-1q23.1, 7q11.2-7q11.23, 11q12-11q13.2 and 22q12.2-22q13.3. This correlated with segmental gains occurring in high frequency in DLBCL (AIDS-related and non AIDS-related cases. There were some segmental gains and some segmental loss that occurred in PL but not in the other types of lymphoma suggesting that these foci may contain genes responsible for the differentiation of this lymphoma. Additionally, some segmental gains and some segmental loss occurred only in PL and AIDS associated DLBCL suggesting that these foci may be associated with HIV infection. Furthermore, some segmental gains and some segmental loss occurred only in PL and PCM suggesting that these lesions may be related to plasmacytic differentiation. Conclusion To the best of our knowledge, the current study represents the first genomic exploration of PL. The genomic aberration pattern of PL appears to be more similar to that of DLBCL (AIDS-related or non AIDS-related than to PCM. Our findings suggest that PL may remain best classified as a subtype of DLBCL at least at the genome level.

  12. Genome-wide characterization of the WRKY gene family in radish (Raphanus sativus L.) reveals its critical functions under different abiotic stresses.

    Science.gov (United States)

    Karanja, Bernard Kinuthia; Fan, Lianxue; Xu, Liang; Wang, Yan; Zhu, Xianwen; Tang, Mingjia; Wang, Ronghua; Zhang, Fei; Muleke, Everlyne M'mbone; Liu, Liwang

    2017-11-01

    The radish WRKY gene family was genome-widely identified and played critical roles in response to multiple abiotic stresses. The WRKY is among the largest transcription factors (TFs) associated with multiple biological activities for plant survival, including control response mechanisms against abiotic stresses such as heat, salinity, and heavy metals. Radish is an important root vegetable crop and therefore characterization and expression pattern investigation of WRKY transcription factors in radish is imperative. In the present study, 126 putative WRKY genes were retrieved from radish genome database. Protein sequence and annotation scrutiny confirmed that RsWRKY proteins possessed highly conserved domains and zinc finger motif. Based on phylogenetic analysis results, RsWRKYs candidate genes were divided into three groups (Group I, II and III) with the number 31, 74, and 20, respectively. Additionally, gene structure analysis revealed that intron-exon patterns of the WRKY genes are highly conserved in radish. Linkage map analysis indicated that RsWRKY genes were distributed with varying densities over nine linkage groups. Further, RT-qPCR analysis illustrated the significant variation of 36 RsWRKY genes under one or more abiotic stress treatments, implicating that they might be stress-responsive genes. In total, 126 WRKY TFs were identified from the R. sativus genome wherein, 35 of them showed abiotic stress-induced expression patterns. These results provide a genome-wide characterization of RsWRKY TFs and baseline for further functional dissection and molecular evolution investigation, specifically for improving abiotic stress resistances with an ultimate goal of increasing yield and quality of radish.

  13. Whole-genome sequence of a novel Chinese cyprinid herpesvirus 3 isolate reveals the existence of a distinct European genotype in East Asia.

    Science.gov (United States)

    Li, Wei; Lee, Xuezhu; Weng, Shaoping; He, Jianguo; Dong, Chuanfu

    2015-02-25

    Cyprinid herpesvirus 3 (CyHV3), also known as koi herpesvirus (KHV), can be subdivided primarily into European and Asian genotypes, which are represented by CyHV3-U or CyHV3-I and CyHV3-J, respectively. In this study, the whole genome sequence of a novel Chinese CyHV3 isolate (GZ11) was determined and annotated. CyHV3-GZ11 genome was found to contain 295,119 nucleotides with 52.9% G/C content, which is highly similar to those of published CyHV3-U, CyHV3-I, and CyHV3-J strains. With reference to CyHV3-U, CyHV3-I, and CyHV3-J, CyHV3-GZ11 was also classified into 164 open reading frames (ORF), which include eight repeated ORFs. On the basis of the 12 alloherpeviruses core genes, results from phylogenetic analysis showed that CyHV3-GZ11 had closer evolutionary relationships with CyHV3-U and CyHV3-I than with CyHV3/KHV-J, which were also supported by genome wide-based single nucleotide substitution analysis and the use of a series of developed molecular markers. This study was the first to reveal the presence of a distinct European CyHV3 genotype in East and Southeast Asia at a whole genome level, which will evoke new insights on exploring the origin, evolution, and epidemiology of the virus. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Annotated bibliography

    International Nuclear Information System (INIS)

    1997-08-01

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed on-line literature searches (e.g., Dialog, International Association of Business Communicators Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners

  15. A korarchaeal genome reveals insights into the evolution of the Archaea

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J; Elkins, James G.; Podar, Mircea; Graham, David E.; Makarova, Kira S.; Wolf, Yuri; Randau, Lennart; Hedlund, Brian P.; Brochier-Armanet, Celine; Kunin, Victor; Anderson, Iain; Lapidus, Alla; Goltsman, Eugene; Barry, Kerrie; Koonin, Eugene V.; Hugenholtz, Phil; Kyrpides, Nikos; Wanner, Gerhard; Richardson, Paul; Keller, Martin; Stetter, Karl O.

    2008-06-05

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name,"Candidatus Korarchaeum cryptofilum," which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49percent. Of the 1,617 predicted protein-coding genes, 1,382 (85percent) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.

  16. A Korarchael Genome Reveals Insights into the Evolution of the Archaea

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla; Elkins, James G.; Podar, Mircea; Graham, David E.; Makarova, Kira S.; Wolf, Yuri; Randau, Lennart; Hedlund, Brian P.; Brochier-Armanet, Celine; Kunin, Victor; Anderson, Iain; Lapidus, Alla; Goltsman, Eugene; Barry, Kerrie; Koonin, Eugene V.; Hugenholtz, Phil; Kyrpides, Nikos; Wanner, Gerhard; Richardson, Paul; Keller, Martin; Stetter, Karl O.

    2008-01-07

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name, ?Candidatus Korarchaeum cryptofilum,? which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49percent. Of the 1,617 predicted protein-coding genes, 1,382 (85percent) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.

  17. Comparative genomic analysis of Lactobacillus plantarum ZJ316 reveals its genetic adaptation and potential probiotic profiles.

    Science.gov (United States)

    Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-Yu; Zhang, Xiao-Mei; Song, Da-Feng; Zhang, Chen

    2016-08-01

    In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate.

  18. Comparative genomic analysis of Lactobacillus plantarum ZJ316 reveals its genetic adaptation and potential probiotic profiles* #

    Science.gov (United States)

    Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-yu; Zhang, Xiao-mei; Song, Da-feng; Zhang, Chen

    2016-01-01

    Objective: In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. Methods: The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). Results: We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Conclusions: Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate. PMID:27487802

  19. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Full Text Available Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  20. Comparative genomics Lactobacillus reuteri from sourdough reveals adaptation of an intestinal symbiont to food fermentations.

    Science.gov (United States)

    Zheng, Jinshui; Zhao, Xin; Lin, Xiaoxi B; Gänzle, Michael

    2015-12-11

    Lactobacillus reuteri is a dominant member of intestinal microbiota of vertebrates, and occurs in food fermentations. The stable presence of L. reuteri in sourdough provides the opportunity to study the adaptation of vertebrate symbionts to an extra-intestinal habitat. This study evaluated this adaptation by comparative genomics of 16 strains of L. reuteri. A core genome phylogenetic tree grouped L. reuteri into 5 clusters corresponding to the host-adapted lineages. The topology of a gene content tree, which includes accessory genes, differed from the core genome phylogenetic tree, suggesting that the differentiation of L. reuteri is shaped by gene loss or acquisition. About 10% of the core genome (124 core genes) were under positive selection. In lineage III sourdough isolates, 177 genes were under positive selection, mainly related to energy conversion and carbohydrate metabolism. The analysis of the competitiveness of L. reuteri in sourdough revealed that the competitivess of sourdough isolates was equal or higher when compared to rodent isolates. This study provides new insights into the adaptation of L. reuteri to food and intestinal habitats, suggesting that these two habitats exert different selective pressure related to growth rate and energy (carbohydrate) metabolism.

  1. Integrative Genomics Reveals Mechanisms of Copy Number Alterations Responsible for Transcriptional Deregulation in Colorectal Cancer

    Science.gov (United States)

    Camps, Jordi; Nguyen, Quang Tri; Padilla-Nash, Hesed M.; Knutsen, Turid; McNeil, Nicole E.; Wangsa, Danny; Hummon, Amanda B.; Grade, Marian; Ried, Thomas; Difilippantonio, Michael J.

    2016-01-01

    To evaluate the mechanisms and consequences of chromosomal aberrations in colorectal cancer (CRC), we used a combination of spectral karyotyping, array comparative genomic hybridization (aCGH), and array-based global gene expression profiling on 31 primary carcinomas and 15 established cell lines. Importantly, aCGH showed that the genomic profiles of primary tumors are recapitulated in the cell lines. We revealed a preponderance of chromosome breakpoints at sites of copy number variants (CNVs) in the CRC cell lines, a novel mechanism of DNA breakage in cancer. The integration of gene expression and aCGH led to the identification of 157 genes localized within high-level copy number changes whose transcriptional deregulation was significantly affected across all of the samples, thereby suggesting that these genes play a functional role in CRC. Genomic amplification at 8q24 was the most recurrent event and led to the overexpression of MYC and FAM84B. Copy number dependent gene expression resulted in deregulation of known cancer genes such as APC, FGFR2, and ERBB2. The identification of only 36 genes whose localization near a breakpoint could account for their observed deregulated expression demonstrates that the major mechanism for transcriptional deregulation in CRC is genomic copy number changes resulting from chromosomal aberrations. PMID:19691111

  2. Comparative Analysis of 35 Basidiomycete Genomes Reveals Diversity and Uniqueness of the Phylum

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Otillar, Robert; Fagnan, Kirsten; Boussau, Bastien; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Held, Benjamin; Nagy, Laszlo; Floudas, Dimitris; Morin, Emmanuelle; Manning, Gerard; Baker, Scott; Martin, Francis; Blanchette, Robert; Hibbett, David; Grigoriev, Igor V.

    2013-03-11

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprobes including wood decaying fungi. To better understand the diversity of this phylum we compared the genomes of 35 basidiomycete fungi including 6 newly sequenced genomes. The genomes of basidiomycetes span extremes of genome size, gene number, and repeat content. A phylogenetic tree of Basidiomycota was generated using the Phyldog software, which uses all available protein sequence data to simultaneously infer gene and species trees. Analysis of core genes reveals that some 48percent of basidiomycete proteins are unique to the phylum with nearly half of those (22percent) comprising proteins found in only one organism. Phylogenetic patterns of plant biomass-degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay among the members of Agaricomycotina subphylum. There is a correlation of the profile of certain gene families to nutritional mode in Agaricomycotina. Based on phylogenetically-informed PCA analysis of such profiles, we predict that that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has liginolytic class II fungal peroxidases. Furthermore, we find that both fungi exhibit wood decay with white rot-like characteristics in growth assays. Analysis of the rate of discovery of proteins with no or few homologs suggests the high value of continued sequencing of basidiomycete fungi.

  3. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Full Text Available Abstract Background Several mutations were present in the genome of Streptococcus pneumoniae linezolid-resistant strains but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentration to linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021. The acquisition of these mutations conferred a fitness cost however, which was further enhanced by the acquisition of a mutation in a RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches at detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  4. Evolution and phylogeny of the mud shrimps (Crustacea: Decapoda) revealed from complete mitochondrial genomes.

    Science.gov (United States)

    Lin, Feng-Jiau; Liu, Yuan; Sha, Zhongli; Tsang, Ling Ming; Chu, Ka Hou; Chan, Tin-Yam; Liu, Ruiyu; Cui, Zhaoxia

    2012-11-16

    The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders. We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophiles) share unique gene order specific to their infraorders and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genome from more taxa, in particular the reptant infraorders Polychelida and Glypheidea is required in further analysis. Phylogenetic analyses on the mt genome sequences and the distinct gene orders provide further

  5. Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia.

    Science.gov (United States)

    Schrider, Daniel R; Ayroles, Julien; Matute, Daniel R; Kern, Andrew D

    2018-04-01

    Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.

  6. Correction: Comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi

    Science.gov (United States)

    2014-01-01

    Abstract The version of this article published in BMC Genomics 2013, 14: 274, contains 9 unpublished genomes (Botryobasidium botryosum, Gymnopus luxurians, Hypholoma sublateritium, Jaapia argillacea, Hebeloma cylindrosporum, Conidiobolus coronatus, Laccaria amethystina, Paxillus involutus, and P. rubicundulus) downloaded from JGI website. In this correction, we removed these genomes after discussion with editors and data producers whom we should have contacted before downloading these genomes. Removing these data did not alter the principle results and conclusions of our original work. The relevant Figures 1, 2, 3, 4 and 6; and Table 1 have been revised. Additional files 1, 3, 4, and 5 were also revised. We would like to apologize for any confusion or inconvenience this may have caused. Background Fungi produce a variety of carbohydrate activity enzymes (CAZymes) for the degradation of plant polysaccharide materials to facilitate infection and/or gain nutrition. Identifying and comparing CAZymes from fungi with different nutritional modes or infection mechanisms may provide information for better understanding of their life styles and infection models. To date, over hundreds of fungal genomes are publicly available. However, a systematic comparative analysis of fungal CAZymes across the entire fungal kingdom has not been reported. Results In this study, we systemically identified glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and glycosyltransferases (GTs) as well as carbohydrate-binding modules (CBMs) in the predicted proteomes of 94 representative fungi from Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota. Comparative analysis of these CAZymes that play major roles in plant polysaccharide degradation revealed that fungi exhibit tremendous diversity in the number and variety of CAZymes. Among them, some families of GHs and CEs are the most prevalent CAZymes that are distributed in all of the fungi analyzed

  7. Phylogenetic molecular function annotation

    International Nuclear Information System (INIS)

    Engelhardt, Barbara E; Jordan, Michael I; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic approach for predicting molecular function (sometimes called 'phylogenomics') is an effective means to predict protein molecular function. These methods incorporate functional evidence from all members of a family that have functional characterizations using the evolutionary history of the protein family to make robust predictions for the uncharacterized proteins. However, they are often difficult to apply on a genome-wide scale because of the time-consuming step of reconstructing the phylogenies of each protein to be annotated. Our automated approach for function annotation using phylogeny, the SIFTER (Statistical Inference of Function Through Evolutionary Relationships) methodology, uses a statistical graphical model to compute the probabilities of molecular functions for unannotated proteins. Our benchmark tests showed that SIFTER provides accurate functional predictions on various protein families, outperforming other available methods.

  8. Deep sequencing of the mitochondrial genome reveals common heteroplasmic sites in NADH dehydrogenase genes.

    Science.gov (United States)

    Liu, Chunyu; Fetterman, Jessica L; Liu, Poching; Luo, Yan; Larson, Martin G; Vasan, Ramachandran S; Zhu, Jun; Levy, Daniel

    2018-03-01

    Increasing evidence implicates mitochondrial dysfunction in aging and age-related conditions. But little is known about the molecular basis for this connection. A possible cause may be mutations in the mitochondrial DNA (mtDNA), which are often heteroplasmic-the joint presence of different alleles at a single locus in the same individual. However, the involvement of mtDNA heteroplasmy in aging and age-related conditions has not been investigated thoroughly. We deep-sequenced the complete mtDNA genomes of 356 Framingham Heart Study participants (52% women, mean age 43, mean coverage 4570-fold), identified 2880 unique mutations and comprehensively annotated them by MITOMAP and PolyPhen-2. We discovered 11 heteroplasmic "hot" spots [NADH dehydrogenase (ND) subunit 1, 4, 5 and 6 genes, n = 7; cytochrome c oxidase I (COI), n = 2; 16S rRNA, n = 1; D-loop, n = 1] for which the alternative-to-reference allele ratios significantly increased with advancing age (Bonferroni correction p < 0.001). Four of these heteroplasmic mutations in ND and COI genes were predicted to be deleterious nonsynonymous mutations which may have direct impact on ATP production. We confirmed previous findings that healthy individuals carry many low-frequency heteroplasmy mutations with potentially deleterious effects. We hypothesize that the effect of a single deleterious heteroplasmy may be minimal due to a low mutant-to-wildtype allele ratio, whereas the aggregate effects of many deleterious mutations may cause changes in mitochondrial function and contribute to age-related diseases. The identification of age-related mtDNA mutations is an important step to understand the genetic architecture of age-related diseases and may uncover novel therapeutic targets for such diseases.

  9. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    Science.gov (United States)

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  10. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    Directory of Open Access Journals (Sweden)

    Jiří Macas

    Full Text Available The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57% of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%. Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  11. Illumina based whole mitochondrial genome of Junonia iphita reveals minor intraspecific variation

    Directory of Open Access Journals (Sweden)

    Catherine Vanlalruati

    2015-12-01

    Full Text Available In the present study, the near complete mitochondrial genome (mitogenome of Junonia iphita (Lepidoptera: Nymphalidae: Nymphalinae was determined to be 14,892 bp. The gene order and orientation are identical to those in other butterfly species. The phylogenetic tree constructed from the whole mitogenomes using the 13 protein coding genes (PCGs defines the genetic relatedness of the two J. iphita species collected from two different regions. All the Junonia species clustered together, and were further subdivided into clade one consisting of J. almana and J. orithya and clade two comprising of the two J. iphita which were collected from Indo and Indochinese subregions separated by river barrier. Comparison between the two J. iphita sequences revealed minor variations and Single Nucleotide Polymorphisms were identified at 51 sites amounting to 0.4% of the entire mitochondrial genome.

  12. The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions

    Energy Technology Data Exchange (ETDEWEB)

    Merchant, Sabeeha S

    2007-04-09

    Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

  13. Genus-wide comparison of Pseudovibrio bacterial genomes reveal diverse adaptations to different marine invertebrate hosts.

    Science.gov (United States)

    Alex, Anoop; Antunes, Agostinho

    2018-01-01

    Bacteria belonging to the genus Pseudovibrio have been frequently found in association with a wide variety of marine eukaryotic invertebrate hosts, indicative of their versatile and symbiotic lifestyle. A recent comparison of the sponge-associated Pseudovibrio genomes has shed light on the mechanisms influencing a successful symbiotic association with sponges. In contrast, the genomic architecture of Pseudovibrio bacteria associated with other marine hosts has received less attention. Here, we performed genus-wide comparative analyses of 18 Pseudovibrio isolated from sponges, coral, tunicates, flatworm, and seawater. The analyses revealed a certain degree of commonality among the majority of sponge- and coral-associated bacteria. Isolates from other marine invertebrate host, tunicates, exhibited a genetic repertoire for cold adaptation and specific metabolic abilities including mucin degradation in the Antarctic tunicate-associated bacterium Pseudovibrio sp. Tun.PHSC04_5.I4. Reductive genome evolution was simultaneously detected in the flatworm-associated bacteria and the sponge-associated bacterium P. axinellae AD2, through the loss of major secretion systems (type III/VI) and virulence/symbioses factors such as proteins involved in adhesion and attachment to the host. Our study also unraveled the presence of a CRISPR-Cas system in P. stylochi UST20140214-052 a flatworm-associated bacterium possibly suggesting the role of CRISPR-based adaptive immune system against the invading virus particles. Detection of mobile elements and genomic islands (GIs) in all bacterial members highlighted the role of horizontal gene transfer for the acquisition of novel genetic features, likely enhancing the bacterial ecological fitness. These findings are insightful to understand the role of genome diversity in Pseudovibrio as an evolutionary strategy to increase their colonizing success across a wide range of marine eukaryotic hosts.

  14. Dynamic Evolution of Pathogenicity Revealed by Sequencing and Comparative Genomics of 19 Pseudomonas syringae Isolates

    Science.gov (United States)

    Romanchuk, Artur; Chang, Jeff H.; Mukhtar, M. Shahid; Cherkis, Karen; Roach, Jeff; Grant, Sarah R.; Jones, Corbin D.; Dangl, Jeffery L.

    2011-01-01

    Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram- negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we demonstrate that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determination of host range. We used these 14 draft genome sequences, plus five additional genome sequences previously reported, to identify the core genome for P. syringae and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of unknown proteins, which dramatically increases both the genomic content of these strains and the pan-genome of the species. PMID:21799664

  15. Whole mitochondrial genome sequencing of domestic horses reveals incorporation of extensive wild horse diversity during domestication

    Directory of Open Access Journals (Sweden)

    Lippold Sebastian

    2011-11-01

    Full Text Available Abstract Background DNA target enrichment by micro-array capture combined with high throughput sequencing technologies provides the possibility to obtain large amounts of sequence data (e.g. whole mitochondrial DNA genomes from multiple individuals at relatively low costs. Previously, whole mitochondrial genome data for domestic horses (Equus caballus were limited to only a few specimens and only short parts of the mtDNA genome (especially the hypervariable region were investigated for larger sample sets. Results In this study we investigated whole mitochondrial genomes of 59 domestic horses from 44 breeds and a single Przewalski horse (Equus przewalski using a recently described multiplex micro-array capture approach. We found 473 variable positions within the domestic horses, 292 of which are parsimony-informative, providing a well resolved phylogenetic tree. Our divergence time estimate suggests that the mitochondrial genomes of modern horse breeds shared a common ancestor around 93,000 years ago and no later than 38,000 years ago. A Bayesian skyline plot (BSP reveals a significant population expansion beginning 6,000-8,000 years ago with an ongoing exponential growth until the present, similar to other domestic animal species. Our data further suggest that a large sample of wild horse diversity was incorporated into the domestic population; specifically, at least 46 of the mtDNA lineages observed in domestic horses (73% already existed before the beginning of domestication about 5,000 years ago. Conclusions Our study provides a window into the maternal origins of extant domestic horses and confirms that modern domestic breeds present a wide sample of the mtDNA diversity found in ancestral, now extinct, wild horse populations. The data obtained allow us to detect a population expansion event coinciding with the beginning of domestication and to estimate both the minimum number of female horses incorporated into the domestic gene pool and the

  16. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth

    Science.gov (United States)

    Palkopoulou, Eleftheria; Mallick, Swapan; Skoglund, Pontus; Enk, Jacob; Rohland, Nadin; Li, Heng; Omrak, Ayça; Vartanyan, Sergey; Poinar, Hendrik; Götherström, Anders; Reich, David; Dalén, Love

    2015-01-01

    Summary The processes leading up to species extinctions are typically characterized by prolonged declines in population size and geographic distribution, followed by a phase in which populations are very small and may be subject to intrinsic threats, including loss of genetic diversity and inbreeding [1]. However, whether such genetic factors have had an impact on species prior to their extinction is unclear [2, 3]; examining this would require a detailed reconstruction of a species’ demographic history as well as changes in genome-wide diversity leading up to its extinction. Here, we present high-quality complete genome sequences from two woolly mammoths (Mammuthus primigenius). The first mammoth was sequenced at 17.1-fold coverage, and dates to ~4,300 years before present, constituting one of the last surviving individuals on Wrangel Island. The second mammoth, sequenced at 11.2-fold coverage, was obtained from a ~44,800 year old specimen from the Late Pleistocene population in northeastern Siberia. The demographic trajectories inferred from the two genomes are qualitatively similar and reveal a population bottleneck during the Middle or Early Pleistocene, and a more recent severe decline in the ancestors of the Wrangel mammoth at the end of the last glaciation. A comparison of the two genomes shows that the Wrangel mammoth has a 20% reduction in heterozygosity as well as a 28-fold increase in the fraction of the genome that is comprised of runs of homozygosity. We conclude that the population on Wrangel Island, which was the last surviving woolly mammoth population, was subject to reduced genetic diversity shortly before it became extinct. PMID:25913407

  17. Comparative genome analysis of pathogenic and non-pathogenic Clavibacter strains reveals adaptations to their lifestyle.

    Science.gov (United States)

    Załuga, Joanna; Stragier, Pieter; Baeyen, Steve; Haegeman, Annelies; Van Vaerenbergh, Johan; Maes, Martine; De Vos, Paul

    2014-05-22

    The genus Clavibacter harbors economically important plant pathogens infecting agricultural crops such as potato and tomato. Although the vast majority of Clavibacter strains are pathogenic, there is an increasing number of non-pathogenic isolates reported. Non-pathogenic Clavibacter strains isolated from tomato seeds are particularly problematic because they affect the current detection and identification tests for Clavibacter michiganensis subsp. michiganensis (Cmm), which is regulated with a zero tolerance in tomato seed. Their misidentification as pathogenic Cmm hampers a clear judgment on the seed quality and health. To get more insight in the genetic features linked to the lifestyle of these bacteria, a whole-genome sequence of the tomato seed-borne non-pathogenic Clavibacter LMG 26808 was determined. To gain a better understanding of the molecular determinants of pathogenicity, the genome sequence of LMG 26808 was compared with that of the pathogenic Cmm strain (NCPPB 382). The comparative analysis revealed that LMG 26808 does not contain plasmids pCM1 and pCM2 and also lacks the majority of important virulence factors described so far for pathogenic Cmm. This explains its apparent non-pathogenic nature in tomato plants. Moreover, the genome analysis of LMG 26808 detected sequences from a plasmid originating from a member of Enterobacteriaceae/Klebsiella relative. Genes received that way and coding for antibiotic resistance may provide a competitive advantage for survival of LMG 26808 in its ecological niche. Genetically, LMG 26808 was the most similar to the pathogenic Cmm NCPPB 382 but contained more mobile genetic elements. The genome of this non-pathogenic Clavibacter strain contained also a high number of transporters and regulatory genes. The genome sequence of the non-pathogenic Clavibacter strain LMG 26808 and the comparative analyses with other pathogenic Clavibacter strains provided a better understanding of the genetic bases of virulence and

  18. The complete genome sequence of Fibrobacter succinogenes S85 reveals a cellulolytic and metabolic specialist.

    Directory of Open Access Journals (Sweden)

    Garret Suen

    Full Text Available Fibrobacter succinogenes is an important member of the rumen microbial community that converts plant biomass into nutrients usable by its host. This bacterium, which is also one of only two cultivated species in its phylum, is an efficient and prolific degrader of cellulose. Specifically, it has a particularly high activity against crystalline cellulose that requires close physical contact with this substrate. However, unlike other known cellulolytic microbes, it does not degrade cellulose using a cellulosome or by producing high extracellular titers of cellulase enzymes. To better understand the biology of F. succinogenes, we sequenced the genome of the type strain S85 to completion. A total of 3,085 open reading frames were predicted from its 3.84 Mbp genome. Analysis of sequences predicted to encode for carbohydrate-degrading enzymes revealed an unusually high number of genes that were classified into 49 different families of glycoside hydrolases, carbohydrate binding modules (CBMs, carbohydrate esterases, and polysaccharide lyases. Of the 31 identified cellulases, none contain CBMs in families 1, 2, and 3, typically associated with crystalline cellulose degradation. Polysaccharide hydrolysis and utilization assays showed that F. succinogenes was able to hydrolyze a number of polysaccharides, but could only utilize the hydrolytic products of cellulose. This suggests that F. succinogenes uses its array of hemicellulose-degrading enzymes to remove hemicelluloses to gain access to cellulose. This is reflected in its genome, as F. succinogenes lacks many of the genes necessary to transport and metabolize the hydrolytic products of non-cellulose polysaccharides. The F. succinogenes genome reveals a bacterium that specializes in cellulose as its sole energy source, and provides insight into a novel strategy for cellulose degradation.

  19. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    Science.gov (United States)

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and human. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. The experimental methods developed to carry out this work are novel, simple and transferable to the

  20. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    Science.gov (United States)

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  1. Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes.

    Directory of Open Access Journals (Sweden)

    Casey R Richardson

    2010-05-01

    Full Text Available MicroRNAs (miRNAs and trans-acting small-interfering RNAs (tasi-RNAs are small (20-22 nt long RNAs (smRNAs generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery.We explored rice (Oryza sativa sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM when trained on targets could predict the "ancient" (deeply conserved class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for "new" rapidly-evolving MIRNA genes.Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other

  2. Draft Genome Sequence of Halostagnicola sp. A56, an Extremely Halophilic Archaeon Isolated from the Andaman Islands

    Science.gov (United States)

    Kanekar, Sagar P.; Saxena, Neha; Pore, Soham D.; Arora, Preeti; Kanekar, P. P.

    2015-01-01

    The first draft genome of Halostagnicola sp. A56, isolated from the Andaman Islands is reported here. The A56 genome comprises 3,178,490 bp in 26 contigs with a G+C content of 60.8%. The genome annotation revealed that A56 could have potential applications for the production of polyhydroxyalkanoate or bioplastics. PMID:26564049

  3. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system.

    Science.gov (United States)

    Vonk, Freek J; Casewell, Nicholas R; Henkel, Christiaan V; Heimberg, Alysha M; Jansen, Hans J; McCleary, Ryan J R; Kerkkamp, Harald M E; Vos, Rutger A; Guerreiro, Isabel; Calvete, Juan J; Wüster, Wolfgang; Woods, Anthony E; Logan, Jessica M; Harrison, Robert A; Castoe, Todd A; de Koning, A P Jason; Pollock, David D; Yandell, Mark; Calderon, Diego; Renjifo, Camila; Currier, Rachel B; Salgado, David; Pla, Davinia; Sanz, Libia; Hyder, Asad S; Ribeiro, José M C; Arntzen, Jan W; van den Thillart, Guido E E J M; Boetzer, Marten; Pirovano, Walter; Dirks, Ron P; Spaink, Herman P; Duboule, Denis; McGlinn, Edwina; Kini, R Manjunatha; Richardson, Michael K

    2013-12-17

    Snakes are limbless predators, and many species use venom to help overpower relatively large, agile prey. Snake venoms are complex protein mixtures encoded by several multilocus gene families that function synergistically to cause incapacitation. To examine venom evolution, we sequenced and interrogated the genome of a venomous snake, the king cobra (Ophiophagus hannah), and compared it, together with our unique transcriptome, microRNA, and proteome datasets from this species, with data from other vertebrates. In contrast to the platypus, the only other venomous vertebrate with a sequenced genome, we find that snake toxin genes evolve through several distinct co-option mechanisms and exhibit surprisingly variable levels of gene duplication and directional selection that correlate with their functional importance in prey capture. The enigmatic accessory venom gland shows a very different pattern of toxin gene expression from the main venom gland and seems to have recruited toxin-like lectin genes repeatedly for new nontoxic functions. In addition, tissue-specific microRNA analyses suggested the co-option of core genetic regulatory components of the venom secretory system from a pancreatic origin. Although the king cobra is limbless, we recovered coding sequences for all Hox genes involved in amniote limb development, with the exception of Hoxd12. Our results provide a unique view of the origin and evolution of snake venom and reveal multiple genome-level adaptive responses to natural selection in this complex biological weapon system. More generally, they provide insight into mechanisms of protein evolution under strong selection.

  4. Infectious diseases of marine molluscs and host responses as revealed by genomic tools

    Science.gov (United States)

    Ford, Susan E.

    2016-01-01

    More and more infectious diseases affect marine molluscs. Some diseases have impacted commercial species including MSX and Dermo of the eastern oyster, QPX of hard clams, withering syndrome of abalone and ostreid herpesvirus 1 (OsHV-1) infections of many molluscs. Although the exact transmission mechanisms are not well understood, human activities and associated environmental changes often correlate with increased disease prevalence. For instance, hatcheries and large-scale aquaculture create high host densities, which, along with increasing ocean temperature, might have contributed to OsHV-1 epizootics in scallops and oysters. A key to understanding linkages between the environment and disease is to understand how the environment affects the host immune system. Although we might be tempted to downplay the role of immunity in invertebrates, recent advances in genomics have provided insights into host and parasite genomes and revealed surprisingly sophisticated innate immune systems in molluscs. All major innate immune pathways are found in molluscs with many immune receptors, regulators and effectors expanded. The expanded gene families provide great diversity and complexity in innate immune response, which may be key to mollusc's defence against diverse pathogens in the absence of adaptive immunity. Further advances in host and parasite genomics should improve our understanding of genetic variation in parasite virulence and host disease resistance. PMID:26880838

  5. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system

    Science.gov (United States)

    Vonk, Freek J.; Casewell, Nicholas R.; Henkel, Christiaan V.; Heimberg, Alysha M.; Jansen, Hans J.; McCleary, Ryan J. R.; Kerkkamp, Harald M. E.; Vos, Rutger A.; Guerreiro, Isabel; Calvete, Juan J.; Wüster, Wolfgang; Woods, Anthony E.; Logan, Jessica M.; Harrison, Robert A.; Castoe, Todd A.; de Koning, A. P. Jason; Pollock, David D.; Yandell, Mark; Calderon, Diego; Renjifo, Camila; Currier, Rachel B.; Salgado, David; Pla, Davinia; Sanz, Libia; Hyder, Asad S.; Ribeiro, José M. C.; Arntzen, Jan W.; van den Thillart, Guido E. E. J. M.; Boetzer, Marten; Pirovano, Walter; Dirks, Ron P.; Spaink, Herman P.; Duboule, Denis; McGlinn, Edwina; Kini, R. Manjunatha; Richardson, Michael K.

    2013-01-01

    Snakes are limbless predators, and many species use venom to help overpower relatively large, agile prey. Snake venoms are complex protein mixtures encoded by several multilocus gene families that function synergistically to cause incapacitation. To examine venom evolution, we sequenced and interrogated the genome of a venomous snake, the king cobra (Ophiophagus hannah), and compared it, together with our unique transcriptome, microRNA, and proteome datasets from this species, with data from other vertebrates. In contrast to the platypus, the only other venomous vertebrate with a sequenced genome, we find that snake toxin genes evolve through several distinct co-option mechanisms and exhibit surprisingly variable levels of gene duplication and directional selection that correlate with their functional importance in prey capture. The enigmatic accessory venom gland shows a very different pattern of toxin gene expression from the main venom gland and seems to have recruited toxin-like lectin genes repeatedly for new nontoxic functions. In addition, tissue-specific microRNA analyses suggested the co-option of core genetic regulatory components of the venom secretory system from a pancreatic origin. Although the king cobra is limbless, we recovered coding sequences for all Hox genes involved in amniote limb development, with the exception of Hoxd12. Our results provide a unique view of the origin and evolution of snake venom and reveal multiple genome-level adaptive responses to natural selection in this complex biological weapon system. More generally, they provide insight into mechanisms of protein evolution under strong selection. PMID:24297900

  6. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie; Schwientek, Patrick; Tremblay, Julien; Schackwitz, Wendy; Martin, Joel; Pati, Amrita; Bushnell, Brian; Foster, Brian; Kang, Dongwan; Tringe, Susannah G.; Bertilsson, Stefan; Moran, Mary Ann; Shade, Ashley; Newton, Ryan J.; Stevens, Sarah; McMcahon, Katherine D.; Mamlstrom, Rex R.

    2014-05-12

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ecotype model? of diversification, but not previously observed in natural populations.

  7. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie; Schwientek, Patrick; Tremblay, Julien; Schackwitz, Wendy; Martin, Joel; Pati, Amrita; Bushnell, Brian; Foster, Brian; Kang, Dongwan; Tringe, Susannah G.; Bertilsson, Stefan; Moran, Mary Ann; Shade, Ashley; Newton, Ryan J.; Stevens, Sarah; McMahon, Katherine D.; Malmstrom, Rex R.

    2014-06-18

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.

  8. The draft genome of Tibetan hulless barley reveals adaptive patterns to the high stressful Tibetan Plateau.

    Science.gov (United States)

    Zeng, Xingquan; Long, Hai; Wang, Zhuo; Zhao, Shancen; Tang, Yawei; Huang, Zhiyong; Wang, Yulin; Xu, Qijun; Mao, Likai; Deng, Guangbing; Yao, Xiaoming; Li, Xiangfeng; Bai, Lijun; Yuan, Hongjun; Pan, Zhifen; Liu, Renjian; Chen, Xin; WangMu, QiMei; Chen, Ming; Yu, Lili; Liang, Junjun; DunZhu, DaWa; Zheng, Yuan; Yu, Shuiyang; LuoBu, ZhaXi; Guang, Xuanmin; Li, Jiang; Deng, Cao; Hu, Wushu; Chen, Chunhai; TaBa, XiongNu; Gao, Liyun; Lv, Xiaodan; Abu, Yuval Ben; Fang, Xiaodong; Nevo, Eviatar; Yu, Maoqun; Wang, Jun; Tashi, Nyima

    2015-01-27

    The Tibetan hulless barley (Hordeum vulgare L. var. nudum), also called "Qingke" in Chinese and "Ne" in Tibetan, is the staple food for Tibetans and an important livestock feed in the Tibetan Plateau. The diploid nature and adaptation to diverse environments of the highland give it unique resources for genetic research and crop improvement. Here we produced a 3.89-Gb draft assembly of Tibetan hulless barley with 36,151 predicted protein-coding genes. Comparative analyses revealed the divergence times and synteny between barley and other representative Poaceae genomes. The expansion of the gene family related to stress responses was found in Tibetan hulless barley. Resequencing of 10 barley accessions uncovered high levels of genetic variation in Tibetan wild barley and genetic divergence between Tibetan and non-Tibetan barley genomes. Selective sweep analyses demonstrate adaptive correlations of genes under selection with extensive environmental variables. Our results not only construct a genomic framework for crop improvement but also provide evolutionary insights of highland adaptation of Tibetan hulless barley.

  9. Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

    Science.gov (United States)

    Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

    2017-11-23

    Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed on the single-cell level. We found that major SCNAs were early events in cancer development and inherited steadily. Single-cell sequencing revealed mutations and SCNAs which were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of two patients with treatment naïve from the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.

  10. Distinct Biological Potential of Streptococcus gordonii and Streptococcus sanguinis Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Zheng, Wenning; Tan, Mui Fern; Old, Lesley A; Paterson, Ian C; Jakubovics, Nicholas S; Choo, Siew Woh

    2017-06-07

    Streptococcus gordonii and Streptococcus sanguinis are pioneer colonizers of dental plaque and important agents of bacterial infective endocarditis (IE). To gain a greater understanding of these two closely related species, we performed comparative analyses on 14 new S. gordonii and 5 S. sanguinis strains using various bioinformatics approaches. We revealed S. gordonii and S. sanguinis harbor open pan-genomes and share generally high sequence homology and number of core genes including virulence genes. However, we observed subtle differences in genomic islands and prophages between the species. Comparative pathogenomics analysis identified S. sanguinis strains have genes encoding IgA proteases, mitogenic factor deoxyribonucleases, nickel/cobalt uptake and cobalamin biosynthesis. On the contrary, genomic islands of S. gordonii strains contain additional copies of comCDE quorum-sensing system components involved in genetic competence. Two distinct polysaccharide locus architectures were identified, one of which was exclusively present in S. gordonii strains. The first evidence of genes encoding the CylA and CylB system by the α-haemolytic S. gordonii is presented. This study provides new insights into the genetic distinctions between S. gordonii and S. sanguinis, which yields understanding of tooth surfaces colonization and contributions to dental plaque formation, as well as their potential roles in the pathogenesis of IE.

  11. The genome of the seagrass Zostera marina reveals angiosperm adaptation to the sea

    KAUST Repository

    Olsen, Jeanine L.

    2016-01-27

    Seagrasses colonized the sea1 on at least three independent occasions to form the basis of one of the most productive and widespread coastal ecosystems on the planet2. Here we report the genome of Zostera marina (L.), the first, to our knowledge, marine angiosperm to be fully sequenced. This reveals unique insights into the genomic losses and gains involved in achieving the structural and physiological adaptations required for its marine lifestyle, arguably the most severe habitat shift ever accomplished by flowering plants. Key angiosperm innovations that were lost include the entire repertoire of stomatal genes3, genes involved in the synthesis of terpenoids and ethylene signalling, and genes for ultraviolet protection and phytochromes for far-red sensing. Seagrasses have also regained functions enabling them to adjust to full salinity. Their cell walls contain all of the polysaccharides typical of land plants, but also contain polyanionic, low-methylated pectins and sulfated galactans, a feature shared with the cell walls of all macroalgae4 and that is important for ion homoeostasis, nutrient uptake and O2/CO2 exchange through leaf epidermal cells. The Z. marina genome resource will markedly advance a wide range of functional ecological studies from adaptation of marine ecosystems under climate warming5, 6, to unravelling the mechanisms of osmoregulation under high salinities that may further inform our understanding of the evolution of salt tolerance in crop plants7.

  12. Complex evolutionary patterns revealed by mitochondrial genomes of the domestic horse.

    Science.gov (United States)

    Ning, T; Li, J; Lin, K; Xiao, H; Wylie, S; Hua, S; Li, H; Zhang, Y-P

    2014-01-01

    The domestic horse is the most widely used and important stock and recreational animal, valued for its strength and endurance. The energy required by the domestic horse is mainly supplied by mitochondria via oxidative phosphorylation. Thus, selection may have played an essential role in the evolution of the horse mitochondria. Besides, demographic events also affect the DNA polymorphic pattern on mitochondria. To understand the evolutionary patterns of the mitochondria of the domestic horse, we used a deep sequencing approach to obtain the complete sequences of 15 mitochondrial genomes, and four mitochondrial gene sequences, ND6, ATP8, ATP6 and CYTB, collected from 509, 363, 363 and 409 domestic horses, respectively. Evidence of strong substitution rate heterogeneity was found at nonsynonymous sites across the genomes. Signatures of recent positive selection on mtDNA of domestic horse were detected. Specifically, five amino acids in the four mitochondrial genes were identified as the targets of positive selection. Coalescentbased simulations imply that recent population expansion is the most probable explanation for the matrilineal population history for domestic horse. Our findings reveal a complex pattern of non-neutral evolution of the mitochondrial genome in the domestic horses.

  13. Genome scan for nonadditive heterotic trait loci reveals mainly underdominant effects in Saccharomyces cerevisiae.

    Science.gov (United States)

    Laiba, Efrat; Glikaite, Ilana; Levy, Yael; Pasternak, Zohar; Fridman, Eyal

    2016-04-01

    The overdominant model of heterosis explains the superior phenotype of hybrids by synergistic allelic interaction within heterozygous loci. To map such genetic variation in yeast, we used a population doubling time dataset of Saccharomyces cerevisiae 16 × 16 diallel and searched for major contributing heterotic trait loci (HTL). Heterosis was observed for the majority of hybrids, as they surpassed their best parent growth rate. However, most of the local heterozygous loci identified by genome scan were surprisingly underdominant, i.e., reduced growth. We speculated that in these loci adverse effects on growth resulted from incompatible allelic interactions. To test this assumption, we eliminated these allelic interactions by creating hybrids with local hemizygosity for the underdominant HTLs, as well as for control random loci. Growth of hybrids was indeed elevated for most hemizygous to HTL genes but not for control genes, hence validating the results of our genome scan. Assessing the consequences of local heterozygosity by reciprocal hemizygosity and allele replacement assays revealed the influence of genetic background on the underdominant effects of HTLs. Overall, this genome-wide study on a multi-parental hybrid population provides a strong argument against single gene overdominance as a major contributor to heterosis, and favors the dominance complementation model.

  14. Comparative genomics of four closely related Clostridium perfringens bacteriophages reveals variable evolution among core genes with therapeutic potential

    Directory of Open Access Journals (Sweden)

    Siragusa Gregory R

    2011-06-01

    Full Text Available Abstract Background Because biotechnological uses of bacteriophage gene products as alternatives to conventional antibiotics will require a thorough understanding of their genomic context, we sequenced and analyzed the genomes of four closely related phages isolated from Clostridium perfringens, an important agricultural and human pathogen. Results Phage whole-genome tetra-nucleotide signatures and proteomic tree topologies correlated closely with host phylogeny. Comparisons of our phage