WorldWideScience

Sample records for genome sequencing current

  1. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  2. Hepatitis C virus whole genome sequencing: Current methods/issues and future challenges.

    Science.gov (United States)

    Trémeaux, Pauline; Caporossi, Alban; Thélu, Marie-Ange; Blum, Michael; Leroy, Vincent; Morand, Patrice; Larrat, Sylvie

    2016-10-01

    Therapy for hepatitis C is currently undergoing a revolution. The arrival of new antiviral agents targeting viral proteins reinforces the need for a better knowledge of the viral strains infecting each patient. Hepatitis C virus (HCV) whole genome sequencing provides essential information for precise typing, study of the viral natural history or identification of resistance-associated variants. First performed with Sanger sequencing, the arrival of next-generation sequencing (NGS) has simplified the technical process and provided more detailed data on the nature and evolution of viral quasi-species. We will review the different techniques used for HCV complete genome sequencing and their applications, both before and after the apparition of NGS. The progress brought by new and future technologies will also be discussed, as well as the remaining difficulties, largely due to the genomic variability.

  3. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Brown, Garth R; Maglott, Donna R

    2012-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,00 organisms, 2.4 × 0(6) genomic records, 13 × 10(6) proteins and 2 × 10(6) RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

  4. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  5. Whole-exome/genome sequencing and genomics.

    Science.gov (United States)

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  6. Malaria Genome Sequencing Project

    Science.gov (United States)

    2004-01-01

    million cases and up to 2.7 million A whole chromosome shotgun sequencing strategy was used to deaths from malaria each year. The mortality levels are...deaths from malaria each year. The mortality levels are greatest in determine the genome sequence of P. falciparum clone 3D7. This sub-Saharan Africa...aminolevulinic acid dehydratase. Cura . Genet. 40, 391-398 (2002). 15. Lasonder, E. et al Analysis of the Plasmodium falciparum proteome by high-accuracy mass

  7. Genome sequencing conference II

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    Genome Sequencing Conference 2 was held September 30 to October 30, 1990. 26 speaker abstracts and 33 poster presentations were included in the program report. New and improved methods for DNA sequencing and genetic mapping were presented. Many of the papers were concerned with accuracy and speed of acquisition of data with computers and automation playing an increasing role. Individual papers have been processed separately for inclusion on the database.

  8. Translational genomics for plant breeding with the genome sequence explosion.

    Science.gov (United States)

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies.

  9. Sequencing the maize genome.

    Science.gov (United States)

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  10. Fungal genome sequencing: basic biology to biotechnology.

    Science.gov (United States)

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  11. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed the human chromosome 22 and classified the upstream,exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that we could classify the functional regions of genome based on sequence feature and discriminant analysis.

  12. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  13. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  14. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  15. Proficiency testing for bacterial whole genome sequencing: an end-user survey of current capabilities, requirements and priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    The advent of next-generation sequencing (NGS) has revolutionised public health microbiology. Given the potential impact of NGS, it is paramount to ensure standardisation of 'wet' laboratory and bioinformatic protocols and promote comparability of methods employed by different laboratories and th...

  16. Proficiency Testing for Bacterial Whole Genome Sequencing: An End-User Survey of Current Capabilities, Requirements and Priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    of costs. The priority pathogens reported by respondents reflected the key drivers for NGS use (high burden disease and ‘high profile’ pathogens). The performance of and participation in PT was perceived as important by most respondents. The wide range of sequencing and bioinformatics practices reported...

  17. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria▿

    OpenAIRE

    Brown, Pamela J.B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-01-01

    The Alphaproteobacteriacomprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  18. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    Science.gov (United States)

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  19. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria▿

    Science.gov (United States)

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteriacomprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  20. Genome Sequence of Mycobacteriophage Momo.

    Science.gov (United States)

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  1. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    a potential data exchanging format for genomic sequence variation data exchange focusing on human health applications. The international standardization of GSVML is necessary, and is currently underway. GSVML can be applied to enhance the utilization of genomic sequence variation data worldwide by providing a communicable platform between clinical and research applications. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.

  2. Next-generation sequencing and large genome assemblies

    OpenAIRE

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches ...

  3. The Cassava Genome: Current Progress, Future Directions.

    Science.gov (United States)

    Prochnik, Simon; Marri, Pradeep Reddy; Desany, Brian; Rabinowicz, Pablo D; Kodira, Chinnappa; Mohiuddin, Mohammed; Rodriguez, Fausto; Fauquet, Claude; Tohme, Joseph; Harkins, Timothy; Rokhsar, Daniel S; Rounsley, Steve

    2012-03-01

    The starchy swollen roots of cassava provide an essential food source for nearly a billion people, as well as possibilities for bioenergy, yet improvements to nutritional content and resistance to threatening diseases are currently impeded. A 454-based whole genome shotgun sequence has been assembled, which covers 69% of the predicted genome size and 96% of protein-coding gene space, with genome finishing underway. The predicted 30,666 genes and 3,485 alternate splice forms are supported by 1.4 M expressed sequence tags (ESTs). Maps based on simple sequence repeat (SSR)-, and EST-derived single nucleotide polymorphisms (SNPs) already exist. Thanks to the genome sequence, a high-density linkage map is currently being developed from a cross between two diverse cassava cultivars: one susceptible to cassava brown streak disease; the other resistant. An efficient genotyping-by-sequencing (GBS) approach is being developed to catalog SNPs both within the mapping population and among diverse African farmer-preferred varieties of cassava. These resources will accelerate marker-assisted breeding programs, allowing improvements in disease-resistance and nutrition, and will help us understand the genetic basis for disease resistance.

  4. [Mapping and human genome sequence program].

    Science.gov (United States)

    Weissenbach, J

    1997-03-01

    Until recently, human genome programs focused primarily on establishing maps that would provide signposts to researchers seeking to identify genes responsible for inherited diseases, as well as a basis for genome sequencing studies. Preestablished gene mapping goals have been reached. The over 7,000 microsatellite markers identified to date provide a map of sufficient density to allow localization of the gene of a monogenic disease with a precision of 1 to 2 million base pairs. The physical map, based on systematically arranged overlapping sets of artificial yeast chromosomes (YACs), has also made considerable headway during the last few years. The most recently published map covers more than 90% of the genome. However, currently available physical maps cannot be used for sequencing studies because multiple rearrangements occur in YACs. The recently developed sets of radioinduced hybrids are extremely useful for incorporating genes into existing maps. A network of American and European laboratories has successfully used these radioinduced hybrids to map 15,000 gene tags from large-scale cDNA library sequencing programs. There are increasingly pressing reasons for initiating large scale human genome sequencing studies.

  5. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative...... (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident...

  6. Next-generation sequencing strategies for characterizing the turkey genome.

    Science.gov (United States)

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  7. Sequencing intractable DNA to close microbial genomes.

    Science.gov (United States)

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  8. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  9. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...

  10. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  11. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  12. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  13. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  14. Value of a newly sequenced bacterial genome

    Institute of Scientific and Technical Information of China (English)

    Eudes; GV; Barbosa; Flavia; F; Aburjaile; Rommel; TJ; Ramos; Adriana; R; Carneiro; Yves; Le; Loir; Jan; Baumbach; Anderson; Miyoshi; Artur; Silva; Vasco; Azevedo

    2014-01-01

    Next-generation sequencing(NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft(partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  15. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    Full Text Available The genome of tomato ( L. is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States as part of the larger “International Solanaceae Genome Project (SOL: Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN. Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato ( L. sequence data, and the tools available for the researchers to exploit this new resource are also presented. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  16. Microbial genomics: from sequence to function.

    OpenAIRE

    Schwartz, I

    2000-01-01

    The era of genomics (the study of genes and their function) began a scant dozen years ago with a suggestion by James Watson that the complete DNA sequence of the human genome be determined. Since that time, the human genome project has attracted a great deal of attention in the scientific world and the general media; the scope of the sequencing effort, and the extraordinary value that it will provide, has served to mask the enormous progress in sequencing other genomes. Microbial genome seque...

  17. Insights from twenty years of bacterial genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jun, Se Ran [ORNL; Nookaew, Intawat [ORNL; Leuze, Michael Rex [ORNL; Ahn, Tae-Hyuk [ORNL; Karpinets, Tatiana V [ORNL; Lund, Ole [Technical University of Denmark; Kora, Guruprasad H [ORNL; Wassenaar, Trudy [Molecular Microbiology & Genomics Consultants, Zotzenheim, Germany; Poudel, Suresh [ORNL; Ussery, David W [ORNL

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome

  18. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge. PMID:24205335

  19. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Full Text Available Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  20. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  1. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.

  2. Maize genome sequencing by methylation filtration.

    Science.gov (United States)

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  3. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  4. The genome sequence of parrot bornavirus 5.

    Science.gov (United States)

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence are often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74 % with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornavirus, we support the proposal that PaBV-5 be assigned to a new bornavirus species:- Psittaciform 2 bornavirus.

  5. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  6. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    Genotyping-by-sequencing (GBSeq) is becoming a cost-effective genotyping platform for species without available SNP arrays. GBSeq considers to sequence short reads from restriction sites covering a limited part of the genome (e.g., 5-10%) with low sequencing depth per individual (e.g., 5-10X per....... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...

  7. Strategies for complete plastid genome sequencing.

    Science.gov (United States)

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  8. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  9. Complete Genome Sequence of Zika Virus Isolated from Semen

    Science.gov (United States)

    Graham, Victoria; Lewandowski, Kuiama; Dowall, Stuart D.; Pullan, Steven T.; Hewson, Roger

    2016-01-01

    Zika virus (ZIKV) is an emerging pathogenic flavivirus currently circulating in numerous countries in South America, the Caribbean, and the Western Pacific Region. Using an unbiased metagenomic sequencing approach, we report here the first complete genome sequence of ZIKV isolated from a clinical semen sample. PMID:27738033

  10. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  11. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  12. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...

  13. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...... the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56...... MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types...

  14. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  15. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    Full Text Available The explosive development of genomics technologies including microarrays and next generation sequencing (NGS has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding on the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.

  16. Composition and organization of active centromere sequences in complex genomes

    Directory of Open Access Journals (Sweden)

    Hayden Karen E

    2012-07-01

    Full Text Available Abstract Background Centromeres are sites of chromosomal spindle attachment during mitosis and meiosis. While the sequence basis for centromere identity remains a subject of considerable debate, one approach is to examine the genomic organization at these active sites that are correlated with epigenetic marks of centromere function. Results We have developed an approach to characterize both satellite and non-satellite centromeric sequences that are missing from current assemblies in complex genomes, using the dog genome as an example. Combining this genomic reference with an epigenetic dataset corresponding to sequences associated with the histone H3 variant centromere protein A (CENP-A, we identify active satellite sequence domains that appear to be both functionally and spatially distinct within the overall definition of satellite families. Conclusions These findings establish a genomic and epigenetic foundation for exploring the functional role of centromeric sequences in the previously sequenced dog genome and provide a model for similar studies within the context of less-characterized genomes.

  17. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola eCremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract.As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  18. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  19. Genome and exome sequencing in the clinic: unbiased genomic approaches with a high diagnostic yield

    NARCIS (Netherlands)

    Nelen, M.; Veltman, J.A.

    2012-01-01

    For the reasons discussed here, we think whole-genome- or exome-based approaches are currently most suited for diagnostic implementation in genetically heterogeneous diseases, initially to complement and later to replace Sanger sequencing, qPCR and genomic microarrays. Patients do need to be counsel

  20. Sequencing and comparing whole mitochondrial genomes ofanimals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  1. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  2. Complete genome sequence of arracacha mottle virus.

    Science.gov (United States)

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  3. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Vattipally B Sreenu; Pankaj Kumar; Javaregowda Nagaraju; Hampapathalu A Nagarajaram

    2007-01-01

    Simple sequence repeats (SSRs) or microsatellites are the repetitive nucleotide sequences of motifs of length 1–6 bp. They are scattered throughout the genomes of all the known organisms ranging from viruses to eukaryotes. Microsatellites undergo mutations in the form of insertions and deletions (INDELS) of their repeat units with some bias towards insertions that lead to microsatellite tract expansion. Although prokaryotic genomes derive some plasticity due to microsatellite mutations they have in-built mechanisms to arrest undue expansions of microsatellites and one such mechanism is constituted by post-replicative DNA repair enzymes MutL, MutH and MutS. The mycobacterial genomes lack these enzymes and as a null hypothesis one could expect these genomes to harbour many long tracts. It is therefore interesting to analyse the mycobacterial genomes for distribution and abundance of microsatellites tracts and to look for potentially polymorphic microsatellites. Available mycobacterial genomes, Mycobacterium avium, M. leprae, M. bovis and the two strains of M. tuberculosis (CDC1551 and H37Rv) were analysed for frequencies and abundance of SSRs. Our analysis revealed that the SSRs are distributed throughout the mycobacterial genomes at an average of 220–230 SSR tracts per kb. All the mycobacterial genomes contain few regions that are conspicuously denser or poorer in microsatellites compared to their expected genome averages. The genomes distinctly show scarcity of long microsatellites despite the absence of a post-replicative DNA repair system. Such severe scarcity of long microsatellites could arise as a result of strong selection pressures operating against long and unstable sequences although influence of GC-content and role of point mutations in arresting microsatellite expansions can not be ruled out. Nonetheless, the long tracts occasionally found in coding as well as non-coding regions may account for limited genome plasticity in these genomes.

  4. The Release 6 reference sequence of the Drosophila melanogaster genome

    Science.gov (United States)

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  5. Sequencing the Cotton Genomes-Gossypium spp.

    Institute of Scientific and Technical Information of China (English)

    PATERSON Andrew H

    2008-01-01

    @@ The genomes of most major crops,including cotton,will be fully sequenced in the next fewyears.Cotton is unusual,although not unique,in that we will need to sequence not only cultivated(tetraploid) genotypes but their diploid progenitors,to understand how elite cottons have surpassedthe productivity and quality of their progenitors.

  6. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    Science.gov (United States)

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  7. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    Science.gov (United States)

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  8. Complete genome sequence of Shewanella putrefaciens. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Heidelberg, John F.

    2001-04-01

    Seventy percent of the costs for genome sequencing Shewanella putrefaciens (oneidensis) were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reduction that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

  9. Long-read sequence assembly of the gorilla genome

    Science.gov (United States)

    Gordon, David; Huddleston, John; Chaisson, Mark J. P.; Hill, Christopher M.; Kronenberg, Zev N.; Munson, Katherine M.; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W.; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K.; Haussler, David; Chin, Chen-Shan; Eichler, Evan E.

    2016-01-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  10. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  11. Viral genome sequencing by random priming methods

    Directory of Open Access Journals (Sweden)

    Zhang Xinsheng

    2008-01-01

    Full Text Available Abstract Background Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results We have adapted the SISPA methodology 123 to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.

  12. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  13. Next-generation sequencing and large genome assemblies.

    Science.gov (United States)

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-06-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.

  14. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  15. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  16. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green alga. Its exceptional resistances to salt and various other stresses have made it an ideal model for stress tolerance study. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of close related green algae Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features.

  17. Sorghum genome sequencing by methylation filtration.

    Directory of Open Access Journals (Sweden)

    Joseph A Bedell

    2005-01-01

    Full Text Available Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  18. Sorghum genome sequencing by methylation filtration.

    Science.gov (United States)

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  19. Combining two technologies for full genome sequencing of human.

    Science.gov (United States)

    Skryabin, K G; Prokhortchouk, E B; Mazur, A M; Boulygina, E S; Tsygankova, S V; Nedoluzhko, A V; Rastorguev, S M; Matveev, V B; Chekanov, N N; D A, Goranskaya; Teslyuk, A B; Gruzdeva, N M; Velikhov, V E; Zaridze, D G; Kovalchuk, M V

    2009-10-01

    At present, the new technologies of DNA sequencing are rapidly developing allowing quick and efficient characterisation of organisms at the level of the genome structure. In this study, the whole genome sequencing of a human (Russian man) was performed using two technologies currently present on the market - Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) (Applied Biosystems) and sequencing technologies of molecular clusters using fluorescently labeled precursors (Illumina). The total number of generated data resulted in 108.3 billion base pairs (60.2 billion from Illumina technology and 48.1 billion from SOLiD technology). Statistics performed on reads generated by GAII and SOLiD showed that they covered 75% and 96% of the genome respectively. Short polymorphic regions were detected with comparable accuracy however, the absolute amount of them revealed by SOLiD was several times less than by GAII. Optimal algorithm for using the latest methods of sequencing was established for the analysis of individual human genomes. The study is the first Russian effort towards whole human genome sequencing.

  20. Current development and application of soybean genomics

    Institute of Scientific and Technical Information of China (English)

    Lingli HE; Jing ZHAO; Man ZHAO; Chaoying HE

    2011-01-01

    Soybean (Glycine max),an important domesticated species originated in China,constitutes a major source of edible oils and high-quality plant proteins worldwide.In spite of its complex genome as a consequence of an ancient tetraploidilization,platforms for map-based genomics,sequence-based genomics,comparative genomics and functional genomics have been well developed in the last decade,thus rich repertoires of genomic tools and resources are available,which have been influencing the soybean genetic improvement.Here we mainly review the progresses of soybean (including its wild relative Glycine soja) genomics and its impetus for soybean breeding,and raise the major biological questions needing to be addressed.Genetic maps,physical maps,QTL and EST mapping have been so well achieved that the marker assisted selection and positional cloning in soybean is feasible and even routine.Whole genome sequencing and transcriptomic analyses provide a large collection of molecular markers and predicted genes,which are instrumental to comparative genomics and functional genomics.Comparative genomics has started to reveal the evolution of soybean genome and the molecular basis of soybean domestication process.Microarrays resources,mutagenesis and efficient transformation systems become essential components of soybean functional genomics.Furthermore,phenotypic functional genomics via both forward and reverse genetic approaches has inferred functions of many genes involved in plant and seed development,in response to abiotic stresses,functioning in plant-pathogenic microbe interactions,and controlling the oil and protein content of seed.These achievements have paved the way for generation of transgenic or genetically modified (GM) soybean crops.

  1. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  2. NCBI Reference Sequences: current status, policy and new initiatives.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Klimke, William; Maglott, Donna R

    2009-01-01

    NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 x 10(6) proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

  3. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology.The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology.Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories.As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future.Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  4. Mapping and Sequencing the Human Genome

    Science.gov (United States)

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  5. Whole-genome amplification of single-cell genomes for next-generation sequencing.

    Science.gov (United States)

    Korfhage, Christian; Fisch, Evelyn; Fricke, Evelyn; Baedker, Silke; Loeffert, Dirk

    2013-10-11

    DNA sequence analysis and genotyping of biological samples using next-generation sequencing (NGS), microarrays, or real-time PCR is often limited by the small amount of sample available. A single cell contains only one to four copies of the genomic DNA, depending on the organism (haploid or diploid organism) and the cell-cycle phase. The DNA content of a single cell ranges from a few femtograms in bacteria to picograms in mammalia. In contrast, a deep analysis of the genome currently requires a few hundred nanograms up to micrograms of genomic DNA for library formation necessary for NGS sequencing or labeling protocols (e.g., microarrays). Consequently, accurate whole-genome amplification (WGA) of single-cell DNA is required for reliable genetic analysis (e.g., NGS) and is particularly important when genomic DNA is limited. The use of single-cell WGA has enabled the analysis of genomic heterogeneity of individual cells (e.g., somatic genomic variation in tumor cells). This unit describes how the genome of single cells can be used for WGA for further genomic studies, such as NGS. Recommendations for isolation of single cells are given and common sources of errors are discussed.

  6. Genome sequence of Lactobacillus farciminis KCTC 3681.

    Science.gov (United States)

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-04-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds.

  7. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...

  8. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Directory of Open Access Journals (Sweden)

    Luo Ming-Cheng

    2011-01-01

    containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml.

  9. A Draft Sequence of the Neandertal Genome

    Science.gov (United States)

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  10. Illuminating the Black Box of Genome Sequence Assembly: A Free Online Tool to Introduce Students to Bioinformatics

    Science.gov (United States)

    Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.

    2013-01-01

    Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…

  11. An empirical strategy for characterizing bacterial proteomes across species in the absence of genomic sequences

    Energy Technology Data Exchange (ETDEWEB)

    Turse, Joshua E.; Marshall, Matthew J.; Fredrickson, Jim K.; Lipton, Mary S.; Callister, Stephen J.

    2010-11-12

    Current methods in proteomics are dependent on the availability of sequenced genomes to identify proteins. However, genomic sequences are not always available for bacteria or microbial communities, even with high throughput sequencing technology becoming more readily available. Nevertheless, the homology that exists between related bacteria makes possible the extraction of meaningful biological information from an organism’s, or community’s proteome using the genomic sequence of a near neighbor. Here, a cross-organism search strategy was used to look at the amount of proteomics information obtainable with relative genetic distance from a near neighbor organism and to identify proteins in the proteome of minimally characterized environmental isolates. We conclude that closely related organisms with sequenced genomes, can be used to characterize proteomes of organisms with unsequenced genomes. In general, a cross-organism search strategy demonstrates the first step to use of sequences genomes to evaluate the proteomes of environmental bacteria and microbial communities that have no sequenced genome

  12. BSMAP: whole genome bisulfite sequence MAPping program

    Directory of Open Access Journals (Sweden)

    Li Wei

    2009-07-01

    Full Text Available Abstract Background Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation. Results We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible. Conclusion BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under GPL v3 license at http://code.google.com/p/bsmap/.

  13. Agaricus bisporus genome sequence: a commentary.

    Science.gov (United States)

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus biporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium.

  14. New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features

    Directory of Open Access Journals (Sweden)

    Zdobnov Evgeny M

    2007-07-01

    Full Text Available Abstract Background Human rhinoviruses (HRV, the most frequent cause of respiratory infections, include 99 different serotypes segregating into two species, A and B. Rhinoviruses share extensive genomic sequence similarity with enteroviruses and both are part of the picornavirus family. Nevertheless they differ significantly at the phenotypic level. The lack of HRV full-length genome sequences and the absence of analysis comparing picornaviruses at the whole genome level limit our knowledge of the genomic features supporting these differences. Results Here we report complete genome sequences of 12 HRV-A and HRV-B serotypes, more than doubling the current number of available HRV sequences. The whole-genome maximum-likelihood phylogenetic analysis suggests that HRV-B and human enteroviruses (HEV diverged from the last common ancestor after their separation from HRV-A. On the other hand, compared to HEV, HRV-B are more related to HRV-A in the capsid and 3B-C regions. We also identified the presence of a 2C cis-acting replication element (cre in HRV-B that is not present in HRV-A, and that had been previously characterized only in HEV. In contrast to HEV viruses, HRV-A and HRV-B share also markedly lower GC content along the whole genome length. Conclusion Our findings provide basis to speculate about both the biological similarities and the differences (e.g. tissue tropism, temperature adaptation or acid lability of these three groups of viruses.

  15. Improved genome sequencing using an engineered transposase.

    Science.gov (United States)

    Kia, Amirali; Gloeckner, Christian; Osothprarop, Trina; Gormley, Niall; Bomati, Erin; Stephenson, Michelle; Goryshin, Igor; He, Molly Min

    2017-01-17

    Next-generation sequencing (NGS) has transformed genomic research by reducing turnaround time and cost. However, no major breakthrough has been made in the upstream library preparation methods until the transposase-based Nextera method was invented. Nextera combines DNA fragmentation and barcoding in a single tube reaction and therefore enables a very fast workflow to sequencing-ready DNA libraries within a couple of hours. When compared to the traditional ligation-based methods, transposed-based Nextera has a slight insertion bias. Here we present the discovery of a mutant transposase (Tn5-059) with a lowered GC insertion bias through protein engineering. We demonstrate Tn5-059 reduces AT dropout and increases uniformity of genome coverage in both bacterial genomes and human genome. We also observe higher library diversity generated by Tn5-059 when compared to Nextera v2 for human exomes, which leads to less sequencing and lower cost per genome. In addition, when used for human exomes, Tn5-059 delivers consistent library insert size over a range of input DNA, allowing up to a tenfold variance from the 50 ng input recommendation. Enhanced DNA input tolerance of Tn5-059 can translate to flexibility and robustness of workflow. DNA input tolerance together with superior uniformity of coverage and lower AT dropouts extend the applications of transposase based library preps. We discuss possible mechanisms of improvements in Tn5-059, and potential advantages of using the new mutant in varieties of applications including microbiome sequencing and chromatin profiling.

  16. Complete genome sequence of Pyrolobus fumarii type strain (1AT)

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Hammon, Nancy [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Huber, Harald [Universitat Regensburg, Regensburg, Germany; Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute

    2011-01-01

    Pyrolobus fumarii Bl chl et al. 1997 is the type species of the genus Pyrolobus, which be- longs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii exhibits currently the highest optimal growth temperature of all life forms on earth (106 C). This is the first com- pleted genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on Septem- ber 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Genomic resources in fruit plants: an assessment of current status.

    Science.gov (United States)

    Rai, Manoj K; Shekhawat, N S

    2015-01-01

    The availability of many genomic resources such as genome sequences, functional genomics resources including microarrays and RNA-seq, sufficient numbers of molecular markers, express sequence tags (ESTs) and high-density genetic maps is causing a rapid acceleration of genetics and genomic research of many fruit plants. This is leading to an increase in our knowledge of the genes that are linked to many horticultural and agronomically important traits. Recently, some progress has also been made on the identification and functional analysis of miRNAs in some fruit plants. This is one of the most active research fields in plant sciences. The last decade has witnessed development of genomic resources in many fruit plants such as apple, banana, citrus, grapes, papaya, pears, strawberry etc.; however, many of them are still not being exploited. Furthermore, owing to lack of resources, infrastructure and research facilities in many lesser-developed countries, development of genomic resources in many underutilized or less-studied fruit crops, which grow in these countries, is limited. Thus, research emphasis should be given to those fruit crops for which genomic resources are relatively scarce. The development of genomic databases of these less-studied fruit crops will enable biotechnologists to identify target genes that underlie key horticultural and agronomical traits. This review presents an overview of the current status of the development of genomic resources in fruit plants with the main emphasis being on genome sequencing, EST resources, functional genomics resources including microarray and RNA-seq, identification of quantitative trait loci and construction of genetic maps as well as efforts made on the identification and functional analysis of miRNAs in fruit plants.

  18. Complete genome sequence of human astrovirus genotype 6

    Directory of Open Access Journals (Sweden)

    Vernet Guy

    2010-02-01

    Full Text Available Abstract Background Human astroviruses (HAstVs are one of the important causes of acute gastroenteritis in children. Currently, eight HAstV genotypes have been identified and all but two (HAstV-6 and HAstV-7 have been fully sequenced. We here sequenced and analyzed the complete genome of a HAstV-6 strain (192-BJ07, which was identified in Beijing, China. Results The genome of 192-BJ07 consists of 6745 nucleotides. The 192-BJ07 strain displays a 77.2-78.0% nucleotide sequence identity with other HAstV genotypes and exhibits amino acid sequence identities of 86.5-87.4%, 94.2-95.1%, and 65.5-74.8% in the ORF1a, ORF1b, and ORF2 regions, respectively. Homological analysis of ORF2 shows that 192-BJ07 is 96.3% identical to the documented HAstV-6 strain. Further, phylogenetic analysis indicates that different genomic regions are likely undergoing different evolutionary and selective pressures. No recombination event was observed in HAstV-6 in this study. Conclusion The completely sequenced and characterized genome of HAstV-6 (192-BJ07 provides further insight into the genetics of astroviruses and aids in the surveillance and control of HAstV gastroenteritis.

  19. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise......, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA...... sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  20. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  1. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    OpenAIRE

    Minoche, André E.; Dohm, Juliane C.; Himmelbauer, Heinz

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, er...

  2. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    Energy Technology Data Exchange (ETDEWEB)

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N{sub 2} as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H{sub 2}. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  3. ABySS-Explorer: visualizing genome sequence assemblies.

    Science.gov (United States)

    Nielsen, Cydney B; Jackman, Shaun D; Birol, Inanç; Jones, Steven J M

    2009-01-01

    One bottleneck in large-scale genome sequencing projects is reconstructing the full genome sequence from the short subsequences produced by current technologies. The final stages of the genome assembly process inevitably require manual inspection of data inconsistencies and could be greatly aided by visualization. This paper presents our design decisions in translating key data features identified through discussions with analysts into a concise visual encoding. Current visualization tools in this domain focus on local sequence errors making high-level inspection of the assembly difficult if not impossible. We present a novel interactive graph display, ABySS-Explorer, that emphasizes the global assembly structure while also integrating salient data features such as sequence length. Our tool replaces manual and in some cases pen-and-paper based analysis tasks, and we discuss how user feedback was incorporated into iterative design refinements. Finally, we touch on applications of this representation not initially considered in our design phase, suggesting the generality of this encoding for DNA sequence data.

  4. Genome sequence of Psychrobacter cibarius strain W1

    DEFF Research Database (Denmark)

    Raghupathi, Prem Krishnan; Herschend, Jakob; Røder, Henriette Lyng

    2016-01-01

    Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs.......Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs....

  5. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    Science.gov (United States)

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  6. What Will We Do with a Cotton Genome Sequence?

    Institute of Scientific and Technical Information of China (English)

    BRUBAKER Curt

    2008-01-01

    @@ With the publication of "Toward Sequencing Cotton (Gossypium) Genomes" [Chen et al.PlantPhysiology,2007,145:1303-1310-] a clear consensus emerged from the cotton genomics community not only that cotton genome sequences were a critical resource for research and commercial innovationin cotton genomics,but that there was a logical means of achieving this goal.

  7. Sequencing of a Cultivated Diploid Cotton Genome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS; Thea; A

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization,structure and function of plant genomes.In a `white paper' published in 2007,the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes.This strategy banks on the high degree

  8. Identification of ancient remains through genomic sequencing

    Science.gov (United States)

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  9. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. keshk

    2014-05-01

    Full Text Available The merging of biology and computer science has created a new field called computational biology that explore the capacities of computers to gain knowledge from biological data, bioinformatics. Computational biology is rooted in life sciences as well as computers, information sciences, and technologies. The main problem in computational biology is sequence alignment that is a way of arranging the sequences of DNA, RNA or protein to identify the region of similarity and relationship between sequences. This paper introduces an enhancement of dynamic algorithm of genome sequence alignment, which called EDAGSA. It is filling the three main diagonals without filling the entire matrix by the unused data. It gets the optimal solution with decreasing the execution time and therefore the performance is increased. To illustrate the effectiveness of optimizing the performance of the proposed algorithm, it is compared with the traditional methods such as Needleman-Wunsch, Smith-Waterman and longest common subsequence algorithms. Also, database is implemented for using the algorithm in multi-sequence alignments for searching the optimal sequence that matches the given sequence.

  10. Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data.

    Science.gov (United States)

    Birol, Inanc; Raymond, Anthony; Jackman, Shaun D; Pleasance, Stephen; Coope, Robin; Taylor, Greg A; Yuen, Macaire Man Saint; Keeling, Christopher I; Brand, Dana; Vandervalk, Benjamin P; Kirk, Heather; Pandoh, Pawan; Moore, Richard A; Zhao, Yongjun; Mungall, Andrew J; Jaquish, Barry; Yanchuk, Alvin; Ritland, Carol; Boyle, Brian; Bousquet, Jean; Ritland, Kermit; Mackay, John; Bohlmann, Jörg; Jones, Steven J M

    2013-06-15

    White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome though pushes the boundaries of the current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 giga base pairs draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in the sequencing technology, especially increasing read lengths and paired end reads from longer fragments have a major impact on the assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies. The Picea glauca genome sequencing and assembly data are available through NCBI (Accession#: ALWZ0100000000 PID: PRJNA83435). http://www.ncbi.nlm.nih.gov/bioproject/83435.

  11. DEVELOPMENT OF NEW SEQUENCING TECHNOLOGIES AND THEIR APPLICATION IN GENOME ANALYSIS OF DOMESTIC ANIMALS

    Directory of Open Access Journals (Sweden)

    Kristina Gvozdanović

    2015-12-01

    Full Text Available Sequencing and detailed study of the genom of domestic animals began in the middle of the last century. It was primarily referred to development of the first generation sequencing methods, i.e. Sanger sequencing method. Next generation sequencing methods are currently the most common methods in the analysis of domestic animals genom. The application of these methods gave us up to 100 time more data in comparison with Sanger method. Analyses including RNA sequencing, genotyping of whole genome, immunoprecipitation associated with DNA microarrays, detection ofmutations and inherited diseases, sequencing ofthemitochondrial genome and many others have been conducted with development and application of new sequencing methods since 2005 until today. Application of new sequencing methods in the analysis ofdomestic animal genome provides better understanding of the genetic basis for important production traits which could help in improving the livestock production.

  12. Current-voltage characteristics of double-strand DNA sequences

    Science.gov (United States)

    Bezerril, L. M.; Moreira, D. A.; Albuquerque, E. L.; Fulco, U. L.; de Oliveira, E. L.; de Sousa, J. S.

    2009-09-01

    We use a tight-binding formulation to investigate the transmissivity and the current-voltage (I-V) characteristics of sequences of double-strand DNA molecules. In order to reveal the relevance of the underlying correlations in the nucleotides distribution, we compare the results for the genomic DNA sequence with those of artificial sequences (the long-range correlated Fibonacci and Rudin-Shapiro one) and a random sequence, which is a kind of prototype of a short-range correlated system. The random sequence is presented here with the same first neighbors pair correlations of the human DNA sequence. We found that the long-range character of the correlations is important to the transmissivity spectra, although the I-V curves seem to be mostly influenced by the short-range correlations.

  13. Current-voltage characteristics of double-strand DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bezerril, L.M.; Moreira, D.A. [Departamento de Fisica, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Albuquerque, E.L., E-mail: eudenilson@dfte.ufrn.b [Departamento de Fisica, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Fulco, U.L. [Departamento de Biofisica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Oliveira, E.L. de; Sousa, J.S. de [Departamento de Fisica, Universidade Federal do Ceara, 60455-760, Fortaleza-CE (Brazil)

    2009-09-07

    We use a tight-binding formulation to investigate the transmissivity and the current-voltage (I-V) characteristics of sequences of double-strand DNA molecules. In order to reveal the relevance of the underlying correlations in the nucleotides distribution, we compare the results for the genomic DNA sequence with those of artificial sequences (the long-range correlated Fibonacci and Rudin-Shapiro one) and a random sequence, which is a kind of prototype of a short-range correlated system. The random sequence is presented here with the same first neighbors pair correlations of the human DNA sequence. We found that the long-range character of the correlations is important to the transmissivity spectra, although the I-V curves seem to be mostly influenced by the short-range correlations.

  14. Detecting overlapping coding sequences in virus genomes

    Directory of Open Access Journals (Sweden)

    Brown Chris M

    2006-02-01

    Full Text Available Abstract Background Detecting new coding sequences (CDSs in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs. Results In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector – for detecting and analysing overlapping CDSs. Here we present (a an improved MLOGD statistic, (b a greatly extended suite of software using MLOGD, (c a database of results for 640 virus sequence alignments, and (d a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. Conclusion MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.

  15. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity

  16. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  17. An evaluation of Comparative Genome Sequencing (CGS by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  18. Why Assembling Plant Genome Sequences Is So Challenging

    Directory of Open Access Journals (Sweden)

    Pedro Seoane

    2012-09-01

    Full Text Available In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.

  19. Why Assembling Plant Genome Sequences Is So Challenging

    Science.gov (United States)

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  20. Genomic relationships computed from either next- generation sequence or array SNP data

    NARCIS (Netherlands)

    Perez Enciso, M.

    2014-01-01

    The use of sequence data in genomic prediction models is a topic of high interest, given the decreasing prices of current next'-generation sequencing technologies (NGS) and the theoretical possibility of directly interrogating the genomes for all causal mutations. Here, we compare by simulation how

  1. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  2. The impact of next-generation sequencing on genomics

    Institute of Scientific and Technical Information of China (English)

    Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang

    2011-01-01

    This article reviews basic concepts,general applications,and the potential impact of next-generation sequencing(NGS)technologies on genomics,with particular reference to currently available and possible future platforms and bioinformatics.NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed,thereby enabling previously unimaginable scientific achievements and novel biological applications.But,the massive data produced by NGS also presents a significant challenge for data storage,analyses,and management solutions.Advanced bioinformatic tools are essential for the successful application of NGS technology.As evidenced throughout this review,NGS technologies will have a striking impact on genomic research and the entire biological field.With its ability to tackle the unsolved challenges unconquered by previous genomic technologies,NGS is likely to unravel the complexity of the human genome in terms of genetic variations,some of which may be confined to susceptible loci for some common human conditions.The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come.

  3. Building the sequence map of the human pan-genome

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng

    2010-01-01

    Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified approximately 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel...... analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...... to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly....

  4. Complete genome sequence of Rhodospirillum rubrum type strain (S1).

    Science.gov (United States)

    Munk, A Christine; Copeland, Alex; Lucas, Susan; Lapidus, Alla; Del Rio, Tijana Glavina; Barry, Kerrie; Detter, John C; Hammon, Nancy; Israni, Sanjay; Pitluck, Sam; Brettin, Thomas; Bruce, David; Han, Cliff; Tapia, Roxanne; Gilna, Paul; Schmutz, Jeremy; Larimer, Frank; Land, Miriam; Kyrpides, Nikos C; Mavromatis, Konstantinos; Richardson, Paul; Rohde, Manfred; Göker, Markus; Klenk, Hans-Peter; Zhang, Yaoping; Roberts, Gary P; Reslewic, Susan; Schwartz, David C

    2011-07-01

    Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the most simple photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1(T) can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1(T) is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.

  5. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  6. Detecting long tandem duplications in genomic sequences

    Directory of Open Access Journals (Sweden)

    Audemard Eric

    2012-05-01

    Full Text Available Abstract Background Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, a software using a flow based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR,a we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS  Conclusions ReD Tandem allows to identify large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based which ignores duplications involving non coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also allow to improve existing annotations.

  7. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regards to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major sources of foodborne diseases for humans. Backward compatibility of WGS to former methods can be maintained by extraction of the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars and national reference laboratories still perform serotyping and PCR-based serogrouping as a first level classification method for Listeria monocytogenes surveillance. Whole genome sequence based core genome MLST analysis of a L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and it was possible to group them either into the IIa, IIc, IVb or IIb clusters, respectively, which were generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  8. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

    Science.gov (United States)

    Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...

  9. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.

  10. Genomic Sequence Comparisons, 1987-2003 Final Report

    Energy Technology Data Exchange (ETDEWEB)

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This lead to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotropicum, a species of great relevance to energy-rich gas production.

  11. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...

  12. Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission.

    Science.gov (United States)

    Giongo, Adriana; Tyler, Heather L; Zipperer, Ursula N; Triplett, Eric W

    2010-06-15

    Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single stain may be of biological origin or are simply caused by differences in genome assembly procedures.

  13. Genome Sequence of Stachybotrys chartarum Strain 51-11

    OpenAIRE

    Betancourt, Doris A.; Dean, Timothy R.; Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful.

  14. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo.

    Science.gov (United States)

    Lumley, Sarah; Horton, Daniel L; Marston, Denise A; Johnson, Nicholas; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2016-04-14

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates.

  15. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion...

  16. Coevolution between simple sequence repeats (SSRs and virus genome size

    Directory of Open Access Journals (Sweden)

    Zhao Xiangyan

    2012-08-01

    Full Text Available Abstract Background Relationship between the level of repetitiveness in genomic sequence and genome size has been investigated by making use of complete prokaryotic and eukaryotic genomes, but relevant studies have been rarely made in virus genomes. Results In this study, a total of 257 viruses were examined, which cover 90% of genera. The results showed that simple sequence repeats (SSRs is strongly, positively and significantly correlated with genome size. Certain repeat class is distributed in a certain range of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes, tetra- SSRs as a common component consist in genomes which more than 100 kb in size; in the range of genome  Conclusions We conducted this research standing on the height of the whole virus. We concluded that genome size is an important factor in affecting the occurrence of SSRs; hosts are also responsible for the variances of SSRs content to a certain degree.

  17. Co-barcoded sequence reads from long DNA fragments: A cost-effective solution for Perfect Genome sequencing

    Directory of Open Access Journals (Sweden)

    Brock A Peters

    2015-01-01

    Full Text Available Next generation sequencing (NGS technologies, primarily based on massively parallel sequencing (MPS, have touched and radically changed almost all aspects of research worldwide. These technologies have allowed for the rapid analysis, to date, of the genomes of more than 2,000 different species. In humans, NGS has arguably had the largest impact. Over 100,000 genomes of individual humans (based on various estimates have been sequenced allowing for deep insights into what makes individuals and families unique and what causes disease in each of us. Despite all of this progress, the current state of the art in sequence technology is far from generating a perfect genome sequence and much remains to be understood in the biology of human and other organisms’ genomes. In the article that follows we outline, why the perfect genome in humans is important, what is lacking from current human whole genome sequences, and a potential strategy for achieving the perfect genome in a cost effective manner.

  18. Draft Genome Sequences of Klebsiella variicola Plant Isolates.

    Science.gov (United States)

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena; Garza-Ramos, Ulises

    2015-09-10

    Three endophytic Klebsiella variicola isolates-T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively-were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulases genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts.

  19. Reconstructing cancer genomes from paired-end sequencing data

    Directory of Open Access Journals (Sweden)

    Oesper Layla

    2012-04-01

    Full Text Available Abstract Background A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i a partition of the reference genome into intervals; (ii adjacencies between these intervals in the cancer genome; (iii an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO, to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B cycles. Conclusions We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is

  20. Genome sequencing and annotation of Morganella sp. SA36

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Morganella sp. Strain SA36, isolated from water spring in Aljouf region, Saudi Arabia. The draft genome size is 2,564,439 bp with a G + C content of 51.1% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDNQ00000000.

  1. Genome sequencing and annotation of Stenotrophomonas sp. SAM8

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Stenotrophomonas sp. strain SAM8, isolated from environmental water. The draft genome size is 3,665,538 bp with a G + C content of 67.2% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDAV00000000.

  2. Genome sequencing and annotation of Proteus sp. SAS71

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Proteus sp. strain SAS71, isolated from water spring in Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000.

  3. Complete Genome Sequence of Corynebacterium pseudotuberculosis Viscerotropic Strain N1

    Science.gov (United States)

    Portela, Ricardo W.; Sousa, Thiago J.; Rocha, Flávia; Pereira, Felipe L.; Dorella, Fernanda A.; Carvalho, Alex F.; Menezes, Nildo; Macedo, Eduardo S.; Moura-Costa, Lilia F.; Meyer, Roberto; Leal, Carlos A. G.; Figueiredo, Henrique C.; Azevedo, Vasco

    2016-01-01

    We present the complete genome sequence of Corynebacterium pseudotuberculosis strain N1. The sequencing was performed with the Ion Torrent Personal Genome Machine system. The genome is a circular chromosome with 2,337,845 bp, a G+C content of 52.85%, and a total of 2,045 coding sequences, 12 rRNAs, 49 tRNAs, and 58 pseudogenes. PMID:26823597

  4. Genome Sequences of 14 Firmicutes Strains Isolated from the Human Vagina

    Science.gov (United States)

    Deitzler, Grace E.; Ruiz, Maria J.; Weimer, Cory; Park, SoEun; Robinson, Lloyd; Hallsworth-Pepin, Kymberlie; Wollam, Aye; Mitreva, Makedonka

    2016-01-01

    Research on vaginal infections is currently limited by a lack of available fully sequenced bacterial reference strains. Here, we present strains (now available through BEI Resources) and genome sequences for a set of 14 vaginal isolates from the phylum Firmicutes. These genome sequences provide a valuable resource for future research in understanding the role of Gram-positive bacteria in vaginal health and disease. PMID:27688329

  5. STR allele sequence variation: Current knowledge and future issues.

    Science.gov (United States)

    Gettings, Katherine Butler; Aponte, Rachel A; Vallone, Peter M; Butler, John M

    2015-09-01

    This article reviews what is currently known about short tandem repeat (STR) allelic sequence variation in and around the twenty-four loci most commonly used throughout the world to perform forensic DNA investigations. These STR loci include D1S1656, TPOX, D2S441, D2S1338, D3S1358, FGA, CSF1PO, D5S818, SE33, D6S1043, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045. All known reported variant alleles are compiled along with genomic information available from GenBank, dbSNP, and the 1000 Genomes Project. Supplementary files are included which provide annotated reference sequences for each STR locus, characterize genomic variation around the STR repeat region, and compare alleles present in currently available STR kit allelic ladders. Looking to the future, STR allele nomenclature options are discussed as they relate to next generation sequencing efforts underway.

  6. Heterogeneous Cloud Framework for Big Data Genome Sequencing.

    Science.gov (United States)

    Wang, Chao; Li, Xi; Chen, Peng; Wang, Aili; Zhou, Xuehai; Yu, Hong

    2015-01-01

    The next generation genome sequencing problem with short (long) reads is an emerging field in numerous scientific and big data research domains. However, data sizes and ease of access for scientific researchers are growing and most current methodologies rely on one acceleration approach and so cannot meet the requirements imposed by explosive data scales and complexities. In this paper, we propose a novel FPGA-based acceleration solution with MapReduce framework on multiple hardware accelerators. The combination of hardware acceleration and MapReduce execution flow could greatly accelerate the task of aligning short length reads to a known reference genome. To evaluate the performance and other metrics, we conducted a theoretical speedup analysis on a MapReduce programming platform, which demonstrates that our proposed architecture have efficient potential to improve the speedup for large scale genome sequencing applications. Also, as a practical study, we have built a hardware prototype on the real Xilinx FPGA chip. Significant metrics on speedup, sensitivity, mapping quality, error rate, and hardware cost are evaluated, respectively. Experimental results demonstrate that the proposed platform could efficiently accelerate the next generation sequencing problem with satisfactory accuracy and acceptable hardware cost.

  7. Global genomic diversity of human papillomavirus 6 based on 724 isolates and 190 complete genome sequences.

    Science.gov (United States)

    Jelen, Mateja M; Chen, Zigui; Kocjan, Boštjan J; Burt, Felicity J; Chan, Paul K S; Chouhy, Diego; Combrinck, Catharina E; Coutlée, François; Estrade, Christine; Ferenczy, Alex; Fiander, Alison; Franco, Eduardo L; Garland, Suzanne M; Giri, Adriana A; González, Joaquín Víctor; Gröning, Arndt; Heidrich, Kerstin; Hibbitts, Sam; Hošnjak, Lea; Luk, Tommy N M; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Richardson, Harriet; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y; Seme, Katja; Severini, Alberto; Sinchi, Jessica L; Smahelova, Jana; Tabrizi, Sepehr N; Tachezy, Ruth; Tohme, Sarah; Uloza, Virgilijus; Vitkauskiene, Astra; Wong, Yong Wee; Zidovec Lepej, Snježana; Burk, Robert D; Poljak, Mario

    2014-07-01

    Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. Two HPV6 variant lineages

  8. A taste of pineapple evolution through genome sequencing.

    Science.gov (United States)

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.

  9. Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2

    OpenAIRE

    Jaffe, David B.; Butler, Jonathan; Gnerre, Sante; Mauceli, Evan; Lindblad-Toh, Kerstin; Jill P. Mesirov; Michael C Zody; Lander, Eric S.

    2003-01-01

    We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rej...

  10. Cost analysis of whole genome sequencing in German clinical practice.

    Science.gov (United States)

    Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

    2017-06-01

    Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists, and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms-namely, HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario-which features 80 % utilization and 30-times coverage-the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs (€607.39) were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63 % cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome', by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.

  11. Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences

    Directory of Open Access Journals (Sweden)

    Holland Barbara R

    2006-07-01

    Full Text Available Abstract Background Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study. Results Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy. Conclusion Using the most treelike distance matrices, as

  12. Long-range genomic enrichment, sequencing, and assembly to determine unknown sequences flanking a known microRNA.

    Directory of Open Access Journals (Sweden)

    Zhaorong Ma

    Full Text Available Conserved plant microRNAs (miRNAs modulate important biological processes but little is known about conserved cis-regulatory elements (CREs surrounding MIRNA genes. We developed a solution-based targeted genomic enrichment methodology to capture, enrich, and sequence flanking genomic regions surrounding conserved MIRNA genes with a locked-nucleic acid (LNA-modified, biotinylated probe complementary to the mature miRNA sequence. Genomic DNA bound by the probe is captured by streptavidin-coated magnetic beads, amplified, sequenced and assembled de novo to obtain genomic DNA sequences flanking MIRNA locus of interest. We demonstrate the sensitivity and specificity of this enrichment methodology in Arabidopsis thaliana to enrich targeted regions spanning 10-20 kb surrounding known MIR166 and MIR165 loci. Assembly of the sequencing reads successfully recovered all targeted loci. While further optimization for larger, more complex genomes is needed, this method may enable determination of flanking genomic DNA sequence surrounding a known core (like a conserved mature miRNA from multiple species that currently don't have a full genome assembly available.

  13. Genome Project Standards in a New Era of Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome 43 sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better

  14. Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

    Science.gov (United States)

    Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

    2014-08-01

    Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.

  15. Genome-wide BAC-end sequencing of Cucumis melo using two BAC libraries

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2010-11-01

    Full Text Available Abstract Background Although melon (Cucumis melo L. is an economically important fruit crop, no genome-wide sequence information is openly available at the current time. We therefore sequenced BAC-ends representing a total of 33,024 clones, half of them from a previously described melon BAC library generated with restriction endonucleases and the remainder from a new random-shear BAC library. Results We generated a total of 47,140 high-quality BAC-end sequences (BES, 91.7% of which were paired-BES. Both libraries were assembled independently and then cross-assembled to obtain a final set of 33,372 non-redundant, high-quality sequences. These were grouped into 6,411 contigs (4.5 Mb and 26,961 non-assembled BES (14.4 Mb, representing ~4.2% of the melon genome. The sequences were used to screen genomic databases, identifying 7,198 simple sequence repeats (corresponding to one microsatellite every 2.6 kb and 2,484 additional repeats of which 95.9% represented transposable elements. The sequences were also used to screen expressed sequence tag (EST databases, revealing 11,372 BES that were homologous to ESTs. This suggests that ~30% of the melon genome consists of coding DNA. We observed regions of microsynteny between melon paired-BES and six other dicotyledonous plant genomes. Conclusion The analysis of nearly 50,000 BES from two complementary genomic libraries covered ~4.2% of the melon genome, providing insight into properties such as microsatellite and transposable element distribution, and the percentage of coding DNA. The observed synteny between melon paired-BES and six other plant genomes showed that useful comparative genomic data can be derived through large scale BAC-end sequencing by anchoring a small proportion of the melon genome to other sequenced genomes.

  16. Epigenetics of obesity: beyond the genome sequence.

    Science.gov (United States)

    Cordero, Paul; Li, Jiawei; Oben, Jude A

    2015-07-01

    After the study of the gene code as a trigger for obesity, epigenetic code has appeared as a novel tool in the diagnosis, prognosis and treatment of obesity, and its related comorbidities. This review summarizes the status of the epigenetic field associated with obesity, and the current epigenetic-based approaches for obesity treatment. Thanks to technical advances, novel and key obesity-associated polymorphisms have been described by genome-wide association studies, but there are limitations with their predictive power. Epigenetics is also studied for disease association, which involves decoding of the genome information, transcriptional status and later phenotypes. Obesity could be induced during adult life by feeding and other environmental factors, and there is a strong association between obesity features and specific epigenetic patterns. These patterns could be established during early life stages, and programme the risk of obesity and its comorbidities during adult life. Furthermore, recent studies have shown that DNA methylation profile could be applied as biomarkers of diet-induced weight loss treatment. High-throughput technologies, recently implemented for commercial genetic test panels, could soon lead to the creation of epigenetic test panels for obesity. Nonetheless, epigenetics is a modifiable risk factor, and different dietary patterns or environmental insights during distinct stages of life could lead to rewriting of the epigenetic profile.

  17. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate.

    Science.gov (United States)

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T; Manning, Shannon D

    2015-10-08

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available.

  18. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Science.gov (United States)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  19. Genome sequencing and annotation of Cellulomonas sp. HZM

    Directory of Open Access Journals (Sweden)

    Patric Chua

    2015-09-01

    Full Text Available We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA.

  20. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    Science.gov (United States)

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  1. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219

    Science.gov (United States)

    Abouelkhair, Mohamed A.; Riley, Matthew C.; Bemis, David A.

    2017-01-01

    ABSTRACT We report the first complete genome sequence of LMG 22219 (=ON 86T = CCUG 49543T), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level. PMID:28209834

  2. Genome sequence of Kocuria palustris strain W4

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng;

    2016-01-01

    We report the 3.09 Mb draft genome sequence ofKocuria palustrisW4, isolated from a slaughterhouse in Denmark.......We report the 3.09 Mb draft genome sequence ofKocuria palustrisW4, isolated from a slaughterhouse in Denmark....

  3. Genome sequence of Kocuria palustris strain W4

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng

    2016-01-01

    We report the 3.09 Mb draft genome sequence ofKocuria palustrisW4, isolated from a slaughterhouse in Denmark.......We report the 3.09 Mb draft genome sequence ofKocuria palustrisW4, isolated from a slaughterhouse in Denmark....

  4. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    NARCIS (Netherlands)

    Martino, M.E.; Bayjanov, J.R.; Joncour, P.; Hughes, S.; Gillet, B.; Kleerebezem, M; Siezen, R.; Hijum, S.A.F.T. van; Leulier, F.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence.

  5. Investigation of genome sequences within the family Pasteurellaceae

    DEFF Research Database (Denmark)

    Angen, Øystein; Ussery, David

    . The homology between genomes ranged from 47.2% to 94.1%. The number of genes found increased steadily for each sequence added to the analysis and the pan-genome of all 20 sequences consisted of around 8500 genes. On the other hand, the number of genes found in all strains steadily decreased when adding...

  6. Full Genome Sequence of Giant Panda Rotavirus Strain CH-1

    Science.gov (United States)

    Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang

    2013-01-01

    We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists. PMID:23469354

  7. Complete genome sequence of Enterobacter aerogenes KCTC 2190.

    Science.gov (United States)

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

    2012-05-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

  8. Draft Genome Sequence of Enterococcus mundtii CRL1656

    OpenAIRE

    2012-01-01

    We report the draft genome sequence of Enterococcus mundtii CRL1656, which was isolated from the stripping milk of a clinically healthy adult Holstein dairy cow from a dairy farm of the northwestern region of Tucumán (Argentina). The 3.10-Mb genome sequence consists of 450 large contigs and contains 2,741 predicted protein-coding genes.

  9. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    DEFF Research Database (Denmark)

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J;

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183(T) (= DSM 16839(T) = NCIMB 14029(T)), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host-...

  10. Initial sequencing and analysis of the human genome.

    Science.gov (United States)

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  11. A gapless genome sequence of the fungus Botrytis cinerea

    NARCIS (Netherlands)

    Kan, Van Jan A.L.; Stassen, Joost H.M.; Mosbach, Andreas; Lee, Van Der Theo A.J.; Faino, Luigi; Farmer, Andrew D.; Papasotiriou, Dimitrios G.; Zhou, Shiguo; Seidl, Michael F.; Cottam, Eleanor; Edel, Dominique; Hahn, Matthias; Schwartz, David C.; Dietrich, Robert A.; Widdison, Stephanie; Scalliet, Gabriel

    2016-01-01

    Following earlier incomplete and fragmented versions of a genome sequence for the grey mould Botrytis cinerea, a gapless, near-finished genome sequence for B. cinerea strain B05.10 is reported. The assembly comprised 18 chromosomes and was confirmed by an optical map and a genetic map based on ap

  12. Draft Genome Sequence of Raoultella planticola, Isolated from River Water.

    Science.gov (United States)

    Jothikumar, Narayanan; Kahler, Amy; Strockbine, Nancy; Gladney, Lori; Hill, Vincent R

    2014-10-16

    We isolated Raoultella planticola from a river water sample, which was phenotypically indistinguishable from Escherichia coli on MI agar. The genome sequence of R. planticola was determined to gain information about its metabolic functions contributing to its false positive appearance of E. coli on MI agar. We report the first whole genome sequence of Raoultella planticola.

  13. Genome sequence of the Chlamydophila abortus variant strain LLG.

    Science.gov (United States)

    Sait, Michelle; Clark, Ewan M; Wheelhouse, Nick; Livingstone, Morag; Spalding, Lucy; Siarkou, Victoria I; Vretou, Evangelia; Smith, David G E; Lainson, F Alex; Longbottom, David

    2011-08-01

    Chlamydophila abortus is a common cause of ruminant abortion. Here we report the genome sequence of strain LLG, which differs genotypically and phenotypically from the wild-type strain S26/3. Genome sequencing revealed differences between LLG and S26/3 to occur in pseudogene content, in transmembrane head/inc family proteins, and in biotin biosynthesis genes.

  14. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037.

    Science.gov (United States)

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E; Schäffer, Christina

    2015-06-11

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  15. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037

    OpenAIRE

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E.; Schäffer, Christina

    2015-01-01

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  16. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    NARCIS (Netherlands)

    Martino, M.E.; Bayjanov, J.R.; Joncour, P.; Hughes, S.; Gillet, B.; Kleerebezem, M; Siezen, R.; Hijum, S.A.F.T. van; Leulier, F.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence.

  17. Complete Genome Sequence of Lactobacillus plantarum CGMCC 8198

    Science.gov (United States)

    Dong, Qing-Qing; Hu, Hai-Jie; Wang, Qiu-Tong; Gu, Xiang-Chao; Zhou, Hao; Zhou, Wen-Juan; Ni, Xiao-Meng

    2017-01-01

    ABSTRACT We report the complete genome sequence of Lactobacillus plantarum CGMCC 8198, a novel probiotic strain isolated from fermented herbage. We have determined the complete genome sequence of strain L. plantarum CGMCC 8198, which consists of genes that are likely to be involved in dairy fermentation and that have probiotic qualities. PMID:28183756

  18. Sequencing of a Cultivated Diploid CottonGenome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS Thea A

    2008-01-01

    @@ Sequencing the genomes of crop species and model systems contributes significantly to our under-standing of the organization,structure and function of plant genomes.In a "white paper" published in2007,the cotton community set forth a strategic plan for sequencing the AD genome of cultivated up-land cotton that initially targets less complex diploid genomes.This strategy banks on the high degreeof conservation between diploid progenitors and AD species that will allow information derived fromdiploid genomes to be directly applied to the tetraploids.

  19. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or 'accessory' genes thus make up more than 90......% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group...

  20. Haplotype-resolved genome sequencing of a Gujarati Indian individual.

    Science.gov (United States)

    Kitzman, Jacob O; Mackenzie, Alexandra P; Adey, Andrew; Hiatt, Joseph B; Patwardhan, Rupali P; Sudmant, Peter H; Ng, Sarah B; Alkan, Can; Qiu, Ruolan; Eichler, Evan E; Shendure, Jay

    2011-01-01

    Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.

  1. Unexpected cross-species contamination in genome sequencing projects

    Directory of Open Access Journals (Sweden)

    Samier Merchant

    2014-11-01

    Full Text Available The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes.

  2. Second generation sequencing of the mesothelioma tumor genome.

    Directory of Open Access Journals (Sweden)

    Raphael Bueno

    Full Text Available The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

  3. Draft sequences of the radish (Raphanus sativus L.) genome.

    Science.gov (United States)

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified.

  4. Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Li, Yingrui; Lindgreen, Stinus;

    2010-01-01

    We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome, an...... for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit....

  5. Complete genome sequence of Archaeoglobus profundus type strain (AV18T)

    Energy Technology Data Exchange (ETDEWEB)

    von Jan, Mathias [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Eichinger, Konrad [Universitat Regensburg, Regensburg, Germany; Huber, Harald [Universitat Regensburg, Regensburg, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Archaeoglobus profundus (Burggraf et al. 1990) is a hyperthermophilic archaeon in the euryarchaeal class Archaeoglobi, which is currently represented by six validly named species and two taxonomically challenged 'Geoglobus' strains, all belonging to the same family Archaeoglobaceae. All members were isolated from marine hydrothermal habitats and are obligate anaerobes. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the class Archaeoglobi. The 1,563,423 bp genome with its 1,858 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. The genome sequence of Schizosaccharomyces pombe.

    Science.gov (United States)

    Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

    2002-02-21

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

  7. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  8. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  9. The minimum information about a genome sequence (MIGS) specification

    DEFF Research Database (Denmark)

    Field, D; Garrity, G; Gray, T;

    2008-01-01

    the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources...... that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases....

  10. Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Science.gov (United States)

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

  11. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  12. Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  13. The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Brettin, Thomas S [ORNL; Quest, Daniel J [ORNL; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Cottingham, Robert W [ORNL; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2012-01-01

    Background: The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. Methodology/Principal Findings: In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. Conclusion: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  14. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model, that model the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...

  15. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    Directory of Open Access Journals (Sweden)

    Janine E. Deakin

    2012-01-01

    Full Text Available Marsupials (metatherians, with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil, with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences.

  16. Complete genome sequence of Klebsiella pneumoniae phage JD001.

    Science.gov (United States)

    Cui, Zelin; Shen, Wenbin; Wang, Zheng; Zhang, Haotian; Me, Rao; Wang, Yanchun; Zeng, Lingbin; Zhu, Yongzhang; Qin, Jinhong; He, Ping; Guo, Xiaokui

    2012-12-01

    Klebsiella pneumoniae is a member of the family Enterobacteriaceae, opportunistic pathogens that are among the eight most prevalent infectious agents in hospitals. The emergence of multidrug-resistant strains of K. pneumoniae has became a public health problem globally. To develop an effective antimicrobial agent, we isolated a bacteriophage, named JD001, from seawater and sequenced its genome. Comparative genome analysis of phage JD001 with other K. pneumoniae bacteriophages revealed that phage JD001 has little similarity to previously published K. pneumoniae phages KP15, KP32, KP34, and phiKO2. Here we announce the complete genome sequence of JD001 and report major findings from the genomic analysis.

  17. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    Energy Technology Data Exchange (ETDEWEB)

    Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J C [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Oxford Nanopore MinION Sequencing and Genome Assembly

    Institute of Scientific and Technical Information of China (English)

    Hengyun Lu; Francesca Giordano; Zemin Ning

    2016-01-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that pro-mises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production makes it suitable for real-time applications, the release of the long read sequencer MinION has thus generated much excitement and interest in the geno-mics community. While de novo genome assemblies can be cheaply produced from SGS data, assem-bly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in gen-ome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  1. Genome sequencing and annotation of Aeromonas sp. HZM

    Directory of Open Access Journals (Sweden)

    Patric Chua

    2015-09-01

    Full Text Available We report the draft genome sequence of Aeromonas sp. strain HZM, isolated from tropical peat swamp forest soil. The draft genome size is 4,451,364 bp with a G + C content of 61.7% and contains 10 rRNA sequences (eight copies of 5S rRNA genes, single copy of 16S and 23S rRNA each. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. JEMQ00000000.

  2. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1.

    Science.gov (United States)

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-08-13

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions.

  3. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  4. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  5. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  6. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    Science.gov (United States)

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.

  7. Genome-scale validation of deep-sequencing libraries.

    Directory of Open Access Journals (Sweden)

    Dominic Schmidt

    Full Text Available Chromatin immunoprecipitation followed by high-throughput (HTP sequencing (ChIP-seq is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used to maximize the efficiency of library creation and quickly generate reliable preliminary data on a chromosomal scale in advance of deep sequencing. We also exploit this technique to compare enriched regions identified using microarrays with those identified by sequencing, demonstrating that they agree on a core set of clearly identified enriched regions, while characterizing the additional enriched regions identifiable using HTP sequencing.

  8. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Directory of Open Access Journals (Sweden)

    Martijn Staats

    Full Text Available Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes, but at least generating vital comparative genomic data for testing (phylogenetic, demographic and genetic hypotheses, that become increasingly more

  9. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  10. Real-time, portable genome sequencing for Ebola surveillance

    Science.gov (United States)

    Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan HJ; Becker-Ziaja, Beate; Boettcher, Jan-Peter; Cabeza-Cabrerizo, Mar; Camino-Sanchez, Alvaro; Carter, Lisa L.; Doerrbecker, Juiliane; Enkirch, Theresa; Dorival, Isabel Graciela García; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigail; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallash, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Yasmin; Sachse, Andrea; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Trina, Racine; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N’Faly; Williams, Cecelia V.; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Franck; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, Jamie; Rachwal, Philip; Turner, Daniel; Pollakis, Georgios; Hiscox, Julian A.; Matthews, David A.; O’Shea, Matthew K.; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Woelfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A.; Koivogui, Lamine; Diallo, Boubacar; Keita, Sakoba; Rambaut, Andrew; Formenty, Pierre; Gunther, Stephan; Carroll, Miles W.

    2016-01-01

    The Ebola virus disease (EVD) epidemic in West Africa is the largest on record, responsible for >28,599 cases and >11,299 deaths 1. Genome sequencing in viral outbreaks is desirable in order to characterize the infectious agent to determine its evolutionary rate, signatures of host adaptation, identification and monitoring of diagnostic targets and responses to vaccines and treatments. The Ebola virus genome (EBOV) substitution rate in the Makona strain has been estimated at between 0.87 × 10−3 to 1.42 × 10−3 mutations per site per year. This is equivalent to 16 to 27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic 2-7. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought-after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions 8. Genomic surveillance during the epidemic has been sporadic due to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities 9. In order to address this problem, we devised a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. Here we present sequence data and analysis of 142 Ebola virus (EBOV) samples collected during the period March to October 2015. We were able to generate results in less than 24 hours after receiving an Ebola positive sample, with the sequencing process taking as little as 15-60 minutes. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485

  11. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    Science.gov (United States)

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-08

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

  12. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian eWu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362Mb, 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of chloroplast genome.

  13. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    Energy Technology Data Exchange (ETDEWEB)

    Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Foster, Brian [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Sun, Hui [U.S. Department of Energy, Joint Genome Institute; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  15. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs) or microsatellites, as genetic markers, are ubiquitous in genomes of various organisms. The analysis of SSR in rhizobia genome provides useful information for a variety of applications in population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of SSRs, the most common in Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes se-quenced in the microorganisms tandem repeats database, and SSRs in the three species genomes were compared with each other. The result showed that there were 1 410, 859, and 638 SSRs in B. japonicum, M. loti, and 5. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and 5. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant and indicated higher mutation rates in these species. The least abundance was mononucleotide repeat. The SSRs type and distribution were similar among these species.

  16. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  17. REVISITING MOLECULAR CLONING TO SOLVE GENOME SEQUENCING PROJECT CONFLICTS

    National Research Council Canada - National Science Library

    Hugo A Barrera-Saldaña; Aarón Daniel Ramírez-Sánchez; Tiffany Editth Palacios-Tovar; Dionicio Aguirre-Treviño; Saúl Felipe Karr-de-León

    2017-01-01

    .... Molecular cloning was chosen as the most straight-forward strategy to solve the dilemma. The initial characterization of recombinant plasmids by restriction enzyme digestion confirmed the presence of two genomic sequences...

  18. Complete Genome Sequence of Mycobacterium phlei Type Strain RIVM601174

    KAUST Repository

    Abdallah, A. M.

    2012-05-24

    Mycobacterium phlei is a rapidly growing nontuberculous Mycobacterium species that is typically nonpathogenic, with few reported cases of human disease. Here we report the whole genome sequence of M. phlei type strain RIVM601174.

  19. Draft Genome Sequence of Coprobacter fastidiosus NSB1T

    Science.gov (United States)

    Chaplin, A. V.; Efimov, B. A.; Khokhlova, E. V.; Kafarskaia, L. I.; Tupikin, A. E.; Kabilov, M. R.

    2014-01-01

    Coprobacter fastidiosus is a Gram-negative obligate anaerobic bacterium belonging to the phylum Bacteroidetes. In this work, we report the draft genome sequence of C. fastidiosus strain NSB1T isolated from human infant feces. PMID:24604645

  20. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Energy Technology Data Exchange (ETDEWEB)

    Li, Lijin [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Goedegebuure, Peter [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Mardis, Elaine R. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); The Genome Institute at Washington University School of Medicine, St. Louis, MO 63108 (United States); Ellis, Matthew J.C. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Zhang, Xiuli; Herndon, John M. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Fleming, Timothy P. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Carreno, Beatriz M. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Hansen, Ted H. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 (United States); Gillanders, William E., E-mail: gillandersw@wudosis.wustl.edu [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States)

    2011-11-25

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  1. Draft Genome Sequences of Nine Cyanobacterial Strains from Diverse Habitats

    Science.gov (United States)

    Zhu, Tao; Hou, Shengwei

    2017-01-01

    ABSTRACT Here, we report the annotated draft genome sequences of nine different cyanobacteria, which were originally collected from different habitats, including hot springs, terrestrial, freshwater, and marine environments, and cover four of the five morphological subsections of cyanobacteria. PMID:28254973

  2. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Directory of Open Access Journals (Sweden)

    William E. Gillanders

    2011-11-01

    Full Text Available New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  3. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    Science.gov (United States)

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus.

  4. Next-generation sequencing technology for genetics and genomics of sorghum

    DEFF Research Database (Denmark)

    Luo, Hong; Mocoeur, Anne Raymonde Joelle; Jing, Hai-Chun

    2014-01-01

    NGS platforms, comparing their working theories and reveiwing their advantages and disavantages. We also discuss the future of NGS development and point out that single molecular sequencing would push the technology to the next level for biological sciences. Much of the chapter focuses on the use......The invention and application of Next-Generation Sequencing (NGS) technologies have revolutionized the study of genetics and genomics. Much research which would not even be considered are nowdays being excuted in many laboratories as routine. In this chapter, we introduce the currently available...... of NGS technologies in sorghum. Although the acquisition of the first whole-genome sequence in sorghum was carried out primarily using Sanger sequencing, the use of NGS for examining the genome-wide variation was almost synchronized with other work. Interesting genomic variation was found between sweet...

  5. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  6. Genome Sequence of Mycobacterium Phage Waterfoul

    Science.gov (United States)

    Jackson, Paige N.; Embry, Ella K.; Johnson, Christa O.; Watson, Tiara L.; Weast, Sayre K.; DeGraw, Caroline J.; Douglas, Jessica R.; Sellers, J. Michael; D’Angelo, William A.

    2016-01-01

    Waterfoul is a newly isolated temperate siphovirus of Mycobacterium smegmatis mc2155. It was identified as a member of the K5 cluster of Mycobacterium phages and has a 61,248-bp genome with 95 predicted genes. PMID:27856585

  7. Draft genome sequence of the Tibetan antelope

    OpenAIRE

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A.; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in ge...

  8. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Full Text Available Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37. Copy number (CN aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs and 2 insertion/deletions (INDELs were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  9. Draft genome sequence of Therminicola potens strain JR

    Energy Technology Data Exchange (ETDEWEB)

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  10. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1

    OpenAIRE

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-01-01

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the fir...

  11. Triplex-forming oligonucleotide target sequences in the human genome

    OpenAIRE

    Goñi, J Ramon; de la Cruz, Xavier; Orozco, Modesto

    2004-01-01

    The existence of sequences in the human genome which can be a target for triplex formation, and accordingly are candidates for anti-gene therapies, has been studied by using bioinformatics tools. It was found that the population of triplex-forming oligonucleotide target sequences (TTS) is much more abundant than that expected from simple random models. The population of TTS is large in all the genome, without major differences between chromosomes. A wide analysis along annotated regions of th...

  12. Draft Genome Sequence of Bacillus tequilensis Strain FJAT-14262a

    OpenAIRE

    Chen, Qian-Qian; Liu, Bo; Liu, Guo-hong; Wang, Jie-ping; Che, Jian-Mei

    2015-01-01

    Bacillus tequilensis FJAT-14262a is a Gram-positive rod-shaped bacterium. Here, we report the 4,038,551-bp genome sequence of B. tequilensis FJAT-14262a, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus.

  13. Draft Genome Sequence of Bacillus tequilensis Strain FJAT-14262a.

    Science.gov (United States)

    Chen, Qian-Qian; Liu, Bo; Liu, Guo-Hong; Wang, Jie-Ping; Che, Jian-Mei

    2015-11-12

    Bacillus tequilensis FJAT-14262a is a Gram-positive rod-shaped bacterium. Here, we report the 4,038,551-bp genome sequence of B. tequilensis FJAT-14262a, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus.

  14. Finished Genome Sequence of Collimonas arenae Cal35

    NARCIS (Netherlands)

    Wu, Je-Jia; de Jager, Victor; Deng, Wen-ling; Leveau, Johan

    2015-01-01

    We announce the finished genome sequence of soil forest isolate Collimonas arenae Cal35, which comprises a 5.6-Mbp chromosome and 41-kb plasmid. The Cal35 genome is the second one published for the bacterial genus Collimonas and represents the first opportunity for high-resolution comparison of geno

  15. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    Science.gov (United States)

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  16. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  17. Finished Genome Sequence of Collimonas arenae Cal35

    NARCIS (Netherlands)

    Wu, Je-Jia; de Jager, Victor; Deng, Wen-ling; Leveau, Johan

    2015-01-01

    We announce the finished genome sequence of soil forest isolate Collimonas arenae Cal35, which comprises a 5.6-Mbp chromosome and 41-kb plasmid. The Cal35 genome is the second one published for the bacterial genus Collimonas and represents the first opportunity for high-resolution comparison of geno

  18. Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge.

    Science.gov (United States)

    Cornell, Jessica L; Breslin, Eileen; Schuhmacher, Zachary; Himelright, Madison; Berluti, Cassandra; Boyd, Charles; Carson, Rachel; Del Gallo, Elle; Giessler, Caris; Gilliam, Benjamin; Heatherly, Catherine; Nevin, Julius; Nguyen, Bryan; Nguyen, Justin; Parada, Jocelyn; Sutterfield, Blake; Tukruni, Muruj; Temple, Louise

    2016-08-18

    Smudge, a bacteriophage enriched from soil using Bacillus thuringiensis DSM-350 as the host, had its complete genome sequenced. Smudge is a myovirus with a genome consisting of 292 genes and was identified as belonging to the C1 cluster of Bacillus phages.

  19. Complete Genome Sequence of Bacillus thuringiensis Strain 407 Cry-

    OpenAIRE

    Poehlein, Anja; Liesegang, Heiko

    2013-01-01

    Bacillus thuringiensis is an insect pathogen that has been used widely as a biopesticide. Here, we report the genome sequence of strain 407 Cry-, which is used to study the genetic determinants of pathogenicity. The genome consists of a 5.5-Mb chromosome and nine plasmids, including a novel 502-kb megaplasmid.

  20. Genome sequences of Listeria monocytogenes strains with resistance to arsenic

    Science.gov (United States)

    Listeria monocytogenes frequently exhibits resistance to arsenic. We report here the draft genome sequences of eight genetically diverse arsenic-resistant L. monocytogenes strains from human listeriosis and food-associated environments. Availability of these genomes would help to elucidate the role ...

  1. Complete genome sequence of Bifidobacterium bifidum S17.

    NARCIS (Netherlands)

    Zhurina, D.; Zomer, A.L.; Gleinser, M.; Brancaccio, V.F.; Auchter, M.; Waidmann, M.S.; Westermann, C.; Sinderen, D. van; Riedel, C.U.

    2011-01-01

    Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome se

  2. Draft genome sequences of 10 strains of the genus exiguobacterium.

    Science.gov (United States)

    Vishnivetskaya, Tatiana A; Chauhan, Archana; Layton, Alice C; Pfiffner, Susan M; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C; Markowitz, Victor M; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W; Pati, Amrita; Stamatis, Dimitrios; Reddy, T B K; Shapiro, Nicole; Nordberg, Henrik P; Cantor, Michael N; Hua, X Susan; Woyke, Tanja

    2014-10-16

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes.

  3. Complete genome sequence of Aeromonas hydrophila AL06-06

    Science.gov (United States)

    Aeromonas hydrophila occurs in freshwater environments and infects fish and mammals. In this work, we report the complete genome sequence of Aeromonas hydrophila AL06-06, which was isolated from diseased goldfish and is being used for comparative genomic studies with A. hydrophila strains causing ba...

  4. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    Full Text Available The plant chloroplast (cp genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp in length including a pair of short inverted repeat regions (IRa and IRb of 139 bp each separated by a small single copy (SSC region of 54,323 bp (SSC and a large single copy region of 66,735 bp (LSC. It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR have been detected in A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp. The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. Complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.

  5. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    NARCIS (Netherlands)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.; Delcher, Arthur L.; Schatz, Michael; Zhao, Qi; Wortman, Jennifer R.; Bidwell, Shelby L.; Alsmark, U. Cecilia M.; Besteiro, Sebastien; Sicheritz-Ponten, Thomas; Noel, Christophe J.; Dacks, Joel B.; Foster, Peter G.; Simillion, Cedric; Van de Peer, Yves; Miranda-Saavedra, Diego; Barton, Geoffrey J.; Westrop, Gareth D.; Mueller, Sylke; Dessi, Daniele; Fiori, Pier Luigi; Ren, Qinghu; Paulsen, Ian; Zhang, Hanbang; Bastida-Corcuera, Felix D.; Simoes-Barbosa, Augusto; Brown, Mark T.; Hayes, Richard D.; Mukherjee, Mandira; Okumura, Cheryl Y.; Schneider, Rachel; Smith, Alias J.; Vanacova, Stepanka; Villalvazo, Maria; Haas, Brian J.; Pertea, Mihaela; Feldblyum, Tamara V.; Utterback, Terry R.; Shu, Chung-Li; Osoegawa, Kazutoyo; de Jong, Pieter J.; Hrdy, Ivan; Horvathova, Lenka; Zubacova, Zuzana; Dolezal, Pavel; Malik, Shehre-Banoo; Logsdon, John M.; Henze, Katrin; Gupta, Arti; Wang, Ching C.; Dunne, Rebecca L.; Upcroft, Jacqueline A.; Upcroft, Peter; White, Owen; Salzberg, Steven L.; Tang, Petrus; Chiu, Cheng-Hsun; Lee, Ying-Shiung; Embley, T. Martin; Coombs, Graham H.; Mottram, Jeremy C.; Tachezy, Jan; Fraser-Liggett, Claire M.; Johnson, Patricia J.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the similar to 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction wi

  6. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  7. Complete Genome Sequence of Pediococcus pentosaceus Strain SL4

    DEFF Research Database (Denmark)

    Dantoft, Shruti Harnal; Bielak, Eliza Maria; Seo, Jae-Gu;

    2013-01-01

    Pediococcus pentosaceus SL4 was isolated from a Korean fermented vegetable product, kimchi. We report here the whole-genome sequence (WGS) of P. pentosaceus SL4. The genome consists of a 1.79-Mb circular chromosome (G+C content of 37.3%) and seven distinct plasmids ranging in size from 4 kb to 50...

  8. Extensive sequencing of seven human genomes to characterize benchmark reference materials.

    Science.gov (United States)

    Zook, Justin M; Catoe, David; McDaniel, Jennifer; Vang, Lindsay; Spies, Noah; Sidow, Arend; Weng, Ziming; Liu, Yuling; Mason, Christopher E; Alexander, Noah; Henaff, Elizabeth; McIntyre, Alexa B R; Chandramohan, Dhruva; Chen, Feng; Jaeger, Erich; Moshrefi, Ali; Pham, Khoa; Stedman, William; Liang, Tiffany; Saghbini, Michael; Dzakula, Zeljko; Hastie, Alex; Cao, Han; Deikus, Gintaras; Schadt, Eric; Sebra, Robert; Bashir, Ali; Truty, Rebecca M; Chang, Christopher C; Gulbahce, Natali; Zhao, Keyan; Ghosh, Srinka; Hyland, Fiona; Fu, Yutao; Chaisson, Mark; Xiao, Chunlin; Trow, Jonathan; Sherry, Stephen T; Zaranek, Alexander W; Ball, Madeleine; Bobe, Jason; Estep, Preston; Church, George M; Marks, Patrick; Kyriazopoulou-Panagiotopoulou, Sofia; Zheng, Grace X Y; Schnall-Levin, Michael; Ordonez, Heather S; Mudivarti, Patrice A; Giorda, Kristina; Sheng, Ying; Rypdal, Karoline Bjarnesdatter; Salit, Marc

    2016-06-07

    The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

  9. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    Science.gov (United States)

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on an Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with the N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species.

  10. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  11. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  12. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Nora Rieber

    Full Text Available The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics' technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies' platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other

  13. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)☆

    Science.gov (United States)

    Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

    2013-01-01

    Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325

  14. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection

    Science.gov (United States)

    Current advances in sequencing technologies and bioinformatics allow to determine a nearly complete genomic background of rice, a staple food for the poor people. Consequently, comprehensive databases of variation among thousands of varieties is currently being assembled and released. Proper analysi...

  15. Complete genome sequence of Ferroglobus placidus AEDII12DO

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Risso, Carla [University of Massachusetts, Amherst; Holmes, Dawn [University of Massachusetts, Amherst; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lovley, Derek [University of Massachusetts, Amherst; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

    2011-01-01

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryar- chaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemoli- thoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and anno- tation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was se- quenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

  16. Complete genome sequence of Serratia plymuthica strain AS12

    Energy Technology Data Exchange (ETDEWEB)

    Neupane, Saraswoti [Uppsala University, Uppsala, Sweden; Finlay, Roger D. [Uppsala University, Uppsala, Sweden; Alstrom, Sadhna [Uppsala University, Uppsala, Sweden; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Peters, Lin [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Han, James [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Hogberg, Nils [Uppsala University, Uppsala, Sweden

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  17. Complete genome sequence of Desulfurispirillum indicum strain S5(T).

    Science.gov (United States)

    Bini, Elisabetta; Rauschenbach, Ines; Narasingarao, Priya; Starovoytov, Valentin; Hauser, Lauren; Jeffries, Cynthia D; Land, Miriam; Bruce, David; Detter, Chris; Goodwin, Lynne; Han, Shunsheng; Held, Brittany; Tapia, Roxanne; Copeland, Alex; Ivanova, Natalia; Mikhailova, Natalia; Nolan, Matt; Pati, Amrita; Pennacchio, Len; Pitluck, Sam; Woyke, Tanja; Häggblom, Max

    2011-12-31

    Desulfurispirillum indicum strain S5(T) is a strictly anaerobic bacterium isolated from river sediment in Chennai, India. D. indicum belongs to the deep branching phylum of Chrysiogenetes, which currently only includes three other cultured species. Strain S5(T) is the type strain of the species and it is capable of growth using selenate, selenite, arsenate, nitrate or nitrite as terminal electron acceptors. The 2,928,377 bp genome encodes 2,619 proteins and 49 RNA genes, and the information gained from its sequence will be relevant to the elucidation of microbially-mediated transformations of arsenic and selenium, in addition to deepening our knowledge of the underrepresented phylum of Chrysiogenetes.

  18. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing

    Directory of Open Access Journals (Sweden)

    Gan Xiaoni

    2010-06-01

    Full Text Available Abstract Background Variations in genome size within and between species have been observed since the 1950 s in diverse taxonomic groups. Serving as model organisms, smooth pufferfish possess the smallest vertebrate genomes. Interestingly, spiny pufferfish from its sister family have genome twice as large as smooth pufferfish. Therefore, comparative genomic analysis between smooth pufferfish and spiny pufferfish is useful for our understanding of genome size evolution in pufferfish. Results Ten BAC clones of a spiny pufferfish Diodon holocanthus were randomly selected and shotgun sequenced. In total, 776 kb of non-redundant sequences without gap representing 0.1% of the D. holocanthus genome were identified, and 77 distinct genes were predicted. In the sequenced D. holocanthus genome, 364 kb is homologous with 265 kb of the Takifugu rubripes genome, and 223 kb is homologous with 148 kb of the Tetraodon nigroviridis genome. The repetitive DNA accounts for 8% of the sequenced D. holocanthus genome, which is higher than that in the T. rubripes genome (6.89% and that in the Te. nigroviridis genome (4.66%. In the repetitive DNA, 76% is retroelements which account for 6% of the sequenced D. holocanthus genome and belong to known families of transposable elements. More than half of retroelements were distributed within genes. In the non-homologous regions, repeat element proportion in D. holocanthus genome increased to 10.6% compared with T. rubripes and increased to 9.19% compared with Te. nigroviridis. A comparison of 10 well-defined orthologous genes showed that the average intron size (566 bp in D. holocanthus genome is significantly longer than that in the smooth pufferfish genome (435 bp. Conclusion Compared with the smooth pufferfish, D. holocanthus has a low gene density and repeat elements rich genome. Genome size variation between D. holocanthus and the smooth pufferfish exhibits as length variation between homologous region and different

  19. Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX

    Science.gov (United States)

    Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-01-01

    Background SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the gnomic library, is a powerful method to identify genomic aptamers. PMID:20161784

  20. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the gnomic library, is a powerful method to identify genomic aptamers.

  1. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore http://bioinfo.hku.hk/genochore.html, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis associated to related organisms for comparison.

  2. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  3. A Targeted Enrichment Strategy for Massively Parallel Sequencing of Angiosperm Plastid Genomes

    Directory of Open Access Journals (Sweden)

    Gregory W. Stull

    2013-02-01

    Full Text Available Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots, which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×, even for the two monocots. Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving 50× mean coverage. However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96 available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms.

  4. Open access to sequence: Browsing the Pichia pastoris genome

    Directory of Open Access Journals (Sweden)

    Graf Alexandra

    2009-10-01

    Full Text Available Abstract The first genome sequences of the important yeast protein production host Pichia pastoris have been released into the public domain this spring. In order to provide the scientific community easy and versatile access to the sequence, two web-sites have been installed as a resource for genomic sequence, gene and protein information for P. pastoris: A GBrowse based genome browser was set up at http://www.pichiagenome.org and a genome portal with gene annotation and browsing functionality at http://bioinformatics.psb.ugent.be/webtools/bogas. Both websites are offering information on gene annotation and function, regulation and structure. In addition, a WiKi based platform allows all users to create additional information on genes, proteins, physiology and other items of P. pastoris research, so that the Pichia community can benefit from exchange of knowledge, data and materials.

  5. The genome sequence of the model ascomycete fungus Podospora anserina

    NARCIS (Netherlands)

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    BACKGROUND: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. RESULTS: We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed

  6. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  7. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  8. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  9. Draft Genome Sequence of Lactobacillus fermentum NB-22

    Science.gov (United States)

    Shkoporov, A. N.; Efimov, B. A.; Pikina, A. P.; Borisova, O. Y.; Gladko, I. A.; Postnikova, E. A.; Lordkipanidze, A. E.; Kafarskaia, L. I.

    2015-01-01

    We announce here a draft genome sequence of Lactobacillus fermentum NB-22, a strain isolated from human vaginal microbiota. The assembled sequence consists of 190 contigs, joined into 137 scaffolds, and the total size is 2.01 Mb. PMID:26272572

  10. Complete Genome Sequence of Kocuria palustris MU14/1.

    Science.gov (United States)

    Calcutt, Michael J; Foecking, Mark F

    2015-10-15

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences.

  11. Complete Genome Sequence of Kocuria palustris MU14/1

    OpenAIRE

    2015-01-01

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences.

  12. Complete Genomic Sequence of Issyk-Kul Virus.

    Science.gov (United States)

    Atkinson, Barry; Marston, Denise A; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2015-07-02

    Issyk-Kul virus (ISKV) is an ungrouped virus tentatively assigned to the Bunyaviridae family and is associated with an acute febrile illness in several central Asian countries. Using next-generation sequencing technologies, we report here the full-genome sequence for this novel unclassified arboviral pathogen circulating in central Asia.

  13. Complete genome sequence of a new maize-associated cytorhabdovirus

    Science.gov (United States)

    A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....

  14. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

  15. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  16. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  17. Genomic insight into the common carp (Cyprinus carpio genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio, a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  18. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles, and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL or of a whole genome shotgun library (WGSL, or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  19. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.

  20. Genome sequencing of a single tardigrade Hypsibius dujardini individual.

    Science.gov (United States)

    Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

    2016-08-16

    Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.

  1. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group......Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...

  2. DNA sequencing leads to genomics progress in China

    Institute of Scientific and Technical Information of China (English)

    WU JiaYan; XIAO JingFa; ZHANG RuoSi; YU Jun

    2011-01-01

    1 Science in the large-scale sequencing era Ten years ago,the first draft sequence assembly of the human genome was completed [1],bringing biomedical research one-step closer toward the goal of revolutionizing diagnosis,prevention,and treatment of human diseases.Recently,journalists from the journal Nature surveyed more than 1000 life scientists regarding this laudable aim [2],obtaining substantially negative responses [3].However,almost all of those surveyed had been influenced,in one way or another,by the availability of the human genome sequence,and they also agreed with the notion that the "sequence is the start." The complexity of genome biology and almost every aspect of human biology is far greater than previously thought [4].

  3. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned regions of the orthopoxvirus alignment. Overall sequence identity increased only

  4. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  5. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Full Text Available Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  6. Complete genome sequences of three strains of coxsackievirus a7.

    Science.gov (United States)

    Ylä-Pelto, Jani; Koskinen, Satu; Karelehto, Eveliina; Sittig, Eleonora; Roivainen, Merja; Hyypiä, Timo; Susi, Petri

    2013-04-11

    Genomes of three strains (Parker, USSR, and 275/58) of coxsackievirus A7 (CV-A7) were amplified by the long reverse transcription (RT)-PCR method and sequenced. While the sequences of Parker and USSR were identical, the similarities of 275/58 to the CV-A7 reference sequence, accession no. AY421765, were 82.6% and 96.2% for nucleotides and amino acids, respectively.

  7. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell Whole Genome Amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now utilize specialized reagents, acoustic liquid handling, and reduced reaction volumes eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2uL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods including treatment with Proteinase K, lysozyme, and other detergents, in order to complement standard alkaline lysis and allow for more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  8. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  9. Draft genome sequence of the Tibetan antelope.

    Science.gov (United States)

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M; Ma, Jian; Lewin, Harris A; Xing, Jinchuan; Platt, Roy N; Ray, David A; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika, and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

  10. Information recovery from low coverage whole-genome bisulfite sequencing.

    Science.gov (United States)

    Libertini, Emanuele; Heath, Simon C; Hamoudi, Rifat A; Gut, Marta; Ziller, Michael J; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G; Frontini, Mattia; Ouwehand, Willem H; Meissner, Alexander; Gut, Ivo G; Beck, Stephan

    2016-06-27

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommend 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future.

  11. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius , is a representative icefish species with economic importance and special appearance. Due to its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from the Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and generated a draft genome assembly, aiming at exploring molecular mechanisms for the biological interests. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with the scaffold N50 reaching 1.163 Mb. The genome completeness was estimated to be 98.39 % by using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters.

  12. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi eHsu

    2011-12-01

    Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap, the causative agent of Johne’s disease (JD, infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium strains isolated from various hosts and environments. Using Next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98% to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77 showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels were not detected, and only unique single nucleotide polymorphisms (SNPs were observed among the 6 M. ap strains. While most of the SNPs (~100 in M. ap genomes were non-synonymous, a total of ~ 6000 SNPs were detected among M. avium genomes, most of them were synonymous suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10 strain while the human isolate (M. ap 4B is closely related to the environmental strains, indicating environmental source to human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  13. The diploid genome sequence of an individual human.

    Directory of Open Access Journals (Sweden)

    Samuel Levy

    2007-09-01

    Full Text Available Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel included 3,213,401 single nucleotide polymorphisms (SNPs, 53,823 block substitutions (2-206 bp, 292,102 heterozygous insertion/deletion events (indels(1-571 bp, 559,473 homozygous indels (1-82,711 bp, 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

  14. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  15. Projector 2 : contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies

    NARCIS (Netherlands)

    van Hijum, SAFT; Zomer, AL; Kuipers, OP; Kok, J

    2005-01-01

    With genome sequencing efforts increasing exponentially, valuable information accumulates on genomic content of the various organisms sequenced. Projector 2 uses (un) finished genomic sequences of an organism as a template to infer linkage information for a genome sequence assembly of a related orga

  16. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform

    DEFF Research Database (Denmark)

    Fordyce, Sarah L; Avila-Arcos, Maria C; Rockenbauer, Eszter;

    2011-01-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR...... repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method...

  17. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  18. BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

    Science.gov (United States)

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient detecting unexpected genes horizontally acquired from bacterial or archaeal distant genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. BG7 current version – which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge amount of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations right the next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310

  19. Draft Genome Sequence of Sporidiobolus salmonicolor CBS 6832, a Red-Pigmented Basidiomycetous Yeast.

    Science.gov (United States)

    Coelho, Marco A; Almeida, João M G C F; Hittinger, Chris Todd; Gonçalves, Paula

    2015-05-21

    We report the genome sequencing and annotation of the basidiomycetous red-pigmented yeast Sporidiobolus salmonicolor strain CBS 6832. The current assembly contains 395 scaffolds, for a total size of about 20.5 Mb and a G+C content of ~61.3%. The genome annotation predicts 5,147 putative protein-coding genes. Copyright © 2015 Coelho et al.

  20. Genomic architecture of HIV-1 infection: current status & challenges.

    Science.gov (United States)

    Kaur, Gurvinder; Sharma, Gaurav; Kumar, Neeraj; Kaul, Mrinali H; Bansal, Rhea A; Vajpayee, Madhu; Wig, Naveet; Sharma, Surender K; Mehra, Narinder K

    2013-11-01

    Studies on host genomics have revealed the existence of identifiable HIV-1 specific protective factors among infected individuals who remain naturally resistant viraemia controllers with little or no evidence of virus replication. These factors are broadly grouped into those that are immune associated (MHC, chemokines, cytokines, CTLs and others), linked to viral entry (chemokine co-receptors and ligands), act as post-entry restriction elements (TRIM5a, APOBEC3) and those associated with viral replication (cytokines and others). These features have been identified through multiple experimental approaches ranging from candidate gene approaches, genome wide association studies (GWAS), expression analysis in conjunction with functional assays in humans to primate based models. Several studies have highlighted the individual and population level gross differences both in the viral clade sequences as well as host determined genetic associations. This review collates current information on studies involving major histocompatibility complex (MHC) as well as non MHC genes in the context of HIV-1 infection and AIDS involving varied ethnic groups. Special focus of the review is on the genetic studies carried out on the Indian population. Further challenges with regard to therapeutic interventions based on current knowledge have been discussed along with discussion on documented cases of stem cell therapy and very early highly active antiretroviral therapy (HAART) interventions.

  1. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    Science.gov (United States)

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)

    Energy Technology Data Exchange (ETDEWEB)

    Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Schroder, Maren [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Gleim, Dorothea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sims, David [Los Alamos National Laboratory (LANL); Meincke, Linda [Los Alamos National Laboratory (LANL); Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Tom [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Han, Cliff [Los Alamos National Laboratory (LANL)

    2009-01-01

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organ-ism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Draft genome sequence of the rubber tree Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Rahman Ahmad Yamin Abdul

    2013-02-01

    Full Text Available Abstract Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR. NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.

  4. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type stain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. The complete plastid genome sequence of Bomarea edulis (Alstroemeriaceae: Liliales).

    Science.gov (United States)

    Kim, Jung Sung; Kim, Hyoung Tae; Yoon, Chang Young; Kim, Joo-Hwan

    2016-05-01

    Bomarea, a member of the family Alstroemeriaceae, is distributed from Chile to Mexico and includes approximately 120 species. Recent molecular phylogenetic studies have clarified the monophyly of the family within the order Liliales and the sister relationship with the family Colchicaceae. At this time, five plastid genomes of Liliales have been analyzed at the familial level. To examine plastid genome variation at the generic level, we sequenced the plastid genome of Bomarea edulis, which is the most widely distributed species in the genus, and compared it with Alstroemeria aurea. The plastid genome sequence of B. edulis was 154,925 bp in length with a similar structure as A. aurea, excluding the IR-LSC junction. Ycf68 and infA were pseudogenes caused by frameshift mutations, and the ycf15 gene was deleted, similar to A. aurea.

  6. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33).

    Science.gov (United States)

    Munk, A Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2011-07-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The species is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  7. Complete genome sequence of Halorhabdus utahensis type strain (AX-2).

    Science.gov (United States)

    Anderson, Iain; Tindall, Brian J; Pomrenke, Helga; Göker, Markus; Lapidus, Alla; Nolan, Matt; Copeland, Alex; Glavina Del Rio, Tijana; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chertkov, Olga; Bruce, David; Brettin, Thomas; Detter, John C; Han, Cliff; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Pitluck, Sam; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia; Ovchinnikova, Galina; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-11-22

    Halorhabdus utahensis Wainø et al. 2000 is the type species of the genus, which is of phylogenetic interest because of its location on one of the deepest branches within the very extensive euryarchaeal family Halobacteriaceae. H. utahensis is a free-living, motile, rod shaped to pleomorphic, Gram-negative archaeon, which was originally isolated from a sediment sample collected from the southern arm of Great Salt Lake, Utah, USA. When grown on appropriate media, H. utahensis can form polyhydroxybutyrate (PHB). Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the a member of halobacterial genus Halorhabdus, and the 3,116,795 bp long single replicon genome with its 3027 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute

    2009-01-01

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the ge-nus, which until recently was the only genus within the actinobacterial family Acidimicrobia-ceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome se-quence, and annotation. This is the first complete genome sequence of the order Acidomi-crobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Using the NCBI Map Viewer to browse genomic sequence data.

    Science.gov (United States)

    Wolfsberg, Tyra G

    2011-04-01

    This unit includes a basic protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The alternate protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family.

  10. Complete coding sequences of the rabbitpox virus genome.

    Science.gov (United States)

    Li, G; Chen, N; Roper, R L; Feng, Z; Hunter, A; Danila, M; Lefkowitz, E J; Buller, R M L; Upton, C

    2005-11-01

    Rabbitpox virus (RPXV) is highly virulent for rabbits and it has long been suspected to be a close relative of vaccinia virus. To explore these questions, the complete coding region of the rabbitpox virus genome was sequenced to permit comparison with sequenced strains of vaccinia virus and other orthopoxviruses. The genome of RPXV strain Utrecht (RPXV-UTR) is 197 731 nucleotides long, excluding the terminal hairpin structures at each end of the genome. The RPXV-UTR genome has 66.5 % A + T content, 184 putative functional genes and 12 fragmented ORF regions that are intact in other orthopoxviruses. The sequence of the RPXV-UTR genome reveals that two RPXV-UTR genes have orthologues in variola virus (VARV; the causative agent of smallpox), but not in vaccinia virus (VACV) strains. These genes are a zinc RING finger protein gene (RPXV-UTR-008) and an ankyrin repeat family protein gene (RPXV-UTR-180). A third gene, encoding a chemokine-binding protein (RPXV-UTR-001/184), is complete in VARV but functional only in some VACV strains. Examination of the evolutionary relationship between RPXV and other orthopoxviruses was carried out using the central 143 kb DNA sequence conserved among all completely sequenced orthopoxviruses and also the protein sequences of 49 gene products present in all completely sequenced chordopoxviruses. The results of these analyses both confirm that RPXV-UTR is most closely related to VACV and suggest that RPXV has not evolved directly from any of the sequenced VACV strains, since RPXV contains a 719 bp region not previously identified in any VACV.

  11. Plasmodium knowlesi genome sequences from clinical isolates reveal extensive genomic dimorphism.

    Directory of Open Access Journals (Sweden)

    Miguel M Pinheiro

    Full Text Available Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population that can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses, of even small numbers of difficult clinical malaria isolates, can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and

  12. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    Science.gov (United States)

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804

  13. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

    Science.gov (United States)

    O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D

    2016-01-04

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

  14. Genome sequence and genetic diversity of European ash trees

    DEFF Research Database (Denmark)

    Sollars, Elizabeth S A; Harper, Andrea L; Kelly, Laura J;

    2016-01-01

    Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low......-heterozygosity Fraxinus excelsior tree from Gloucestershire, UK, annotating 38,852 protein-coding genes of which 25% appear ash specific when compared with the genomes of ten other plant species. Analyses of paralogous genes suggest a whole-genome duplication shared with olive (Olea europaea, Oleaceae). We also re......-sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British...

  15. Understanding Cancer Genome and Its Evolution by Next Generation Sequencing

    DEFF Research Database (Denmark)

    Hou, Yong

    Cancer will cause 13 million deaths by the year of 2030, ranking the second leading cause of death worldwide. Previous studies indicate that most of the cancers originate from cells that acquired somatic mutations and evolved as Darwin Theory. Ten biological insights of cancer have been summarized...... recently. Cutting-age technologies like next generation sequencing (NGS) enable exploring cancer genome and evolution much more efficiently. However, integrated cancer genome sequencing studies showed great inter-/intra-tumoral heterogeneity (ITH) and complex evolution patterns beyond the cancer biological...... evolution by NGS, we first developed high throughput single cell sequencing (SCS) pipeline on whole exome and trascriptome and updated the pipeline after systematically reviewed the existed single cell whole genome amplification (WGA) and whole transcriptome amplification methods. Using SCS pipeline we...

  16. The complete mitochondrial genome sequence of the Daweishan Mini chicken.

    Science.gov (United States)

    Yan, Ming-Li; Ding, Su-Ping; Ye, Shao-Hui; Wang, Chun-Guang; He, Bao-Li; Yuan, Zhi-Dong; Liu, Li-Li

    2016-01-01

    Daweishan Mini chicken is a valuable chicken breed in China. In this study, the complete mitochondrial genome sequence of Daweishan Mini chicken using PCR amplification, sequencing and assembling has been obtained for the first time. The total length of the mitochondrial genome was 16,785 bp, with the base composition of 30.26% A, 23.73% T, 32.51% C, 13.51% G. It contained 37 genes (2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes) and a major non-coding control region (D-loop region). The protein start codons are ATG, except for COX1 that begins with GTG. The complete mitochondrial genome sequence of Daweishan Mini chicken provides an important data set for further investigation on the phylogenetic relationships within Gallus gallus.

  17. Identification of probable genomic packaging signal sequence from SARS—CoV genome by bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    QINLei; XIONGBin; LUOCheng; GUOZong-Ming; HAOPei; SUJiong; NANPeng; FENGYing; SHIYi-Xiang; YUXiao-Jing; LUOXiao-Min; CHENKai-Xian; SHENXu; SHENJian-Hua; ZOUJian-Ping; ZHAOGuo-Ping; SHITie-Liu; HEWei-Zhong; ZHONGYang; JIANGHua-Liang; LIYi-Xue

    2003-01-01

    AIM:To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interfere (RANi) drugs treating SARS. methods: Based on the studies about the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS_CoV were analyzed by using bioinformatic tools. Multi-alignment for the genomic sequences was performed among SARS-CoV,MHV,BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted for the identification fo the possible genomic packaging signals. Meanwhile, the N and M proteins of all five viruses were analyzed to study the evolutionary relationship with genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV locates at the 3′ end of ORF1b near that of MHV and BCoV, where is the most variable region of this gene. The RNA secondary structure of SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained in studying the genomic packaging signals of PEDV and HCoV 229E. Further more, the genomic sequence multi-alignment indicated that the locations of packaging signals of SARS-CoV, PEDV, and HCoV overlaped each other. It seems that the mutation rate of packaging signal sequences is much higher than the N protein, while only subtle variations for the M protein. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding secondary RNA structure locating at the similar region of ORF1b. The positions where genomic packaging signals exist have suffered rounds of mutations, which may influence the primary structures of the N and M proteins consequently.

  18. Draft Genome Sequences of Actinobacillus pleuropneumoniae Serotypes 2 and 6

    DEFF Research Database (Denmark)

    Zhan, Bujie; Angen, Øystein; Hedegaard, Jakob

    2010-01-01

    Actinobacillus pleuropneumoniae is a bacterial pathogen that causes highly contagious respiratory infection in pigs and has a serious impact on the production economy and animal welfare. As clear differences in virulence between serotypes have been observed, the genetic basis should be investigated...... at the genomic level. Here, we present the draft genome sequences of the A. pleuropneumoniae serotypes 2 (strain 4226) and 6 (strain Femo)....

  19. Genome Sequences for Multiple Clavibacter Strains from Different Subspecies.

    Science.gov (United States)

    Li, Xiang Sean; Yuan, Xiaoli Kat

    2017-09-21

    The Gram-positive genus Clavibacter harbors economically important plant pathogens infecting a variety of agricultural crops, such as potato, tomato, corn, barley, etc. Here, we report five new genome sequences, those of strains CFIA-Cs3N, CFIA-CsR14, LMG 3663(T), LMG 7333(T), and ATCC 33566(T), from different subspecies of Clavibacter michiganensis All these genomic data will be used for reclassification and niche-adapted feature comparisons. © Crown copyright 2017.

  20. Complete genome sequence of Aggregatibacter (Haemophilus) aphrophilus NJ8700.

    Science.gov (United States)

    Di Bonaventura, Maria Pia; DeSalle, Rob; Pop, Mihai; Nagarajan, Niranjan; Figurski, David H; Fine, Daniel H; Kaplan, Jeffrey B; Planet, Paul J

    2009-07-01

    We report the finished and annotated genome sequence of Aggregatibacter aphrophilus strain NJ8700, a strain isolated from the oral flora of a healthy individual, and discuss characteristics that may affect its dual roles in human health and disease. This strain has a rough appearance, and its genome contains genes encoding a type VI secretion system and several factors that may participate in host colonization.

  1. Complete Genome Sequence of Aggregatibacter (Haemophilus) aphrophilus NJ8700▿

    Science.gov (United States)

    Di Bonaventura, Maria Pia; DeSalle, Rob; Pop, Mihai; Nagarajan, Niranjan; Figurski, David H.; Fine, Daniel H.; Kaplan, Jeffrey B.; Planet, Paul J.

    2009-01-01

    We report the finished and annotated genome sequence of Aggregatibacter aphrophilus strain NJ8700, a strain isolated from the oral flora of a healthy individual, and discuss characteristics that may affect its dual roles in human health and disease. This strain has a rough appearance, and its genome contains genes encoding a type VI secretion system and several factors that may participate in host colonization. PMID:19447908

  2. Genome Sequences of Equid Herpesviruses 2 and 5

    Science.gov (United States)

    Wilkie, Gavin S.; Kerr, Karen; Stewart, James P.; Studdert, Michael J.

    2015-01-01

    We resequenced the genome of equid herpesvirus 2 (EHV2) strain 86/67 and sequenced the genomes of EHV2 strain G9/92 and equid herpesvirus 5 (EHV5) strain 2-141/67. The most prominent genetic differences are the dissimilar locations of the interleukin-10 (IL-10)-like genes and the presence of an OX-2-like gene in EHV5 only. PMID:25767243

  3. Genome Sequence of the Urethral Isolate Pseudomonas aeruginosa RN21

    OpenAIRE

    Wibberg, Daniel; Tielen, Petra; Narten, Maike; Schobert, Max; Blom, Jochen; Schatschneider, Sarah; Meyer, Ann-Kathrin; Neubauer, Rüdiger; Albersmeier, Andreas; Albaum, Stefan; Jahn, Martina; Goesmann, Alexander; Vorhölter, Frank-Jörg; Pühler, Alfred; Jahn, Dieter

    2015-01-01

    Pseudomonas aeruginosa is known to cause complicated urinary tract infections (UTI). The improved 7.0-Mb draft genome sequence of P. aeruginosa RN21, isolated from a patient with an acute UTI, was determined. It carries three (pro)phage genomes, genes for two restriction/modification systems, and a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system.

  4. Complete Genome Sequence of Klebsiella pneumoniae YH43

    Science.gov (United States)

    Ogura, Yoshitoshi; Hayashi, Tetsuya; Mizunoe, Yoshimitsu

    2016-01-01

    We report here the complete genome sequence of Klebsiella pneumoniae strain YH43, isolated from sweet potato. The genome consists of a single circular chromosome of 5,520,319 bp in length. It carries 8 copies of rRNA operons, 86 tRNA genes, 5,154 protein-coding genes, and the nif gene cluster for nitrogen fixation. PMID:27081127

  5. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae

    NARCIS (Netherlands)

    Tettelin, H; Masignani, [No Value; Cieslewicz, MJ; Eisen, JA; Peterson, S; Paulsen, IT; Nelson, KE; Margarit, [No Value; Read, TD; Madoff, LC; Beanan, MJ; Brinkac, LM; Daugherty, SC; DeBoy, RT; Durkin, AS; Kolonay, JF; Madupu, R; Lewis, MR; Radune, D; Fedorova, NB; Scanlan, D; Khouri, H; Mulligan, S; Carty, HA; Cline, RT; Van Aken, SE; Gill, J; Scarselli, M; Mora, M; Iacobini, ET; Brettoni, C; Galli, G; Mariani, M; Vegni, F; Maione, D; Rinaudo, D; Rappuoli, R; Telford, JL; Kasper, DL; Grandi, G; Fraser, CM

    2002-01-01

    The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the oth

  6. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Full Text Available Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.

  7. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae

    NARCIS (Netherlands)

    Tettelin, H; Masignani, [No Value; Cieslewicz, MJ; Eisen, JA; Peterson, S; Paulsen, IT; Nelson, KE; Margarit, [No Value; Read, TD; Madoff, LC; Beanan, MJ; Brinkac, LM; Daugherty, SC; DeBoy, RT; Durkin, AS; Kolonay, JF; Madupu, R; Lewis, MR; Radune, D; Fedorova, NB; Scanlan, D; Khouri, H; Mulligan, S; Carty, HA; Cline, RT; Van Aken, SE; Gill, J; Scarselli, M; Mora, M; Iacobini, ET; Brettoni, C; Galli, G; Mariani, M; Vegni, F; Maione, D; Rinaudo, D; Rappuoli, R; Telford, JL; Kasper, DL; Grandi, G; Fraser, CM

    2002-01-01

    The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the

  8. BATCH-GE: Batch analysis of Next-Generation Sequencing data for genome editing assessment.

    Science.gov (United States)

    Boel, Annekatrien; Steyaert, Woutert; De Rocker, Nina; Menten, Björn; Callewaert, Bert; De Paepe, Anne; Coucke, Paul; Willaert, Andy

    2016-07-27

    Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in-vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome.

  9. Standardized metadata for human pathogen/vector genomic sequences.

    Directory of Open Access Journals (Sweden)

    Vivien G Dugan

    Full Text Available High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs, the Bioinformatics Resource Centers (BRCs for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID, part of the National Institutes of Health (NIH, informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI. The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will

  10. Sequence analysis reveals mosaic genome of Aichi virus

    Directory of Open Access Journals (Sweden)

    Han Xiaohong

    2011-08-01

    Full Text Available Abstract Aichi virus is a positive-sense and single-stranded RNA virus, which demonstrated to be related to diarrhea of Children. In the present study, phylogenetic and recombination analysis based on the Aichi virus complete genomes available in GenBank reveal a mosaic genome sequence [GenBank: FJ890523], of which the nt 261-852 region (the nt position was based on the aligned sequence file shows close relationship with AB010145/Japan with 97.9% sequence identity, while the other genomic regions show close relationship with AY747174/German with 90.1% sequence identity. Our results will provide valuable hints for future research on Aichi virus diversity. Aichi virus is a member of the Kobuvirus genus of the Picornaviridae family 12 and belongs to a positive-sense and single-stranded RNA virus. Its presence in fecal specimens of children suffering from diarrhea has been demonstrated in several Asian countries 3456, in Brazil and German 7, in France 8 and in Tunisia 9. Some reports showed the high level of seroprevalence in adults 710, suggesting the widespread exposure to Aichi virus during childhood. The genome of Aichi virus contains 8,280 nucleotides and a poly(A tail. The single large open reading frame (nt 713-8014 according to the strain AB010145 encodes a polyprotein of 2,432 amino acids that is cleaved into the typical picornavirus structural proteins VP0, VP3, VP1, and nonstructural proteins 2A, 2B, 2C, 3A, 3B, 3C and 3D 211. Based on the phylogenetic analysis of 519-bp sequences at the 3C-3D (3CD junction, Aichi viruses can be divided into two genotypes A and B with approximately 90% sequence homology 12. Although only six complete genomes of Aichi virus were deposited in GenBank at present, mosaic genomes can be found in strains from different countries.

  11. The genome sequence of the colonial chordate, Botryllus schlosseri

    Science.gov (United States)

    Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

    2013-01-01

    Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001 PMID:23840927

  12. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Full Text Available Abstract Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1 the metabolically versatile Haloarcula marismortui; (2 the non-pigmented Natrialba asiatica; (3 the psychrophile Halorubrum lacusprofundi and (4 the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI for their predicted proteins. Multiple insertion sequence (IS elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP and transcription factor IIB (TFB homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1 high GC content and (2 low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  13. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Directory of Open Access Journals (Sweden)

    Maximo Rivarola

    Full Text Available Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  14. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Science.gov (United States)

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  15. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    Science.gov (United States)

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  16. Sequencing and analysis of the giant panda genome

    Institute of Scientific and Technical Information of China (English)

    YANG HuanMing

    2010-01-01

    @@ The giant panda (Ailuropoda melanoleuca) is loved all over the world and is considered a symbol of China, as illustrated by its being one of the mascots for the Beijing 2008 Olympic Games.It is also one of the world's most endangered animals and a flagship species for conservation.Using next-generation sequencing technology (Illumina Genome Analyzer) and our in-house assembly software, we have generated the first map of the giant panda genome sequence.This map will provide an unparalleled amount of information to aid in understanding the genetic and biological nature of this unique species and will contribute significantly to disease control and conservation efforts for this endangered species.In March 2008, the giant panda genome sequencing and analysis project was started at the Beijing Genomics Institute (BGI) in Shenzhen with collaborators from the Kunming Institute of Zoology and the Chengdu Research Base of Giant Panda Breeding.On 21 Jan.2010, this collaboration resulted in the publication, as a cover story in the journal Nature, of the sequencing and analysis of the giant panda genome.

  17. The longest ultraconserved sequences and evolution of vertebrate mitochondrial genomes

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    We compared 753 genomes of bacteria, archaea, and mitochondria (more than 540 M data) and found four unique ultraconserved sequences in 352 vertebrate mitochondrial genomes which are the longest or second longest or third longest ultraconserved subsequences in the vertebrate mitochondrial genomes, their lengths are approximate to those of small RNA. Surprisingly, the classification and evolution relationship among some high-level categories of animals can be clearly reflected by their regularity of occurrence; moreover, these findings gave rise to some new ideas of evolution of mitochondria and living beings. For instance, the variations in mitochondrial genomes of animals may help clarify the evolution relationship between Aves and Reptile, and understand the fact that the origin of mitochondrion is at least not a simple copy of genomes of lower living things such as bacteria and archaea.

  18. The minimum information about a genome sequence (MIGS) specification.

    Science.gov (United States)

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; DePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; San Gil, Ingio; Wilson, Gareth; Wipat, Anil

    2008-05-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

  19. Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)

    Energy Technology Data Exchange (ETDEWEB)

    Daligault, Hajnalka E. [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Zeytun, Ahmet [Los Alamos National Laboratory (LANL); Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Verbarg, Susanne [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute

    2011-01-01

    Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Halisco- menobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically un- charted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. An empirical strategy for characterizing bacterial proteomes across species in the absence of genomic sequences.

    Directory of Open Access Journals (Sweden)

    Joshua E Turse

    Full Text Available Global protein identification through current proteomics methods typically depends on the availability of sequenced genomes. In spite of increasingly high throughput sequencing technologies, this information is not available for every microorganism and rarely available for entire microbial communities. Nevertheless, the protein-level homology that exists between related bacteria makes it possible to extract biological information from the proteome of an organism or microbial community by using the genomic sequences of a near neighbor organism. Here, we demonstrate a trans-organism search strategy for determining the extent to which near-neighbor genome sequences can be applied to identify proteins in unsequenced environmental isolates. In proof of concept testing, we found that within a CLUSTAL W distance of 0.089, near-neighbor genomes successfully identified a high percentage of proteins within an organism. Application of this strategy to characterize environmental bacterial isolates lacking sequenced genomes, but having 16S rDNA sequence similarity to Shewanella resulted in the identification of 300-500 proteins in each strain. The majority of identified pathways mapped to core processes, as well as to processes unique to the Shewanellae, in particular to the presence of c-type cytochromes. Examples of core functional categories include energy metabolism, protein and nucleotide synthesis and cofactor biosynthesis, allowing classification of bacteria by observation of conserved processes. Additionally, within these core functionalities, we observed proteins involved in the alternative lactate utilization pathway, recently described in Shewanella.

  1. An Empirical Strategy for Characterizing Bacterial Proteomes across Species in the Absence of Genomic Sequences

    Science.gov (United States)

    Turse, Joshua E.; Marshall, Matthew J.; Fredrickson, James K.; Lipton, Mary S.; Callister, Stephen J.

    2010-01-01

    Global protein identification through current proteomics methods typically depends on the availability of sequenced genomes. In spite of increasingly high throughput sequencing technologies, this information is not available for every microorganism and rarely available for entire microbial communities. Nevertheless, the protein-level homology that exists between related bacteria makes it possible to extract biological information from the proteome of an organism or microbial community by using the genomic sequences of a near neighbor organism. Here, we demonstrate a trans-organism search strategy for determining the extent to which near-neighbor genome sequences can be applied to identify proteins in unsequenced environmental isolates. In proof of concept testing, we found that within a CLUSTAL W distance of 0.089, near-neighbor genomes successfully identified a high percentage of proteins within an organism. Application of this strategy to characterize environmental bacterial isolates lacking sequenced genomes, but having 16S rDNA sequence similarity to Shewanella resulted in the identification of 300–500 proteins in each strain. The majority of identified pathways mapped to core processes, as well as to processes unique to the Shewanellae, in particular to the presence of c-type cytochromes. Examples of core functional categories include energy metabolism, protein and nucleotide synthesis and cofactor biosynthesis, allowing classification of bacteria by observation of conserved processes. Additionally, within these core functionalities, we observed proteins involved in the alternative lactate utilization pathway, recently described in Shewanella. PMID:21103051

  2. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  3. Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H. J.; van de Vossenberg, Bart; Wieringa, Jan J.; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E.; Bakker, Freek T.

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2–71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more

  4. The complete mitochondrial genome sequence of Emperor Penguins (Aptenodytes forsteri).

    Science.gov (United States)

    Xu, Qiwu; Xia, Yan; Dang, Xiao; Chen, Xiaoli

    2016-09-01

    The emperor penguin (Aptenodytes forsteri) is the largest living species of penguin. Herein, we first reported the complete mitochondrial genome of emperor penguin. The mitochondrial genome is a circular molecule of 17 301 bp in length, consisting of 13 protein-coding genes, 22 tRNA genes, two rRNA, and one control region. To verify the accuracy and the utility of new determined mitogenome sequences, we constructed the species phylogenetic tree of emperor penguin together with 10 other closely species. This is the second complete mitochondrial genome of penguin, and this is going to be an important data to study mitochondrial evolution of birds.

  5. Use of Alignment-Free Phylogenetics for Rapid Genome Sequence-Based Typing of Helicobacter pylori Virulence Markers and Antibiotic Susceptibility

    NARCIS (Netherlands)

    van Vliet, Arnoud H M; Kusters, Johannes G.

    2015-01-01

    Whole-genome sequencing is becoming a leading technology in the typing and epidemiology of microbial pathogens, but the increase in genomic information necessitates significant investment in bioinformatic resources and expertise, and currently used methodologies struggle with genetically heterogeneo

  6. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  7. An integrated semiconductor device enabling non-optical genome sequencing.

    Science.gov (United States)

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  8. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus).

    Science.gov (United States)

    Miller, Webb; Drautz, Daniela I; Janecka, Jan E; Lesk, Arthur M; Ratan, Aakrosh; Tomsho, Lynn P; Packard, Mike; Zhang, Yeting; McClellan, Lindsay R; Qi, Ji; Zhao, Fangqing; Gilbert, M Thomas P; Dalén, Love; Arsuaga, Juan Luis; Ericson, Per G P; Huson, Daniel H; Helgen, Kristofer M; Murphy, William J; Götherström, Anders; Schuster, Stephan C

    2009-02-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes.

  9. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)

    DEFF Research Database (Denmark)

    Miller, Webb; Drautz, Daniela I; Janecka, Jan E

    2009-01-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the ......We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support...... for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating...... from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting...

  10. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee

    DEFF Research Database (Denmark)

    Ellington, M J; Ekelund, O; Aarestrup, Frank Møller

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility test...

  11. Next-Generation Sequencing and Genome Editing in Plant Virology

    OpenAIRE

    Hadidi, Ahmed; Flores, Ricardo; Candresse, Thierry; Barba, Marina

    2016-01-01

    Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA, or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21–24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant vir...

  12. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  13. Complete genome sequence of the European sheatfish virus.

    Science.gov (United States)

    Mavian, Carla; López-Bueno, Alberto; Fernández Somalo, María Pilar; Alcamí, Antonio; Alejo, Alí

    2012-06-01

    Viral diseases are an increasing threat to the thriving aquaculture industry worldwide. An emerging group of fish pathogens is formed by several ranaviruses, which have been isolated at different locations from freshwater and seawater fish species since 1985. We report the complete genome sequence of European sheatfish ranavirus (ESV), the first ranavirus isolated in Europe, which causes high mortality rates in infected sheatfish (Silurus glanis) and in other species. Analysis of the genome sequence shows that ESV belongs to the amphibian-like ranaviruses and is closely related to the epizootic hematopoietic necrosis virus (EHNV), a disease agent geographically confined to the Australian continent and notifiable to the World Organization for Animal Health.

  14. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...... of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based public sequenced genomes, CMGfunc. Functionally related groups...

  15. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida.

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    Full Text Available In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp in size; it contains a large single-copy (LSC, 76,598 bp and a small single-copy region (SSC, 42,977 bp, separated by two inverted repeats (IRa and IRb: 5,404 bp. The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome.

  16. Solving the molecular diagnostic testing conundrum for Mendelian disorders in the era of next-generation sequencing: single-gene, gene panel, or exome/genome sequencing.

    Science.gov (United States)

    Xue, Yuan; Ankala, Arunkanth; Wilcox, William R; Hegde, Madhuri R

    2015-06-01

    Next-generation sequencing is changing the paradigm of clinical genetic testing. Today there are numerous molecular tests available, including single-gene tests, gene panels, and exome sequencing or genome sequencing. As a result, ordering physicians face the conundrum of selecting the best diagnostic tool for their patients with genetic conditions. Single-gene testing is often most appropriate for conditions with distinctive clinical features and minimal locus heterogeneity. Next-generation sequencing-based gene panel testing, which can be complemented with array comparative genomic hybridization and other ancillary methods, provides a comprehensive and feasible approach for heterogeneous disorders. Exome sequencing and genome sequencing have the advantage of being unbiased regarding what set of genes is analyzed, enabling parallel interrogation of most of the genes in the human genome. However, current limitations of next-generation sequencing technology and our variant interpretation capabilities caution us against offering exome sequencing or genome sequencing as either stand-alone or first-choice diagnostic approaches. A growing interest in personalized medicine calls for the application of genome sequencing in clinical diagnostics, but major challenges must be addressed before its full potential can be realized. Here, we propose a testing algorithm to help clinicians opt for the most appropriate molecular diagnostic tool for each scenario.

  17. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States)]|[Lawrence Berkeley Lab., CA (United States)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  18. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  19. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Full Text Available Abstract Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10(-9 change/site/year was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10(-9 change/site/year was approximately half of the overall rate (1.9–2.0 × 10(-9 change/site/year. Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  20. Current research status of immunology in the genomic era

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    This review updates the current status of immunology research under the influence of genomics,both conceptually and technologically.It particularly highlights the advantages of employing the high-throughput and large-scale technology,the large genomic database,and bioinformatic power in the immunology research.The fast development in the fields of basic immunology,clinical immunology(tumor and infectious immunology) and vaccine designing is illustrated with respect to the successful usage of genomic strategy.We also speculate the future research directions of immunology in the era of genomics and post-genomics.

  1. Current research status of immunology in the genomic era

    Institute of Scientific and Technical Information of China (English)

    LI HaoWen; LI dinZhi; ZHAO GuoPing; WANG Ying

    2009-01-01

    This review updates the current status of immunology research under the influence of genomics, both conceptually and technologically. It particularly highlights the advantages of employing the high-throughput and large-scale technology, the large genomic database, and bioinformatic power in the immunology research. The fast development in the fields of basic immunology, clinical immunology (tumor and infectious immunology) and vaccine designing is illustrated with respect to the successful usage of genomic strategy. We also speculate the future research directions of immunology in the era of genomics and post-genomics.

  2. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    Science.gov (United States)

    Jiang, Ke; Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Chang, Y.J.; Larimer, Frank; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; DeBruyn, Jennifer M.; Wang, Ying; Layton, Alice C.; Allen, Michael S.; Sayler, Gary S.

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to genus Thauera, of the family Rhodocyclaceae and the class the Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774. PMID:23407619

  3. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    Energy Technology Data Exchange (ETDEWEB)

    Sanseverino, John [ORNL; Chauhan, Archana [University of Tennessee, Knoxville (UTK); Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Dalin, Eileen [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Sims, David [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chang, Yun-Juan [ORNL; Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Moser, Scott [University of Tennessee, Knoxville (UTK); Jegier, Patricia [University of Tennessee, Knoxville (UTK); Close, Dan [University of Tennessee, Knoxville (UTK); Wang, Ying [University of Tennessee, Knoxville (UTK); Layton, Alice [University of Tennessee, Knoxville (UTK); Allen, Michael S. [University of Tennessee, Knoxville (UTK); Sayler, Gary [University of Tennessee, Knoxville (UTK)

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to genus Thauera, of the family Rhodocyclaceae and the class the Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a criti-cal greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Se-quencing Program CSP{_}776774.

  4. Building a model: developing genomic resources for common milkweed (Asclepias syriaca with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Full Text Available Abstract Background Milkweeds (Asclepias L. have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L. could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp and 5S rDNA (120 bp sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp, with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae unigenes (median coverage of 0.29× and 66% of single copy orthologs (COSII in asterids (median coverage of 0.14×. From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites and phylogenetics (low-copy nuclear genes studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species

  5. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first

  6. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    textabstractThe largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis thali

  7. Complete Genome Sequence of Streptococcus agalactiae CNCTC 10/84, a Hypervirulent Sequence Type 26 Strain

    OpenAIRE

    Hooven, Thomas A.; Randis, Tara M.; Daugherty, Sean C.; Narechania, Apurva; Planet, Paul J.; Tettelin, Hervé; Ratner, Adam J.

    2014-01-01

    Streptococcus agalactiae (group B Streptococcus [GBS]) is a human pathogen with a propensity to cause neonatal infections. We report the complete genome sequence of GBS strain CNCTC 10/84, a hypervirulent clinical isolate frequently used to study GBS pathogenesis. Comparative analysis of this sequence may shed light on novel pathogenic mechanisms.

  8. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    Science.gov (United States)

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher  .

  9. Current challenges in genome annotation through structural biology and bioinformatics.

    Science.gov (United States)

    Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M

    2012-10-01

    With the huge volume in genomic sequences being generated from high-throughout sequencing projects the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernable from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.

  10. Draft Genome Sequence of Corynebacterium diphtheriae Biovar Intermedius NCTC 5011

    OpenAIRE

    Sangal, Vartul; Nicholas P Tucker; Burkovski, Andreas; Hoskisson, Paul A.

    2012-01-01

    We report an annotated draft genome of the human pathogen Corynebacterium diphtheriae bv. intermedius NCTC 5011. This strain is the first C. diphtheriae bv. intermedius strain to be sequenced, and our results provide a useful comparison to the other primary disease-causing biovars, C. diphtheriae bv. gravis and C. diphtheriae bv. mitis. The sequence has been deposited at DDBJ/EMBL/GenBank with the accession number AJVH01000000.

  11. Draft genome sequence of Corynebacterium diphtheriae biovar intermedius NCTC 5011.

    Science.gov (United States)

    Sangal, Vartul; Tucker, Nicholas P; Burkovski, Andreas; Hoskisson, Paul A

    2012-09-01

    We report an annotated draft genome of the human pathogen Corynebacterium diphtheriae bv. intermedius NCTC 5011. This strain is the first C. diphtheriae bv. intermedius strain to be sequenced, and our results provide a useful comparison to the other primary disease-causing biovars, C. diphtheriae bv. gravis and C. diphtheriae bv. mitis. The sequence has been deposited at DDBJ/EMBL/GenBank with the accession number AJVH01000000.

  12. Complete genome sequence of Saccharothrix espanaensis DSM 44229T and comparison to the other completely sequenced Pseudonocardiaceae

    Directory of Open Access Journals (Sweden)

    Strobel Tina

    2012-09-01

    Full Text Available Abstract Background The genus Saccharothrix is a representative of the family Pseudonocardiaceae, known to include producer strains of a wide variety of potent antibiotics. Saccharothrix espanaensis produces both saccharomicins A and B of the promising new class of heptadecaglycoside antibiotics, active against both bacteria and yeast. Results To better assess its capabilities, the complete genome sequence of S. espanaensis was established. With a size of 9,360,653 bp, coding for 8,501 genes, it stands alongside other Pseudonocardiaceae with large genomes. Besides a predicted core genome of 810 genes shared in the family, S. espanaensis has a large number of accessory genes: 2,967 singletons when compared to the family, of which 1,292 have no clear orthologs in the RefSeq database. The genome analysis revealed the presence of 26 biosynthetic gene clusters potentially encoding secondary metabolites. Among them, the cluster coding for the saccharomicins could be identified. Conclusion S. espanaensis is the first completely sequenced species of the genus Saccharothrix. The genome discloses the cluster responsible for the biosynthesis of the saccharomicins, the largest oligosaccharide antibiotic currently identified. Moreover, the genome revealed 25 additional putative secondary metabolite gene clusters further suggesting the strain’s potential for natural product synthesis.

  13. Draft genome sequence of Acinetobacter pittii ST643 shared by cystic fibrosis patients

    Science.gov (United States)

    Rocha, Géssica A; Ferreira, Alex G; Lima, Danielle F; Leão, Robson S; Carvalho-Assef, Ana Paula D; Folescu, Tânia W; Albano, Rodolpho M; Marques, Elizabeth A

    2016-01-01

    Acinetobacter pittii has emerged as an important hospital pathogen that is associated with outbreaks and drug resistance. In cystic fibrosis (CF) patients, the detection of Acinetobacter spp. is rare; however, we isolated the A. pittii sequence type ST643 in several Brazilian CF patients treated in the same centre. The current study describes the draft genome of A. pittii ST643. PMID:27653362

  14. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88

    DEFF Research Database (Denmark)

    Pel, Herman J.; de Winde, Johannes H.; Archer, David B.

    2007-01-01

    The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level...

  15. Draft Genome Sequence of an Enterobacter Species Associated with Illnesses and Powdered Infant Formula

    Science.gov (United States)

    Jackson, Emily E.; Ogrodzki, Pauline; Pascoe, Ben; Sheppard, Samuel K.

    2016-01-01

    This is the first report of the draft genome sequence of an Enterobacter species that may have been transmitted from powdered infant formula (PIF) to infants, resulting in illness. Enterobacter spp. are currently permitted in PIF, but the transmission of this strain indicates that the microbiological criteria for PIF may need revision. PMID:26769921

  16. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

    NARCIS (Netherlands)

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe

    2014-01-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still

  17. Brucella microti: the genome sequence of an emerging pathogen

    Directory of Open Access Journals (Sweden)

    Scholz Holger C

    2009-08-01

    Full Text Available Abstract Background Using a combination of pyrosequencing and conventional Sanger sequencing, the complete genome sequence of the recently described novel Brucella species, Brucella microti, was determined. B. microti is a member of the genus Brucella within the Alphaproteobacteria, which consists of medically important highly pathogenic facultative intracellular bacteria. In contrast to all other Brucella species, B. microti is a fast growing and biochemically very active microorganism with a phenotype more similar to that of Ochrobactrum, a facultative human pathogen. The atypical phenotype of B. microti prompted us to look for genomic differences compared to other Brucella species and to look for similarities with Ochrobactrum. Results The genome is composed of two circular chromosomes of 2,117,050 and 1,220,319 base pairs. Unexpectedly, we found that the genome sequence of B. microti is almost identical to that of Brucella suis 1330 with an overall sequence identity of 99.84% in aligned regions. The most significant structural difference between the two genomes is a bacteriophage-related 11,742 base pairs insert only present in B. microti. However, this insert is unlikely to have any phenotypical consequence. Only four protein coding genes are shared between B. microti and Ochrobactrum anthropi but impaired in other sequenced Brucella. The most noticeable difference between B. microti and other Brucella species was found in the sequence of the 23S ribosomal RNA gene. This unusual variation could have pleiotropic effects and explain the fast growth of B. microti. Conclusion Contrary to expectations from the phenotypic analysis, the genome sequence of B. microti is highly similar to that of known Brucella species, and is remotely related to the one of O. anthropi. How the few differences in gene content between B. microti and B. suis 1330 could result in vastly different phenotypes remains to be elucidated. This unexpected finding will

  18. Mitochondrial genome sequences and comparative genomics ofPhytophthora ramorum and P. sojae

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.; Boore,Jeffrey L.

    2007-01-01

    The complete sequences of the mitochondrial genomes of theoomycetes of Phytophthora ramorum and P. sojae were determined during thecourse of their complete nuclear genome sequencing (Tyler, et al. 2006).Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bpfor P. sojae. Each contains a total of 37 identifiable protein-encodinggenes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively)specifying19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12for P. sojae) which are potentially additional functional genes.Non-coding regions comprise approximately 11.5 percent and 18.4 percentof the genomes of P. ramorum and P. sojae, respectively. Relative to P.sojae, there is an inverted repeat of 1,150 bp in P. ramorum thatincludes an unassigned unique ORF, a tRNA gene, and adjacent non-codingsequences, but otherwise the gene order in both species is identical.Comparisons of these genomes with published sequences of the P. infestansmitochondrial genome reveals a number of similarities, but the gene orderin P. infestans differs in two adjacent locations due to inversions.Sequence alignments of the three genomes indicated sequence conservationranging from 75 to 85 percent and that specific regions were morevariable than others.

  19. A Pan-HIV Strategy for Complete Genome Sequencing.

    Science.gov (United States)

    Berg, Michael G; Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W; Plantier, Jean-Christophe; Brennan, Catherine A

    2016-04-01

    Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e.,switchingmechanismat 5' end ofRNAtranscript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance.

  20. A Pan-HIV Strategy for Complete Genome Sequencing

    Science.gov (United States)

    Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.

    2015-01-01

    Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702

  1. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing.

    Science.gov (United States)

    Vrancken, Bram; Trovão, Nídia Sequeira; Baele, Guy; van Wijngaerden, Eric; Vandamme, Anne-Mieke; van Laethem, Kristel; Lemey, Philippe

    2016-01-07

    Genetic analyses play a central role in infectious disease research. Massively parallelized "mechanical cloning" and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure--from nucleic acid extraction to sequencing--should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  2. Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Derrida, Bernard; Fink, Thomas M.

    2002-02-01

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.

  3. Pseudo-De Novo Assembly and Analysis of Unmapped Genome Sequence Reads in Wild Zebrafish Reveal Novel Gene Content.

    Science.gov (United States)

    Faber-Hammond, Joshua J; Brown, Kim H

    2016-04-01

    Zebrafish represents the third vertebrate with an officially completed genome, yet it remains incomplete with additions and corrections continuing with the current release, GRCz10, having 13% of zebrafish cDNA sequences unmapped. This disparity may result from population differences, given that the genome reference was generated from clonal individuals with limited genetic diversity. This is supported by the recent analysis of a single wild zebrafish, which identified over 5.2 million SNPs and 1.6 million in/dels in the previous genome build, zv9. Re-examination of this sequence data set indicated that 13.8% of quality sequence reads failed to align to GRCz10. Using a novel bioinformatics de novo assembly pipeline on these unmappable reads, we identified 1,514,491 novel contigs covering ∼224 Mb of genomic sequence. Among these, 1083 contigs were found to contain a potential gene coding sequence. RNA-seq data comparison confirmed that 362 contigs contained a transcribed DNA sequence, suggesting that a large amount of functional genomic sequence remains unannotated in the zebrafish reference genome. By utilizing the bioinformatics pipeline developed in this study, the zebrafish genome will be bolstered as a model for human disease research. Adaptation of the pipeline described here also offers a cost-efficient and effective method to identify and map novel genetic content across any genome and will ultimately aid in the completion of additional genomes for a broad range of species.

  4. Complete genome sequence of Stackebrandtia nassauensis type strain (LLR-40K-21T)

    Energy Technology Data Exchange (ETDEWEB)

    Munk, Christine [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Mayilraj, Shanmugam [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2009-01-01

    Stackebrandtia nassauensis Labeda and Kroppenstedt (2005) is the type species of the genus Stackebrandtia, and a member of the actinobacterial family Glycomycetaceae. Strackebrandtia currently contains two species, which are differentiated from Glycomyces spp. by cellular fatty acid and menaquinone composition. Strain LLR-40K-21T is Gram-positive, aerobic, and nonmotile, with a branched substrate mycelium and on some media an aerial mycelium. The strain was originally isolated from a soil sample collected from a road side in Nassau, Bahamas. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the actinobacterial suborder Glycomycineae. The 6,841,557 bp long single replicon genome with its 6487 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Complete genome sequence of Stackebrandtia nassauensis type strain (LLR-40K-21T)

    Energy Technology Data Exchange (ETDEWEB)

    Munk, Chris [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Lapidus, Alla [DOE Joint Genome Institute, Walnut Creek, CA (United States); Copeland, Alex [DOE Joint Genome Institute, Walnut Creek, CA (United States); Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig (Germany); Mayilraj, Shanmugam [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig (Germany); Inst. of Microbial Technology, Chandigarh (India); Glavina Del Rio, Tijana [DOE Joint Genome Institute, Walnut Creek, CA (United States); Nolan, Matt [DOE Joint Genome Institute, Walnut Creek, CA (United States); Chen, Feng [DOE Joint Genome Institute, Walnut Creek, CA (United States); Lucas, Susan [DOE Joint Genome Institute, Walnut Creek, CA (United States); Tice, Hope [DOE Joint Genome Institute, Walnut Creek, CA (United States); Cheng, Jan-Fang [DOE Joint Genome Institute, Walnut Creek, CA (United States); Han, Cliff [DOE Joint Genome Institute, Walnut Creek, CA (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Detter, John C. [DOE Joint Genome Institute, Walnut Creek, CA (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Bruce, David [DOE Joint Genome Institute, Walnut Creek, CA (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Goodwin, Lynne [DOE Joint Genome Institute, Walnut Creek, CA (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Chain, Patrick [DOE Joint Genome Institute, Walnut Creek, CA (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Pitluck, Sam [DOE Joint Genome Institute, Walnut Creek, CA (United States); Göker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig (Germany); Ovchinikova, Galina [DOE Joint Genome Institute, Walnut Creek, CA (United States); Pati, Amrita [DOE Joint Genome Institute, Walnut Creek, CA (United States); Ivanova, Natalia [DOE Joint Genome Institute, Walnut Creek, CA (United States); Mavromatis, Konstantinos [DOE Joint Genome Institute, Walnut Creek, CA (United States); Chen, Amy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Palaniappan, Krishna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Land, Miriam [DOE Joint Genome Institute, Walnut Creek, CA (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Hauser, Loren [DOE Joint Genome Institute, Walnut Creek, CA (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Chang, Yun-Juan [DOE Joint Genome Institute, Walnut Creek, CA (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Jeffries, Cynthia D. [DOE Joint Genome Institute, Walnut Creek, CA (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Bristow, Jim [DOE Joint Genome Institute, Walnut Creek, CA (United States); Eisen, Jonathan A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division; Univ. of California, Davis, CA (United States); Markowitz, Victor [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Hugenholtz, Philip [DOE Joint Genome Institute, Walnut Creek, CA (United States); Kyrpides, Nikos C. [DOE Joint Genome Institute, Walnut Creek, CA (United States); Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig (Germany)

    2009-12-30

    Stackebrandtia nassauensis Labeda and Kroppenstedt (2005) is the type species of the genus Stackebrandtia, and a member of the actinobacterial family Glycomycetaceae. Stackebrandtia currently contains two species, which are differentiated from Glycomyces spp. by cellular fatty acid and menaquinone composition. Strain LLR-40K-21T is Gram-positive, aerobic, and nonmotile, with a branched substrate mycelium and on some media an aerial mycelium. The strain was originally isolated from a soil sample collected from a road side in Nassau, Bahamas. We describe the features of this organism, together with the complete genome sequence and annotation. Lastly, this is the first complete genome sequence of the actinobacterial suborder Glycomycineae. The 6,841,557 bp long single replicon genome with its 6487 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. Implications of the plastid genome sequence of typha (typhaceae, poales) for understanding genome evolution in poaceae.

    Science.gov (United States)

    Guisinger, Mary M; Chumley, Timothy W; Kuehl, Jennifer V; Boore, Jeffrey L; Jansen, Robert K

    2010-02-01

    Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three genes losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.

  7. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    Science.gov (United States)

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla; Ivanova, Natalia; Copeland, Alex; Lucas, Susan; Del Rio, Tijana Glavina; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Sims, David; Meincke, Linda; Bruce, David; Goodwin, Lynne; Brettin, Thomas; Han, Cliff; Detter, John C.; Ovchinikova, Galina; Pati, Amrita; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Spröer, Cathrin; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Chain, Patrick

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the type and sole species of the genus Streptobacillus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum Fusobacteria. The 'Leptotrichiaceae' have not been well characterized, genomically or taxonomically. S. moniliformis,is a Gram-negative, non-motile, pleomorphic bacterium and is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order Fusobacteriales and no more than the third sequence from the phylum Fusobacteria. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304670

  8. Genome sequence and characterization of the Tsukamurella bacteriophage TPA2.

    Science.gov (United States)

    Petrovski, Steve; Seviour, Robert J; Tillett, Daniel

    2011-02-01

    The formation of stable foam in activated sludge plants is a global problem for which control is difficult. These foams are often stabilized by hydrophobic mycolic acid-synthesizing Actinobacteria, among which are Tsukamurella spp. This paper describes the isolation from activated sludge of the novel double-stranded DNA phage TPA2. This polyvalent Siphoviridae family phage is lytic for most Tsukamurella species. Whole-genome sequencing reveals that the TPA2 genome is circularly permuted (61,440 bp) and that 70% of its sequence is novel. We have identified 78 putative open reading frames, 95 pairs of inverted repeats, and 6 palindromes. The TPA2 genome has a modular gene structure that shares some similarity to those of Mycobacterium phages. A number of the genes display a mosaic architecture, suggesting that the TPA2 genome has evolved at least in part from genetic recombination events. The genome sequence reveals many novel genes that should inform any future discussion on Tsukamurella phage evolution.

  9. A fast and robust method for whole genome sequencing of the Aleutian Mink Disease Virus (AMDV) genome

    DEFF Research Database (Denmark)

    Hagberg, Emma Elisabeth; Krarup, Anders; Fahnøe, Ulrik;

    2016-01-01

    have focused on limited regions of the viral genome. This paper describes a robust, reliable, and fast protocol for amplification of the full AMDV genome using long-range PCR. The method was used to generate next generation sequencing data for the non-virulent cell-culture adapted AMDV-G strain as well......Aleutian Mink Disease Virus (AMDV) is a frequently encountered pathogen associated with commercial mink breeding. AMDV infection leads to increased mortality and compromised animal health and welfare. Currently little is known about the molecular evolution of the virus, and the few existing studies...

  10. Characterizing the citrus cultivar Carrizo genome through 454 shotgun sequencing.

    Science.gov (United States)

    Belknap, William R; Wang, Yi; Huo, Naxin; Wu, Jiajie; Rockhold, David R; Gu, Yong Q; Stover, Ed

    2011-12-01

    The citrus cultivar Carrizo is the single most important rootstock to the US citrus industry and has resistance or tolerance to a number of major citrus diseases, including citrus tristeza virus, foot rot, and Huanglongbing (HLB, citrus greening). A Carrizo genomic sequence database providing approximately 3.5×genome coverage (haploid genome size approximately 367 Mb) was populated through 454 GS FLX shotgun sequencing. Analysis of the repetitive DNA fraction indicated a total interspersed repeat fraction of 36.5%. Assembly and characterization of abundant citrus Ty3/gypsy elements revealed a novel type of element containing open reading frames encoding a viral RNA-silencing suppressor protein (RNA binding protein, rbp) and a plant cytokinin riboside 5′-monophosphate phosphoribohydrolase-related protein (LONELY GUY, log). Similar gypsy elements were identified in the Populus trichocarpa genome. Gene-coding region analysis indicated that 24.4% of the nonrepetitive reads contained genic regions. The depth of genome coverage was sufficient to allow accurate assembly of constituent genes, including a putative phloem-expressed gene. The development of the Carrizo database (http://citrus.pw.usda.gov/) will contribute to characterization of agronomically significant loci and provide a publicly available genomic resource to the citrus research community.

  11. Genome sequence of the stramenopile Blastocystis, a human anaerobic parasite

    Science.gov (United States)

    2011-01-01

    Background Blastocystis is a highly prevalent anaerobic eukaryotic parasite of humans and animals that is associated with various gastrointestinal and extraintestinal disorders. Epidemiological studies have identified different subtypes but no one subtype has been definitively correlated with disease. Results Here we report the 18.8 Mb genome sequence of a Blastocystis subtype 7 isolate, which is the smallest stramenopile genome sequenced to date. The genome is highly compact and contains intriguing rearrangements. Comparisons with other available stramenopile genomes (plant pathogenic oomycete and diatom genomes) revealed effector proteins potentially involved in the adaptation to the intestinal environment, which were likely acquired via horizontal gene transfer. Moreover, Blastocystis living in anaerobic conditions harbors mitochondria-like organelles. An incomplete oxidative phosphorylation chain, a partial Krebs cycle, amino acid and fatty acid metabolisms and an iron-sulfur cluster assembly are all predicted to occur in these organelles. Predicted secretory proteins possess putative activities that may alter host physiology, such as proteases, protease-inhibitors, immunophilins and glycosyltransferases. This parasite also possesses the enzymatic machinery to tolerate oxidative bursts resulting from its own metabolism or induced by the host immune system. Conclusions This study provides insights into the genome architecture of this unusual stramenopile. It also proposes candidate genes with which to study the physiopathology of this parasite and thus may lead to further investigations into Blastocystis-host interactions. PMID:21439036

  12. Genome sequence of the lager brewing yeast, an interspecies hybrid.

    Science.gov (United States)

    Nakao, Yoshihiro; Kanamori, Takeshi; Itoh, Takehiko; Kodama, Yukiko; Rainieri, Sandra; Nakamura, Norihisa; Shimonaga, Tomoko; Hattori, Masahira; Ashikari, Toshihiko

    2009-04-01

    This work presents the genome sequencing of the lager brewing yeast (Saccharomyces pastorianus) Weihenstephan 34/70, a strain widely used in lager beer brewing. The 25 Mb genome comprises two nuclear sub-genomes originating from Saccharomyces cerevisiae and Saccharomyces bayanus and one circular mitochondrial genome originating from S. bayanus. Thirty-six different types of chromosomes were found including eight chromosomes with translocations between the two sub-genomes, whose breakpoints are within the orthologous open reading frames. Several gene loci responsible for typical lager brewing yeast characteristics such as maltotriose uptake and sulfite production have been increased in number by chromosomal rearrangements. Despite an overall high degree of conservation of the synteny with S. cerevisiae and S. bayanus, the syntenies were not well conserved in the sub-telomeric regions that contain lager brewing yeast characteristic and specific genes. Deletion of larger chromosomal regions, a massive unilateral decrease of the ribosomal DNA cluster and bilateral truncations of over 60 genes reflect a post-hybridization evolution process. Truncations and deletions of less efficient maltose and maltotriose uptake genes may indicate the result of adaptation to brewing. The genome sequence of this interspecies hybrid yeast provides a new tool for better understanding of lager brewing yeast behavior in industrial beer production.

  13. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders.

    Science.gov (United States)

    Sun, Cheng; Mueller, Rachel Lockridge

    2014-07-01

    Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes.

  14. Complete Genome Sequence of Mycobacterium vaccae Type Strain ATCC 25954

    KAUST Repository

    Ho, Y. S.

    2012-10-26

    Mycobacterium vaccae is a rapidly growing, nontuberculous Mycobacterium species that is generally not considered a human pathogen and is of major pharmaceutical interest as an immunotherapeutic agent. We report here the annotated genome sequence of the M. vaccae type strain, ATCC 25954.

  15. Draft Genome Sequences of Two Marinitoga camini Isolates Producing Bacterioviruses

    Science.gov (United States)

    Mercier, Coraline; Lossouarn, Julien; Haverkamp, Thomas; Bienvenu, Nadège; Godfroy, Anne; Cueff-Gauchard, Valérie

    2016-01-01

    Here, we present the draft genome sequences of two thermophilic Marinitoga strain members of the Thermotogales order, Marinitoga camini DV1155 and Marinitoga camini DV1197. These strains were isolated from deep-sea hydrothermal vents of the Mid-Atlantic Ridge. PMID:27834711

  16. Complete Genome Sequence of Canine Papillomavirus Type 16

    OpenAIRE

    Luff, Jennifer; Mader, Michelle; Britton, Monica; Fass, Joseph; Rowland, Peter; Orr, Carolyn; Schlegel, Richard; Yuan, Hang

    2015-01-01

    Papillomaviruses are epitheliotropic, circular, double-stranded DNA viruses within the family Papillomaviridae that are associated with benign and malignant tumors in humans and animals. We report the complete genome sequence of canine papillomavirus type 16 identified within multiple pigmented cutaneous plaques and squamous cell carcinoma from an intact female Basenji dog.

  17. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis

    NARCIS (Netherlands)

    Tyler, B.M.; Tripathy, S.; Zhang, X.; Dehal, P.; Jiang, R.H.Y.; Aerts, A.; Arredondo, F.D.; Baxter, L.; Bensasson, D.; Beynon, J.L.; Chapman, J.; Damasceno, C.M.B.; Dorrance, A.E.; Dou, D.; Dickerman, A.W.; Dubchak, I.L.; Garbelotto, M.; Gijzen, M.; Gordon, S.G.; Govers, F.; Grunwald, N.J.; Huang, W.; Ivors, K.L.; Jones, R.W.; Kamoun, S.; Krampis, K.; Lamour, K.H.; Lee, M.K.; McDonald, W.H.; Medina, M.; Meijer, H.J.G.; Nordberg, E.K.; Maclean, D.J.; Ospina-Giraldo, M.D.; Morris, P.F.; Phuntumart, V.; Putnam, N.H.; Rash, S.; Rose, J.K.C.; Sakihama, Y.; Salamov, A.A.; Savidor, A.; Scheuring, C.F.; Smith, B.M.; Sobral, B.W.S.; Terry, A.; Torto-Alalibo, T.A.; Win, J.; Xu, Z.; Zhang, H.; Grigoriev, I.V.; Rokhsar, D.S.; Boore, J.L.

    2006-01-01

    Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oömycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora

  18. Complete Genome Sequence of Bacillus megaterium Siphophage Silence.

    Science.gov (United States)

    Solis, Jonathan A; Farmer, Nicholas G; Cahill, Jesse L; Rasche, Eric S; Kuty Everett, Gabriel F

    2015-10-08

    Silence is a newly isolated siphophage that infects Bacillus megaterium, a soil bacterium that is used readily in research and commercial applications. A study of B. megaterium phage Silence will enhance our knowledge of the diversity of Bacillus phages. Here, we describe the complete genome sequence and annotated features of Silence. Copyright © 2015 Solis et al.

  19. Genome Sequence of Herpes Simplex Virus 1 Strain SC16

    Science.gov (United States)

    Rastrojo, Alberto; López-Muñoz, Alberto Domingo

    2017-01-01

    ABSTRACT Herpes simplex virus 1 (HSV-1), also known as Human herpesvirus 1, is a highly prevalent human neurotropic pathogen that causes a variety of diseases, including lethal encephalitis. Here, we report the genome sequence of the HSV-1 strain SC16. PMID:28126930

  20. Genome sequence of the human pathogen Vibrio cholerae Amazonia.

    NARCIS (Netherlands)

    Thompson, C.C.; Marin, M.A.; Dias, G.M.; Dutilh, B.E.; Edwards, R.A.; Iida, T.; Thompson, F.L.; Vicente, A.C.

    2011-01-01

    Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. Based on multilocus sequence analysis, this lineage belongs to a distinct profile compared to strains from El Tor and classical biotypes. The genomic analysis rev

  1. Genome Sequences of Gordonia terrae Bacteriophages Phinally and Vivi2.

    Science.gov (United States)

    Pope, Welkin H; Anderson, Kaitlyn C; Arora, Charu; Bortz, Michael E; Burnet, George; Conover, David H; D'Incau, Gina M; Ghobrial, Jonathan A; Jonas, Audrey L; Migdal, Emily J; Rote, Nicole L; German, Brian A; McDonnell, Jill E; Mezghani, Nadia; Schafer, Claire E; Thompson, Paige K; Ulbrich, Megan C; Yu, Victor J; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-08-18

    Bacteriophages Phinally and Vivi2 were isolated from soil from Pittsburgh, Pennsylvania, USA, using host Gordonia terrae 3612. The Phinally and Vivi2 genomes are 59,265 bp and 59,337 bp, respectively, and share sequence similarity with each other and with GTE6. Fewer than 25% of the 87 to 89 putative genes have predictable functions.

  2. Genome Sequence of Bacillus thuringiensis subsp. kurstaki Strain HD-1.

    Science.gov (United States)

    Day, Michael; Ibrahim, Mohamed; Dyer, David; Bulla, Lee

    2014-07-17

    We report here the complete genome sequence of Bacillus thuringiensis subsp. kurstaki strain HD-1, which serves as the primary U.S. reference standard for all commercial insecticidal formulations of B. thuringiensis manufactured around the world. Copyright © 2014 Day et al.

  3. Draft Genome Sequence of Mycobacterium chimaera Type Strain Fl-0169

    Science.gov (United States)

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-016...

  4. Complete genome sequence of the myxobacterium Sorangium cellulosum

    DEFF Research Database (Denmark)

    Schneiker, S; Perlova, O; Kaiser, O

    2007-01-01

    The genus Sorangium synthesizes approximately half of the secondary metabolites isolated from myxobacteria, including the anti-cancer metabolite epothilone. We report the complete genome sequence of the model Sorangium strain S. cellulosum Soce56, which produces several natural products and has m...

  5. Complete Genome Sequence of Anaplasma marginale subsp. centrale

    Science.gov (United States)

    Anaplasma marginale subsp. centrale is a naturally attenuated subtype that has been used as a vaccine for a century. We sequenced the genome of this organism and compared it to those of virulent senso stricto A. marginale strains. The comparison markedly narrows the number of outer membrane protein ...

  6. Draft Genome Sequence of Paecilomyces hepiali, Isolated from Cordyceps sinensis.

    Science.gov (United States)

    Yu, Yi; Wang, Wenting; Wang, Linping; Pang, Fang; Guo, Lanping; Song, Lai; Liu, Guiming; Feng, Chengqiang

    2016-07-07

    Paecilomyces hepiali is an endoparasitic fungus that commonly exists in the natural Cordyceps sinensis Here, we report the draft genome sequence of P. hepiali, which will facilitate the exploitation of medicinal compounds produced by the fungus. Copyright © 2016 Yu et al.

  7. Genome sequence of Mycoplasma hyorhinis strain GDL-1.

    Science.gov (United States)

    Calcutt, Michael J; Foecking, Mark F; Rosales, Ruben S; Ellis, Richard J; Nicholas, Robin A J

    2012-04-01

    Mycoplasma hyorhinis impacts swine health and production in many countries, either as a primary pathogen or as a component of a polymicrobial infection. Isolates of this species are also common contaminants of tissue culture lines. The genome sequence of the cell culture isolate M. hyorhinis GDL-1 is presented herein.

  8. Genome sequence of the human pathogen Vibrio cholerae Amazonia.

    NARCIS (Netherlands)

    Thompson, C.C.; Marin, M.A.; Dias, G.M.; Dutilh, B.E.; Edwards, R.A.; Iida, T.; Thompson, F.L.; Vicente, A.C.

    2011-01-01

    Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. Based on multilocus sequence analysis, this lineage belongs to a distinct profile compared to strains from El Tor and classical biotypes. The genomic analysis

  9. Complete Genome Sequence of Beijerinckia indica subsp. indica▿

    Science.gov (United States)

    Tamas, Ivica; Dedysh, Svetlana N.; Liesack, Werner; Stott, Matthew B.; Alam, Maqsudul; Murrell, J. Colin; Dunfield, Peter F.

    2010-01-01

    Beijerinckia indica subsp. indica is an aerobic, acidophilic, exopolysaccharide-producing, N2-fixing soil bacterium. It is a generalist chemoorganotroph that is phylogenetically closely related to facultative and obligate methanotrophs of the genera Methylocella and Methylocapsa. Here we report the full genome sequence of this bacterium. PMID:20601475

  10. Complete Genome Sequence of Neisseria weaveri Strain NCTC13585

    Science.gov (United States)

    Fazal, Mohammed-Abbas; Burnett, Edward; Deheer-Graham, Ana; Oliver, Karen; Holroyd, Nancy; Russell, Julie E.

    2016-01-01

    Neisseria weaveri is a commensal organism of the canine oral cavity and an occasional opportunistic human pathogen which is associated with dog bite wounds. Here we report the first complete genomic sequence of the N. weaveri NCTC13585 (CCUG30381) strain, which was originally isolated from a patient with a canine bite wound. PMID:27563039

  11. Genome Sequences of Newly Isolated Mycobacteriophages Forming Cluster S.

    Science.gov (United States)

    Mills, Monique L; Bragg, Judd; Bruce, Asri; Dehn, Ari; Drouin, Jordan; Hefner, Morgan; Katon, Dylan; McHugh, Dustin; Zeba, Franck; Bowman, Charles A; Cresawn, Steven G; Jacobs-Sera, Deborah; Russell, Daniel A; Pope, Welkin H; Hatfull, Graham F; Dunbar, David A; Zegers, Gerard P; Page, Shallee T

    2016-09-29

    We describe the genomes of two mycobacteriophages, MosMoris and Gattaca, newly isolated on Mycobacterium smegmatis The two phages are very similar to each other, differing in 61 single nucleotide polymorphisms and six small insertion/deletions. Both have extensive nucleotide sequence similarity to mycobacteriophage Marvin and together form cluster S. Copyright © 2016 Mills et al.

  12. Genome Sequence of the Pathogenic Fungus Sporothrix schenckii (ATCC 58251).

    Science.gov (United States)

    Cuomo, Christina A; Rodriguez-Del Valle, Nuri; Perez-Sanchez, Lizaida; Abouelleil, Amr; Goldberg, Jonathan; Young, Sarah; Zeng, Qiandong; Birren, Bruce W

    2014-05-22

    Sporothrix schenckii is a pathogenic dimorphic fungus that grows as a yeast and as mycelia. This species is the causative agent of sporotrichosis, typically a skin infection. We report the genome sequence of S. schenckii, which will facilitate the study of this fungus and of the Sporothrix schenckii group.

  13. Complete Genome Sequence of Bacillus megaterium Myophage Mater

    OpenAIRE

    Lancaster, Jacob C.; Hodde, Mary K.; Hernandez, Adriana C.; Kuty Everett, Gabriel F.

    2015-01-01

    Bacillus megaterium is a ubiquitous, soil inhabiting Gram-positive bacterium that is a common model organism and is used in industrial applications for protein production. The following reports the complete sequencing and annotation of the genome of B. megaterium myophage Mater and describes the major features identified.

  14. Complete Genome Sequence of Bacillus megaterium Siphophage Silence

    OpenAIRE

    Solis, Jonathan A.; Farmer, Nicholas G.; Cahill, Jesse L.; Rasche, Eric S.; Kuty Everett, Gabriel F.

    2015-01-01

    Silence is a newly isolated siphophage that infects Bacillus megaterium, a soil bacterium that is used readily in research and commercial applications. A study of B. megaterium phage Silence will enhance our knowledge of the diversity of Bacillus phages. Here, we describe the complete genome sequence and annotated features of Silence.

  15. Genome sequence and genetic diversity of European ash trees

    DEFF Research Database (Denmark)

    Sollars, Elizabeth S A; Harper, Andrea L; Kelly, Laura J;

    2016-01-01

    Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low...... to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic....

  16. Templated sequence insertion polymorphisms in the human genome

    Science.gov (United States)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  17. Targeted enrichment of genomic DNA regions for next generation sequencing

    NARCIS (Netherlands)

    Mertens, F.; El-Sharawy, A.; Sauer, S.; Van Helvoort, J.; Van der Zaag, P.J.; Franke, A.; Nilsson, M.; Lehrach. H.; Brookes, A.

    2011-01-01

    In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a pow

  18. The complete mitochondrial genome sequence of Diaphorina citri (Hemiptera: Psyllidae)

    Science.gov (United States)

    The first complete mitochondrial genome (mitogenome) sequence of Asian citrus psyllid, Diaphorina citri (Hemiptera: Psyllidae), from Guangzhou, China is presented. The circular mitogenome is 14,996 bp in length with an A+T content of 74.5%, and contains 13 protein-coding genes (PCGs), 22 tRNA genes ...

  19. Draft Genome Sequence of Campylobacter jejuni 11168H

    Science.gov (United States)

    Macdonald, Sarah E.; Gundogdu, Ozan; Dorrell, Nick; Wren, Brendan W.; Blake, Damer

    2017-01-01

    ABSTRACT Campylobacter jejuni is the most prevalent cause of food-borne gastroenteritis in the developed world. The reference and original sequenced strain C. jejuni NCTC11168 has low levels of motility compared to clinical isolates. Here, we describe the draft genome of the laboratory derived hypermotile variant named 11168H. PMID:28153902

  20. Genome sequence of Kingella kingae septic arthritis isolate PYKK081.

    Science.gov (United States)

    Kaplan, Jeffrey B; Lo, Chienchi; Xie, Gary; Johnson, Shannon L; Chain, Patrick S G; Donnelly, Robert; Kachlany, Scott C; Balashova, Nataliya V

    2012-06-01

    Kingella kingae is a human oral bacterium that can cause infections of the skeletal system in children. The bacterium is also a cardiovascular pathogen causing infective endocarditis in children and adults. We report herein the draft genome sequence of septic arthritis K. kingae strain PYKK081.

  1. Genome Sequence of Kingella kingae Septic Arthritis Isolate PYKK081

    OpenAIRE

    Kaplan, Jeffrey B.; Lo, Chienchi; Xie, Gary; Johnson, Shannon L.; Chain, Patrick S.G.; Donnelly, Robert; Kachlany, Scott C.; Balashova, Nataliya V.

    2012-01-01

    Kingella kingae is a human oral bacterium that can cause infections of the skeletal system in children. The bacterium is also a cardiovascular pathogen causing infective endocarditis in children and adults. We report herein the draft genome sequence of septic arthritis K. kingae strain PYKK081.

  2. Complete Genome Sequence of the Cyanobacterium Anabaena sp. 33047

    Science.gov (United States)

    2016-01-01

    This study presents the complete nucleotide sequence of Anabaena sp. ATCC 33047 (Anabaena CA), a filamentous, nitrogen-fixing marine cyanobacterium, which under salt stress conditions accumulates sucrose internally. The elucidation of the genome will contribute to the understanding of cyanobacterial diversity. PMID:27516507

  3. Draft Genome Sequence of Buttiauxella agrestis, Isolated from Surface Water

    OpenAIRE

    Jothikumar, Narayanan; Kahler, Amy; Strockbine, Nancy; Gladney, Lori; Hill, Vincent R.

    2014-01-01

    MI agar is routinely used for quantifying Escherichia coli in drinking water. A suspect E. coli colony isolated from a water sample was identified as Buttiauxella agrestis. The whole genome sequence of B. agrestis was determined to understand the genetic basis for its phenotypic resemblance to E. coli on MI agar.

  4. Complete Chloroplast Genome Sequence of Dendrobium nobile from Northeastern India

    Science.gov (United States)

    Parameswaran, Sriram; Sundar, Durai

    2016-01-01

    The orchid species Dendrobium nobile belonging to the family Orchidaceae and genus Dendrobium (a vast genus that encompasses nearly 1,200 species) has an herbal medicinal history of about 2000 years in east and south Asian countries. Here, we report the complete chloroplast genome sequence of D. nobile from northeastern India for the first time.

  5. Genome Sequence of the Oral Probiotic Streptococcus salivarius JF.

    Science.gov (United States)

    Jia, Fang

    2016-09-22

    Streptococcus salivarius is a nonpathogenic Gram-positive bacterium and the predominant colonizer of the oral microbiota. It finds a wide application in the prevention of upper respiratory tract infections, also reducing the frequency of other main pathogens. Here, we present the complete genome sequence of the oral probiotic S. salivarius JF. Copyright © 2016 Jia.

  6. Genome Sequence of the Oral Probiotic Streptococcus salivarius JF

    OpenAIRE

    Jia, Fang

    2016-01-01

    Streptococcus salivarius is a nonpathogenic Gram-positive bacterium and the predominant colonizer of the oral microbiota. It finds a wide application in the prevention of upper respiratory tract infections, also reducing the frequency of other main pathogens. Here, we present the complete genome sequence of the oral probiotic S. salivarius JF.

  7. Complete genome sequence of the alfalfa latent virus

    Science.gov (United States)

    Alfalfa latent virus (ALV) is a member of the carlavirus group and occurs symptomlessly in alfalfa (Medicago sativa). In the US it is prevalent in Nebraska and Wisconsin. The virus is recognized as a strain of Pea streak virus (PeSV) So far, no complete genomic sequence of PSV or ALV is availab...

  8. Genome Sequences of Nine Gram-Negative Vaginal Bacterial Isolates

    Science.gov (United States)

    Deitzler, Grace E.; Ruiz, Maria J.; Lu, Wendy; Weimer, Cory; Park, SoEun; Robinson, Lloyd S.; Hallsworth-Pepin, Kymberlie; Wollam, Aye; Mitreva, Makedonka

    2016-01-01

    The vagina is home to a wide variety of bacteria that have great potential to impact human health. Here, we announce reference strains (now available through BEI Resources) and draft genome sequences for 9 Gram-negative vaginal isolates from the taxa Citrobacter, Klebsiella, Fusobacterium, Proteus, and Prevotella. PMID:27688330

  9. Genome Sequences of 11 Human Vaginal Actinobacteria Strains

    Science.gov (United States)

    Deitzler, Grace E.; Ruiz, Maria J.; Weimer, Cory; Park, SoEun; Robinson, Lloyd S.; Hallsworth-Pepin, Kymberlie; Wollam, Aye; Mitreva, Makedonka

    2016-01-01

    The composition of the vaginal microbiota is an important health determinant. Several members of the phylum Actinobacteria have been implicated in bacterial vaginosis, a condition associated with many negative health outcomes. Here, we present 11 strains of vaginal Actinobacteria (now available through BEI Resources) along with draft genome sequences. PMID:27688328

  10. Whole genome sequencing for childhood cancer in Denmark

    DEFF Research Database (Denmark)

    Gupta, Ramneek

    of host, tumour and gut microbiome’s genomes. In Europe, cancer accounts for approximately 25% of all deaths in children >1 year. Most cured patients are burdened by late effects, including risk of second cancer and debilitating toxicities. Recent advancements in genetic sequencing technology...

  11. Whole genome sequencing for childhood cancer in Denmark

    DEFF Research Database (Denmark)

    Gupta, Ramneek

    of host, tumour and gut microbiome’s genomes. In Europe, cancer accounts for approximately 25% of all deaths in children >1 year. Most cured patients are burdened by late effects, including risk of second cancer and debilitating toxicities. Recent advancements in genetic sequencing technology...

  12. First Complete Genome Sequence of Haemophilus influenzae Serotype a

    Science.gov (United States)

    Hayden, Kristy

    2017-01-01

    ABSTRACT Haemophilus influenzae is an important human pathogen that primarily infects small children. In recent years, H. influenzae serotype a has emerged as a significant cause of invasive disease among indigenous populations. Here, we present the first complete whole-genome sequence of H. influenzae serotype a. PMID:28104664

  13. Complete Genome Sequence of the Haloalkaliphilic, Hydrogen Producing Halanaerobium hydrogenoformans

    Energy Technology Data Exchange (ETDEWEB)

    Brown, Steven D [ORNL; Begemann, Matthew B [University of Wisconsin, Madison; Mormile, Dr. Melanie R. [Missouri University of Science and Technology; Wall, Judy D. [University of Missouri; Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Samual [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Elias, Dwayne A [ORNL

    2011-01-01

    Halanaerobium hydrogenoformans is an alkaliphilic bacterium capable of biohydrogen production at pH 11 and 7% (w/v) salt. We present the 2.6 Mb genome sequence to provide insights into its physiology and potential for bioenergy applications.

  14. Genome Sequence of Lactobacillus farciminis KCTC 3681▿

    Science.gov (United States)

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-01-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds. PMID:21257766

  15. Complete Genome Sequence of Bradyrhizobium diazoefficiens USDA 122, a Nitrogen-Fixing Soybean Symbiont

    Science.gov (United States)

    Sugawara, Masayuki; Tsukui, Takahiro; Kaneko, Takakazu; Ohtsubo, Yoshiyuki; Sato, Shusei; Nagata, Yuji; Tsuda, Masataka; Mitsui, Hisayuki

    2017-01-01

    ABSTRACT We report the complete genome sequence of Bradyrhizobium diazoefficiens USDA 122, a nitrogen-fixing soybean symbiont. The genome consists of a 9.1 Mb circular chromosome, and 8,551 coding sequences (CDSs) were predicted on the genome. The sequence will provide insight into the evolution of rhizobial genome, and the symbiotic compatibility with host plants. PMID:28254989

  16. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    Science.gov (United States)

    Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

    2017-01-01

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.

  17. Low coverage sequencing of two Asian elephant (Elephas maximus) genomes.

    Science.gov (United States)

    Dastjerdi, Akbar; Robert, Christelle; Watson, Mick

    2014-01-01

    There are three species of elephant that exist, the Asian elephant (Elephas maximus) and two species of African elephant (Loxodonta africana and Loxodonta cyclotis). The populations of all three species are dwindling, and are under threat due to factors, such as habitat destruction and ivory hunting. The species differ in many respects, including in their morphology and response to disease. The availability of elephant genome sequence data from all three elephant species will complement studies of behaviour, genetic diversity, evolution and disease resistance. We present low-coverage Illumina sequence data from two Asian elephants, representing approximately 5X and 2.5X coverage respectively. Both raw and aligned data are available, using the African elephant (L. africana) genome as a reference. The data presented here are an important addition to the available genetic and genomic information on Asian and African elephants.

  18. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    Energy Technology Data Exchange (ETDEWEB)

    Lamour, Kurt H [ORNL; McDonald, W Hayes [ORNL; Savidor, Alon [ORNL

    2006-01-01

    Genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, suggest a photosynthetic past and reveal recent massive expansion and diversification of potential pathogenicity gene families. Abstract: Draft genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, have been determined. O mycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms and the presence of many Phytophthora genes of probable phototroph origin support a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors and, in particular, a superfamily of 700 proteins with similarity to known o mycete avirulence genes.

  19. Characterization of the complete genome sequence of pike fry rhabdovirus.

    Science.gov (United States)

    Chen, Hong-Lian; Liu, Hong; Liu, Zong-Xiao; He, Jun-Qiang; Gao, Long-Ying; Shi, Xiu-Jie; Jiang, Yu-Lin

    2009-01-01

    The complete genome sequence of pike fry rhabdovirus (PFRV), consisting of 11,097 nucleotides, was determined. The genome contains five genes, encoding the nucleoprotein (N), phosphoprotein (P), matrix protein (M), glycoprotein (G), and RNA-dependent RNA polymerase (L) protein in the order 3'-N-P-M-G-L-5'. 3' leader- and 5' trailer-sequences in the PFRV genome show inverse complementarity. The PFRV proteins share the highest homology to the proteins of spring viremia of carp virus (SVCV), ranging from 55.3 to 91.4%. Phylogenetic analysis of the five proteins showed that PFRV clusters with SVCV and is closely related to the mammalian vesiculoviruses, 903/87, STRV and SCRV.

  20. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  1. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing.

    Science.gov (United States)

    Urich, Mark A; Nery, Joseph R; Lister, Ryan; Schmitz, Robert J; Ecker, Joseph R

    2015-03-01

    Current high-throughput DNA sequencing technologies enable acquisition of billions of data points through which myriad biological processes can be interrogated, including genetic variation, chromatin structure, gene expression patterns, small RNAs and protein-DNA interactions. Here we describe the MethylC-sequencing (MethylC-seq) library preparation method, a 2-d protocol that enables the genome-wide identification of cytosine DNA methylation states at single-base resolution. The technique involves fragmentation of genomic DNA followed by adapter ligation, bisulfite conversion and limited amplification using adapter-specific PCR primers in preparation for sequencing. To date, this protocol has been successfully applied to genomic DNA isolated from primary cell culture, sorted cells and fresh tissue from over a thousand plant and animal samples.

  2. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Full Text Available Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74% of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  3. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  4. Whole genome sequence and comparative genomic sequence analysis of Helicoverpa armigera nucleopolyhedrovirus (HearNPV-L1) isolated from India.

    Science.gov (United States)

    Raghavendra, Ashika T; Jalali, Sushil K; Ojha, Rakshit; Shivalingaswamy, Timalapur M; Bhatnagar, Raj

    2017-03-01

    The whole genome of Helicoverpa armigera nucleopolyhedrovirus (HearNPV) from India, HearNPV-L1, was sequenced and analyzed, with a view to look for genes and/or nucleotide sequences that might be involved in the differences and virulence among other HearNPVs sequenced from other countries like SP1A (Spain), NNg1 (Kenya) and G4 (China). The entire nucleotide sequence of the HearNPV-L1 genome was 136,740 bp in length having GC content of 39.19% and contained 113 ORFs that could encode polypeptides with more than 50 amino acids (GenBank accession number KT013224). Two ORFs, viz., ORF 18 (300 bp) and ORF 19 (401 bp) identified were unique in HearNPV-L1 genome. Most of the HearNPV-L1 ORFs showed high similarity to NNg1, SP1A and G4 genomes. HearNPV-L1 genome contains 5 h (hr1-hr5), these regions were found 84-100% similar to hr region of NNg1, SP1A and G4 genomes. A total of four bro genes were observed in HearNPV-L1 genome, of which bro-a gene was 12 and 351 bp bigger than SP1A and G4 bro-a, respectively, while bro-b was 15 bp bigger SP1A and NNg1 bro-b, whereas 593 bp shorter than G4 bro-b, while bro-c was 12 bp shorter than NNg1, however bro-c was absent in G4 genome. HearNPV-L1 bro-d was 100% homologous to bro-d of SP1A, NNg1 and G4 genomes, respectively. The comparative analysis of HearNPV-L1 genome indicated that there are several other putative genes and nucleotide sequences that may be responsible for insecticidal activity in HearNPV-L1 isolate, however, further functional analysis of the hypothetical (putative) genes may help identifying the genes that are crucial for the virulence and insecticidal activity.

  5. Simplified large-scale Sanger genome sequencing for influenza A/H3N2 virus.

    Directory of Open Access Journals (Sweden)

    Hong Kai Lee

    Full Text Available BACKGROUND: The advent of next-generation sequencing technologies and the resultant lower costs of sequencing have enabled production of massive amounts of data, including the generation of full genome sequences of pathogens. However, the small genome size of the influenza virus arguably justifies the use of the more conventional Sanger sequencing technology which is still currently more readily available in most diagnostic laboratories. RESULTS: We present a simplified Sanger-based genome sequencing method for sequencing the influenza A/H3N2 virus in a large-scale format. The entire genome sequencing was completed with 19 reverse transcription-polymerase chain reactions (RT-PCRs and 39 sequencing reactions. This method was tested on 15 native clinical samples and 15 culture isolates, respectively, collected between 2009 and 2011. The 15 native clinical samples registered quantification cycle values ranging from 21.0 to 30.56, which were equivalent to 2.4×10(3-1.4×10(6 viral copies/µL of RNA extract. All the PCR-amplified products were sequenced directly without PCR product purification. Notably, high quality sequencing data up to 700 bp were generated for all the samples tested. The completed sequence covered 408,810 nucleotides in total, with 13,627 nucleotides per genome, attaining 100% coding completeness. Of all the bases produced, an average of 89.49% were Phred quality value 40 (QV40 bases (representing an accuracy of circa one miscall for every 10,000 bases or higher, and an average of 93.46% were QV30 bases (one miscall every 1000 bases or higher. CONCLUSIONS: This sequencing protocol has been shown to be cost-effective and less labor-intensive in obtaining full influenza genomes. The constant high quality of sequences generated imparts confidence in extending the application of this non-purified amplicon sequencing approach to other gene sequencing assays, with appropriate use of suitably designed primers.

  6. Draft genome sequence of the Algerian bee Apis mellifera intermissa

    Directory of Open Access Journals (Sweden)

    Nizar Jamal Haddad

    2015-06-01

    Full Text Available Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important due to its high ability to adapt to great variations in climatic conditions and due to its preferable cleaning behavior. Here we report the draft genome sequence of this honey bee, its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genome of other Apis mellifera sub-species promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation.

  7. Genome sequence of the pea aphid Acyrthosiphon pisum

    DEFF Research Database (Denmark)

    Richards, S.; Gibbs, R. A.; Gerardo, N. M.;

    2010-01-01

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first...... published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we...... include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired...

  8. Draft genome sequence of the Algerian bee Apis mellifera intermissa.

    Science.gov (United States)

    Haddad, Nizar Jamal; Loucif-Ayad, Wahida; Adjlane, Noureddine; Saini, Deepti; Manchiganti, Rushiraj; Krishnamurthy, Venkatesh; AlShagoor, Banan; Batainh, Ahmed Mahmud; Mugasimangalam, Raja

    2015-06-01

    Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important due to its high ability to adapt to great variations in climatic conditions and due to its preferable cleaning behavior. Here we report the draft genome sequence of this honey bee, its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genome of other Apis mellifera sub-species promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation.

  9. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2007-01-01

    NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

  10. Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA

    Science.gov (United States)

    Prakash, Ashwin; Bechtel, Jason; Fedorov, Alexei

    2011-01-01

    Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al. 2009). Here we demonstrate a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition. PMID:21610667

  11. Sequencing Crop Genomes: A Gateway to Improve Tropical Agriculture.

    Science.gov (United States)

    Thottathil, Gincy Paily; Jayasekaran, Kandakumar; Othman, Ahmad Sofiman

    2016-02-01

    Agricultural development in the tropics lags behind development in the temperate latitudes due to the lack of advanced technology, and various biotic and abiotic factors. To cope with the increasing demand for food and other plant-based products, improved crop varieties have to be developed. To breed improved varieties, a better understanding of crop genetics is necessary. With the advent of next-generation DNA sequencing technologies, many important crop genomes have been sequenced. Primary importance has been given to food crops, including cereals, tuber crops, vegetables, and fruits. The DNA sequence information is extremely valuable for identifying key genes controlling important agronomic traits and for identifying genetic variability among the cultivars. However, massive DNA re-sequencing and gene expression studies have to be performed to substantially improve our understanding of crop genetics. Application of the knowledge obtained from the genomes, transcriptomes, expression studies, and epigenetic studies would enable the development of improved varieties and may lead to a second green revolution. The applications of next generation DNA sequencing technologies in crop improvement, its limitations, future prospects, and the features of important crop genome projects are reviewed herein.

  12. MetaSim: a sequencing simulator for genomics and metagenomics.

    Directory of Open Access Journals (Sweden)

    Daniel C Richter

    Full Text Available BACKGROUND: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. METHODOLOGY/PRINCIPAL FINDINGS: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree. CONCLUSIONS/SIGNIFICANCE: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.

  13. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Bram Vrancken

    2016-01-01

    Full Text Available Genetic analyses play a central role in infectious disease research. Massively parallelized “mechanical cloning” and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™ with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure—from nucleic acid extraction to sequencing—should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  14. Microsatellite discovery by deep sequencing of enriched genomic libraries.

    Science.gov (United States)

    Santana, Quentin; Coetzee, Martin; Steenkamp, Emma; Mlonyeni, Osmond; Hammond, Gifty; Wingfield, Michael; Wingfield, Brenda

    2009-03-01

    Robust molecular markers such as microsatellites are important tools used to understand the dynamics of natural populations, but their identification and development are typically time consuming and labor intensive. The recent emergence of so-called next-generation sequencing raised the question as to whether this new technology might be applied to microsatellite development. Following this view, we considered whether deep sequencing using the 454 Life Sciences/Roche GS-FLX genome sequencing system could lead to a rapid protocol to develop microsatellite primers as markers for genetic studies. For this purpose, genomic DNA was sourced from three unrelated organisms: a fungus (the pine pathogen Fusarium circinatum), an insect (the pine-damaging wasp Sirex noctilio), and the wasp's associated nematode parasite (Deladenus siricidicola). Two methods, FIASCO (fast isolation by AFLP of sequences containing repeats) and ISSR-PCR (inter-simple sequence repeat PCR), were used to generate microsatellite-enriched DNA for the 454 libraries. From the resulting 1.2-1.7 megabases of DNA sequence data, we were able to identify 873 microsatellites that have sufficient flanking sequence available for primer design and potential amplification. This approach to microsatellite discovery was substantially more rapid, effective, and economical than other methods, and this study has shown that pyrosequencing provides an outstanding new technology that can be applied to this purpose.

  15. Genome-wide microsatellite identification in the fungus Anisogramma anomala using Illumina sequencing and genome assembly.

    Directory of Open Access Journals (Sweden)

    Guohong Cai

    Full Text Available High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats. Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the "Seq-to-SSR" approach. However, constraints of this approach include: 1 many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2 difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel "Seq-Assembly-SSR" approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps, with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%. After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6% were successfully amplified and 214 (90.7% produced single PCR products. Twenty-three (9.7% were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins

  16. Genome-wide microsatellite identification in the fungus Anisogramma anomala using Illumina sequencing and genome assembly.

    Science.gov (United States)

    Cai, Guohong; Leadbetter, Clayton W; Muehlbauer, Megan F; Molnar, Thomas J; Hillman, Bradley I

    2013-01-01

    High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats). Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the "Seq-to-SSR" approach). However, constraints of this approach include: 1) many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2) difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel "Seq-Assembly-SSR" approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps), with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%). After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6%) were successfully amplified and 214 (90.7%) produced single PCR products. Twenty-three (9.7%) were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins. Thus, the

  17. Complete genome sequence of arracacha virus B: a novel cheravirus.

    Science.gov (United States)

    Adams, I P; Glover, R; Souza-Richards, R; Bennett, S; Hany, U; Boonham, N

    2013-04-01

    The complete genome sequences of RNA1 and RNA2 of the oca strain of the potato virus arracacha virus B were determined using next-generation sequencing. The RNA1 molecule is predicted to encode a 259-kDa polyprotein with homology to proteins of the cheraviruses apple latent spherical virus (ALSV) and cherry rasp leaf virus (CRLV). The RNA2 molecule is predicted to encode a 102-kDa polyprotein which also has homology to the corresponding protein of ALSV and, to a lesser degree, CRLV (30 % for RNA1, 24 % for RNA2). Detailed analysis of the genome sequence confirms that AVB is a distinct member of the genus Cheravirus.

  18. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    Full Text Available The goal of the Human Microbiome Project (HMP is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  19. Ensemble analysis of adaptive compressed genome sequencing strategies

    Science.gov (United States)

    2014-01-01

    Background Acquiring genomes at single-cell resolution has many applications such as in the study of microbiota. However, deep sequencing and assembly of all of millions of cells in a sample is prohibitively costly. A property that can come to rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. Results In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows proportional to log of the number of cells and proportional linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample. The modified resource

  20. Draft genome sequence of Aeromonas hydrophila TN97-08

    Science.gov (United States)

    Aeromonas hydrophila is an opportunistic Gram-negative species causing disease in fish and mammals. The genus Aeromonas affects a variety of aquatic organisms and lives in diverse aquatic ecosystems (1). There are 39 A. hydrophila genomes currently available in GenBank. In the current study, we repo...

  1. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Full Text Available Abstract Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  2. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    Energy Technology Data Exchange (ETDEWEB)

    Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Sims, David [Los Alamos National Laboratory (LANL); Meincke, Linda [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sproer, Cathrin [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Complete genome sequence of Actinosynnema mirum type strain (101T)

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam L [ORNL; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Mayilraj, Shanmugam [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2009-01-01

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actino-synnemataceae, a rapidly growing family within the actinobacterial suborder Pseudo-nocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Complete genome sequence of Actinosynnema mirum type strain (101T)

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam; Chen, Feng; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia; Brettin, Thomas; Detter, John C.; Han, Cliff; Chain, Patrick; Tindall, Brian; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actino-synnemataceae, a rapidly growing family within the actinobacterial suborder Pseudo-nocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. The complete mitochondrial genome sequence of Eimeria magna (Apicomplexa: Coccidia).

    Science.gov (United States)

    Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Liu, Guo-Hua; Wang, Chun-Ren; Zhu, Xing-Quan

    2015-01-01

    In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Eimeria magna from rabbits for the first time, and compared its gene contents and genome organizations with that of seven Eimeria spp. from domestic chickens. The size of the complete mt genome sequence of E. magna is 6249 bp, which consists of 3 protein-coding genes (cytb, cox1 and cox3), 12 gene fragments for the large subunit (LSU) rRNA, and 7 gene fragments for the small subunit (SSU) rRNA, without transfer RNA genes, in accordance with that of Eimeria spp. from chickens. The putative direction of translation for three genes (cytb, cox1 and cox3) was the same as those of Eimeria species from domestic chickens. The content of A + T is 65.16% for E. magna mt genome (29.73% A, 35.43% T, 17.09 G and 17.75% C). The E. magna mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Eimeria spp. and has implications for the molecular diagnosis and control of rabbit coccidiosis.

  6. Whole genome sequencing as the ultimate tool to diagnose tuberculosis

    Directory of Open Access Journals (Sweden)

    Dick van Soolingen

    2016-01-01

    Full Text Available In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB. The (sub species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. The resistance is predicted by the detection of mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. The production of a strain-characteristic DNA fingerprint, to investigate the epidemiology of TB, is done by the 24-locus variable number tandem repeat (VNTR typing. However, most of the molecular techniques in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS. Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Infectious Disease Prevention and Control in Stockholm, Sweden organizes a yearly round of quality control on VNTR typing and in 2015 for the first time also WGS. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects will be covered that influence the quality and interpretation of WGS results. The turn-around-time, analysis, and utility of WGS will be discussed. Moreover, the experiences in the use of WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be conquered. The state of the art is that bacteria still have to be cultured to have sufficient quality and quantity of DNA for succesful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore

  7. Using 454 technology for long-PCR based sequencing of the complete mitochondrial genome from single Haemonchus contortus (Nematoda

    Directory of Open Access Journals (Sweden)

    Waeschenbach Andrea

    2008-01-01

    Full Text Available Abstract Background Mitochondrial (mt genomes represent a rich source of molecular markers for a range of applications, including population genetics, systematics, epidemiology and ecology. In the present study, we used 454 technology (or the GS20, massively parallel picolitre reactor platform to determine the complete mt genome of Haemonchus contortus (Nematoda: Trichostrongylidae, a parasite of substantial agricultural, veterinary and economic significance. We validate this approach by comparison with mt sequences from publicly available expressed sequence tag (EST and genomic survey sequence (GSS data sets. Results The complete mt genome of Haemonchus contortus was sequenced directly from long-PCR amplified template utilizing genomic DNA (~20–40 ng from a single adult male using 454 technology. A single contig was assembled and compared against mt sequences mined from publicly available EST (NemBLAST and GSS datasets. The comparison demonstrated that the 454 technology platform is reliable for the sequencing of AT-rich mt genomes from nematodes. The mt genome sequenced for Haemonchus contortus was 14,055 bp in length and was highly AT-rich (78.1%. In accordance with other chromadorean nematodes studied to date, the mt genome of H. contortus contained 36 genes (12 protein coding, 22 tRNAs, rrnL and rrnS and was similar in structure, size and gene arrangement to those characterized previously for members of the Strongylida. Conclusion The present study demonstrates the utility of 454 technology for the rapid determination of mt genome sequences from tiny amounts of DNA and reveals a wealth of mt genomic data in current databases available for mining. This approach provides a novel platform for high-throughput sequencing of mt genomes from nematodes and other organisms.

  8. Full-Genome Sequence of a Reassortant H1N1 Swine Influenza Virus Isolated from Pigs in Italy.

    Science.gov (United States)

    Chiapponi, Chiara; Baioni, Laura; Luppi, Andrea; Moreno, Ana; Castellan, Alberto; Foni, Emanuela

    2013-10-03

    In this study, the full-genome sequence of a novel reassortant H1N1 swine influenza virus (SIV) is reported. The isolate has a hemagglutinin (HA) gene of the pandemic H1N1 influenza virus, but it carries the seven genome segments of the avian-origin H1N1 SIV currently circulating in European pig farms.

  9. MIR retrotransposon sequences provide insulators to the human genome.

    Science.gov (United States)

    Wang, Jianrong; Vicente-García, Cristina; Seruggia, Davide; Moltó, Eduardo; Fernandez-Miñán, Ana; Neto, Ana; Lee, Elbert; Gómez-Skarmeta, José Luis; Montoliu, Lluís; Lunyak, Victoria V; Jordan, I King

    2015-08-11

    Insulators are regulatory elements that help to organize eukaryotic chromatin via enhancer-blocking and chromatin barrier activity. Although there are several examples of transposable element (TE)-derived insulators, the contribution of TEs to human insulators has not been systematically explored. Mammalian-wide interspersed repeats (MIRs) are a conserved family of TEs that have substantial regulatory capacity and share sequence characteristics with tRNA-related insulators. We sought to evaluate whether MIRs can serve as insulators in the human genome. We applied a bioinformatic screen using genome sequence and functional genomic data from CD4(+) T cells to identify a set of 1,178 predicted MIR insulators genome-wide. These predicted MIR insulators were computationally tested to serve as chromatin barriers and regulators of gene expression in CD4(+) T cells. The activity of predicted MIR insulators was experimentally validated using in vitro and in vivo enhancer-blocking assays. MIR insulators are enriched around genes of the T-cell receptor pathway and reside at T-cell-specific boundaries of repressive and active chromatin. A total of 58% of the MIR insulators predicted here show evidence of T-cell-specific chromatin barrier and gene regulatory activity. MIR insulators appear to be CCCTC-binding factor (CTCF) independent and show a distinct local chromatin environment with marked peaks for RNA Pol III and a number of histone modifications, suggesting that MIR insulators recruit transcriptional complexes and chromatin modifying enzymes in situ to help establish chromatin and regulatory domains in the human genome. The provisioning of insulators by MIRs across the human genome suggests a specific mechanism by which TE sequences can be used to modulate gene regulatory networks.

  10. Next-Generation Sequencing and Genome Editing in Plant Virology

    Directory of Open Access Journals (Sweden)

    Ahmed Hadidi

    2016-08-01

    Full Text Available Next-generation sequencing (NGS has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome engineering editing method, clustered regularly interspaced short palindromic repeats (CRISPR-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus; beet curly top virus and beet severe curly top virus (curtovirus; and bean yellow dwarf virus (mastrevirus. The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus and cucumber vein yellowing virus (ipomovirus, family, Potyviridae by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology.Keywords: Next-generation sequencing, NGS, plant virology, plant viruses, viroids, resistance to plant viruses by CRISPR-Cas9

  11. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454-reads of an Escheria coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  12. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    Energy Technology Data Exchange (ETDEWEB)

    Tyler, Brett M.; Tripathy, Sucheta; Zhang, Xuemin; Dehal, Paramvir; Jiang, Rays H. Y.; Aerts, Andrea; Arredondo, Felipe D.; Baxter, Laura; Bensasson, Douda; Beynon, JIm L.; Chapman, Jarrod; Damasceno, Cynthia M. B.; Dorrance, Anne E.; Dou, Daolong; Dickerman, Allan W.; Dubchak, Inna L.; Garbelotto, Matteo; Gijzen, Mark; Gordon, Stuart G.; Govers, Francine; Grunwald, NIklaus J.; Huang, Wayne; Ivors, Kelly L.; Jones, Richard W.; Kamoun, Sophien; Krampis, Konstantinos; Lamour, Kurt H.; Lee, Mi-Kyung; McDonald, W. Hayes; Medina, Monica; Meijer, Harold J. G.; Nordberg, Erik K.; Maclean, Donald J.; Ospina-Giraldo, Manuel D.; Morris, Paul F.; Phuntumart, Vipaporn; Putnam, Nicholas J.; Rash, Sam; Rose, Jocelyn K. C.; Sakihama, Yasuko; Salamov, Asaf A.; Savidor, Alon; Scheuring, Chantel F.; Smith, Brian M.; Sobral, Bruno W. S.; Terry, Astrid; Torto-Alalibo, Trudy A.; Win, Joe; Xu, Zhanyou; Zhang, Hongbin; Grigoriev, Igor V.; Rokhsar, Daniel S.; Boore, Jeffrey L.

    2006-04-17

    Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oömycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oömycete avirulence genes.

  13. Genomic organization and sequence analysis of the vomeronasal receptor V2R genes in mouse genome

    Institute of Scientific and Technical Information of China (English)

    YANG Hui; Zhang YaPing

    2007-01-01

    Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and finally retrieved 63 putative functional V2R genes including three newly identified genes which formed a new additional family. We described the genomic organization of these genes and also characterized the conservation of mouse V2R protein sequences. These genomic and sequence information we described are useful as part of the evidence to speculate the functional domain of V2Rs and should give aid to the functionality study in the future.

  14. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Full Text Available Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  15. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Directory of Open Access Journals (Sweden)

    Roy Michael Robins-Browne

    2016-11-01

    Full Text Available The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  16. Impact of Negative Sequence Current Injection by Wind Power Plants

    DEFF Research Database (Denmark)

    Chaudhary, Sanjay; Göksu, Ömer; Teodorescu, Remus

    2013-01-01

    This paper presents an analysis of the impact from negative sequence current injection by wind power plants in power systems under steady-state and short-term unbalanced conditions, including faults. The separate positive and negative sequence current control capability of the grid-side converter...

  17. Uncovering genomic features and maternal origin of korean native chicken by whole genome sequencing.

    Science.gov (United States)

    Kwak, Woori; Song, Ki-Duk; Oh, Jae-Don; Heo, Kang-Nyeong; Lee, Jun-Heon; Lee, Woon Kyu; Yoon, Sook Hee; Kim, Heebal; Cho, Seoae; Lee, Hak-Kyo

    2014-01-01

    The Korean Native Chicken (KNC) is an important endemic biological resource in Korea. While numerous studies have been conducted exploring this breed, none have used next-generation sequencing to identify its specific genomic features. We sequenced five strains of KNC and identified 10.9 million SNVs and 1.3 million InDels. Through the analysis, we found that the highly variable region common to all 5 strains had genes like PCHD15, CISD1, PIK3C2A, and NUCB2 that might be related to the phenotypic traits of the chicken such as auditory sense, growth rate and egg traits. In addition, we assembled unaligned reads that could not be mapped to the reference genome. By assembling the unaligned reads, we were able to present genomic sequences characteristic to the KNC. Based on this, we also identified genes related to the olfactory receptors and antigen that are common to all 5 strains. Finally, through the reconstructed mitochondrial genome sequences, we performed phylogenomic analysis and elucidated the maternal origin of the artificially restored KNC. Our results revealed that the KNC has multiple maternal origins which are in agreement with Korea's history of chicken breed imports. The results presented here provide a valuable basis for future research on genomic features of KNC and further understanding of KNC's origin.

  18. Uncovering genomic features and maternal origin of korean native chicken by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Woori Kwak

    Full Text Available The Korean Native Chicken (KNC is an important endemic biological resource in Korea. While numerous studies have been conducted exploring this breed, none have used next-generation sequencing to identify its specific genomic features. We sequenced five strains of KNC and identified 10.9 million SNVs and 1.3 million InDels. Through the analysis, we found that the highly variable region common to all 5 strains had genes like PCHD15, CISD1, PIK3C2A, and NUCB2 that might be related to the phenotypic traits of the chicken such as auditory sense, growth rate and egg traits. In addition, we assembled unaligned reads that could not be mapped to the reference genome. By assembling the unaligned reads, we were able to present genomic sequences characteristic to the KNC. Based on this, we also identified genes related to the olfactory receptors and antigen that are common to all 5 strains. Finally, through the reconstructed mitochondrial genome sequences, we performed phylogenomic analysis and elucidated the maternal origin of the artificially restored KNC. Our results revealed that the KNC has multiple maternal origins which are in agreement with Korea's history of chicken breed imports. The results presented here provide a valuable basis for future research on genomic features of KNC and further understanding of KNC's origin.

  19. Insights into the Evolution of Cotton Diploids and Polyploids from Whole-Genome Re-sequencing

    OpenAIRE

    Page, Justin T.; Huynh, Mark D; Zach S Liechty; Grupp, Kara; Stelly, David; Hulse, Amanda M; Ashrafi, Hamid; Van Deynze, Allen; Wendel, Jonathan F.; Udall, Joshua A.

    2013-01-01

    Understanding the composition, evolution, and function of the Gossypium hirsutum (cotton) genome is complicated by the joint presence of two genomes in its nucleus (AT and DT genomes). These two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral allopolyploidization. To better understand the allopolyploid genome, we re-sequenced the genomes of extant diploid relatives that contain the A1 (Gossypium herbaceum), A2 (Gossypium arboreum), or D5 (Gossypium ra...

  20. Genomic Sequencing of Orientia tsutsugamushi Strain Karp, an Assembly Comparable to the Genome Size of the Strain Ikeda.

    Science.gov (United States)

    Liao, Hsiao-Mei; Chao, Chien-Chung; Lei, Haiyan; Li, Bingjie; Tsai, Shien; Hung, Guo-Chiuan; Ching, Wei-Mei; Lo, Shyh-Ching

    2016-08-18

    Orientia tsutsugamushi, an intracellular bacterium, belongs to the family Rickettsiaceae This study presents the draft genome sequence of strain Karp, with 2.0 Mb as the size of the completed genome. This nearly finished draft genome sequence was annotated with the RAST server and the contents compared to those of the other strains.

  1. Genomic Sequencing of Orientia tsutsugamushi Strain Karp, an Assembly Comparable to the Genome Size of the Strain Ikeda

    Science.gov (United States)

    Liao, Hsiao-Mei; Chao, Chien-Chung; Lei, Haiyan; Li, Bingjie; Tsai, Shien; Hung, Guo-Chiuan

    2016-01-01

    Orientia tsutsugamushi, an intracellular bacterium, belongs to the family Rickettsiaceae. This study presents the draft genome sequence of strain Karp, with 2.0 Mb as the size of the completed genome. This nearly finished draft genome sequence was annotated with the RAST server and the contents compared to those of the other strains. PMID:27540052

  2. Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

    Directory of Open Access Journals (Sweden)

    Crooijmans Richard PMA

    2009-08-01

    Full Text Available Abstract Background Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the impact of the quality level of the sequenced bases on mapping quality and identification of true SNPs on a large scale. Results DNA pooled from five animals from a commercial boar line was digested with DraI; 150–250-bp fragments were isolated and end-sequenced using the Illumina 1 G Genome Analyzer, yielding 70,348,064 sequences 36-bp long. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ were compared. The highest threshold of SQ allowed identification of a larger number of SNPs (17,489, distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF and genotyped MAF was moderate, and SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold and maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768, of which 259 were not identified using higher SQ levels. Validation of SNPs found exclusively in the lower SQ threshold had a success rate of 94% and a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to autosomes, as observed for other species. Conclusion Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity. Selection of sequences using a higher SQ

  3. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil

    Science.gov (United States)

    Rezelj, Veronica V.; Clark, Jordan J.; Cordeiro, Marli T.; Freitas de Oliveira França, Rafael; Pena, Lindomar J.; Wilkie, Gavin S.; Da Silva Filipe, Ana; Davis, Christopher; Hughes, Joseph; Varjak, Margus; Selinger, Martin; Zuvanov, Luíza; Owsianka, Ania M.; Patel, Arvind H.; McLauchlan, John; Lindenbach, Brett D.; Fall, Gamou; Sall, Amadou A.; Biek, Roman; Rehwinkel, Jan; Schnettler, Esther; Kohl, Alain

    2016-01-01

    Background The outbreak of Zika virus (ZIKV) in the Americas has transformed a previously obscure mosquito-transmitted arbovirus of the Flaviviridae family into a major public health concern. Little is currently known about the evolution and biology of ZIKV and the factors that contribute to the associated pathogenesis. Determining genomic sequences of clinical viral isolates and characterization of elements within these are an important prerequisite to advance our understanding of viral replicative processes and virus-host interactions. Methodology/Principal findings We obtained a ZIKV isolate from a patient who presented with classical ZIKV-associated symptoms, and used high throughput sequencing and other molecular biology approaches to determine its full genome sequence, including non-coding regions. Genome regions were characterized and compared to the sequences of other isolates where available. Furthermore, we identified a subgenomic flavivirus RNA (sfRNA) in ZIKV-infected cells that has antagonist activity against RIG-I induced type I interferon induction, with a lesser effect on MDA-5 mediated action. Conclusions/Significance The full-length genome sequence including non-coding regions of a South American ZIKV isolate from a patient with classical symptoms will support efforts to develop genetic tools for this virus. Detection of sfRNA that counteracts interferon responses is likely to be important for further understanding of pathogenesis and virus-host interactions. PMID:27706161

  4. Sequencing of Ebola Virus Genomes Using Nanopore Technology

    Science.gov (United States)

    Hoenen, Thomas

    2017-01-01

    Sequencing of virus genomes during disease outbreaks can provide valuable information for diagnostics, epidemiology, and evaluation of potential countermeasures. However, particularly in remote areas logistical and technical challenges can be significant. Nanopore sequencing provides an alternative to classical Sanger and next-generation sequencing methods, and was successfully used under outbreak conditions (Hoenen et al., 2016; Quick et al., 2016). Here we describe a protocol used for sequencing of Ebola virus under outbreak conditions using Nanopore technology, which we successfully implemented at the CDC/NIH diagnostic laboratory (de Wit et al., 2016) located at the ELWA-3 Ebola virus Treatment Unit in Monrovia, Liberia, during the recent Ebola virus outbreak in West Africa.

  5. A hybrid approach for de novo human genome sequence assembly and phasing.

    Science.gov (United States)

    Mostovoy, Yulia; Levy-Sakin, Michal; Lam, Jessica; Lam, Ernest T; Hastie, Alex R; Marks, Patrick; Lee, Joyce; Chu, Catherine; Lin, Chin; Džakula, Željko; Cao, Han; Schlebusch, Stephen A; Giorda, Kristina; Schnall-Levin, Michael; Wall, Jeffrey D; Kwok, Pui-Yan

    2016-07-01

    Despite tremendous progress in genome sequencing, the basic goal of producing a phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe an approach to performing de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics linked-read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.

  6. The draft genome sequence of the nematode Caenorhabditis briggsae, a companion to C. elegans.

    Science.gov (United States)

    Gupta, Bhagwati P; Sternberg, Paul W

    2003-01-01

    The publication of the draft genome sequence of Caenorhabditis briggsae improves the annotation of the genome of its close relative Caenorhabditis elegans and will facilitate comparative genomics and the study of the evolutionary changes during development.

  7. Draft genome sequence of the oilseed species Ricinus communis.

    Science.gov (United States)

    Chan, Agnes P; Crabtree, Jonathan; Zhao, Qi; Lorenzi, Hernan; Orvis, Joshua; Puiu, Daniela; Melake-Berhan, Admasu; Jones, Kristine M; Redman, Julia; Chen, Grace; Cahoon, Edgar B; Gedil, Melaku; Stanke, Mario; Haas, Brian J; Wortman, Jennifer R; Fraser-Liggett, Claire M; Ravel, Jacques; Rabinowicz, Pablo D

    2010-09-01

    Castor bean (Ricinus communis) is an oilseed crop that belongs to the spurge (Euphorbiaceae) family, which comprises approximately 6,300 species that include cassava (Manihot esculenta), rubber tree (Hevea brasiliensis) and physic nut (Jatropha curcas). It is primarily of economic interest as a source of castor oil, used for the production of high-quality lubricants because of its high proportion of the unusual fatty acid ricinoleic acid. However, castor bean genomics is also relevant to biosecurity as the seeds contain high levels of ricin, a highly toxic, ribosome-inactivating protein. Here we report the draft genome sequence of castor bean (4.6-fold coverage), the first for a member of the Euphorbiaceae. Whereas most of the key genes involved in oil synthesis and turnover are single copy, the number of members of the ricin gene family is larger than previously thought. Comparative genomics analysis suggests the presence of an ancient hexaploidization event that is conserved across the dicotyledonous lineage.

  8. Genome sequence and description of Anaerosalibacter massiliensis sp. nov.

    Directory of Open Access Journals (Sweden)

    N. Dione

    2016-03-01

    Full Text Available Anaerosalibacter massiliensis sp. nov. strain ND1T (= CSUR P762 = DSM 27308 is the type strain of A. massiliensis sp. nov., a new species within the genus Anaerosalibacter. This strain, the genome of which is described here, was isolated from the faecal flora of a 49-year-old healthy Brazilian man. Anaerosalibacter massiliensis is a Gram-positive, obligate anaerobic rod and member of the family Clostridiaceae. With the complete genome sequence and annotation, we describe here the features of this organism. The 3 197 911 bp long genome (one chromosome but no plasmid contains 3271 protein-coding and 62 RNA genes, including six rRNA genes.

  9. The complete chloroplast genome sequence of Dendrobium nobile.

    Science.gov (United States)

    Yan, Wenjin; Niu, Zhitao; Zhu, Shuying; Ye, Meirong; Ding, Xiaoyu

    2016-11-01

    The complete chloroplast (cp) genome sequence of Dendrobium nobile, an endangered and traditional Chinese medicine with important economic value, is presented in this article. The total genome size is 150,793 bp, containing a large single copy (LSC) region (84,939 bp) and a small single copy region (SSC) (13,310 bp) which were separated by two inverted repeat (IRs) regions (26,272 bp). The overall GC contents of the plastid genome were 38.8%. In total, 130 unique genes were annotated and they were consisted of 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Fourteen genes contained one or two introns.

  10. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1, and we found a substantially higher number of TEs (n = 6,013 than previously identified (n = 1,572. Most of the new TEs derive from small fragments of a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1. We also estimated that 518 TE copies (8.6% are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other

  11. Next-Generation Sequencing and Genome Editing in Plant Virology.

    Science.gov (United States)

    Hadidi, Ahmed; Flores, Ricardo; Candresse, Thierry; Barba, Marina

    2016-01-01

    Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA, or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome engineering editing method, clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family, Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology.

  12. Next-Generation Sequencing and Genome Editing in Plant Virology

    Science.gov (United States)

    Hadidi, Ahmed; Flores, Ricardo; Candresse, Thierry; Barba, Marina

    2016-01-01

    Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA, or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21–24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome engineering editing method, clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family, Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology. PMID:27617007

  13. Origin of noncoding DNA sequences: molecular fossils of genome evolution.

    OpenAIRE

    Naora, H.; MIYAHARA, K.; Curnow, R. N.

    1987-01-01

    The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. We propose a hypothesis for the origin of these noncoding sequences that assumes that (i) an approximately equal to 0.55-kilobase (kb)-long reading frame composed the primordial gene and (ii) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The ...

  14. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

    Energy Technology Data Exchange (ETDEWEB)

    Fleischmann, R.D.; Adams, M.D.; White, O. [Institute for Genomic Research, Gaithersburg, MD (United States)] [and others

    1995-07-28

    An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.

  15. Reduced representation sequencing: a success in maize and a promise for other plant genomes.

    Science.gov (United States)

    Barbazuk, W Brad; Bedell, Joseph A; Rabinowicz, Pablo D

    2005-08-01

    Plant, and particularly cereal genomes, are challenging to sequence due to their large size and high repetitive DNA content. Gene-enrichment strategies are alternative or complementary approaches to complete genome sequencing that yield, rapidly and inexpensively, useful sequence data from large and complex genomes. The maize genome is large (2.7 Gbp) and contains large amounts of conserved repetitive elements. Furthermore, the high allelic diversity found between maize inbred lines may necessitate sequencing several inbred lines in order to recover the maize "gene pool". Two gene-enrichment approaches, methylation filtration (MF) and high C(o)t (HC) sequencing have been tested in maize and their ability to sample the gene space has been examined. Combined with other genomic sequencing strategies, gene-enriched genomic sequencing is a practical way to examine the maize gene pool, to order and orient the genic sequences on the genome, and to enable investigation of gene content of other complex plant genomes.

  16. Genome Sequence of the Probiotic Strain Lactobacillus rhamnosus (Formerly Lactobacillus casei) LOCK900

    OpenAIRE

    Aleksandrzak-Piekarczyk, Tamara; Koryszewska-Bagi?ska, Anna; Bardowski, Jacek

    2013-01-01

    Lactobacillus rhamnosus LOCK900 fulfills the criteria required for probiotic strains. In this study, we report a whole-genome sequence of this isolate and compare it with other L.?rhamnosus complete genome sequences already published.

  17. Sequencing and annotated analysis of an Estonian human genome.

    Science.gov (United States)

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In present study we describe the sequencing and annotated analysis of the individual genome of Estonian. Using SOLID technology we generated 2,449,441,916 of 50-bp reads. The Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope enables also the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5% with total coverage of 107.72 Gb. resulting in mean fold coverage of 34.6. We found 3,482,975 SNPs out of which 352,492 were novel. 21,222 SNPs were in coding region: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small size insertion/deletion polymorphisms out of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found the region in chromosome 6 to be enriched with the coding SNPs and CNVs. This study confirms previous findings, that our genomes are more complex and variable as thought before. Therefore, sequencing of the personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understandings on the functions of genome.

  18. Current trends in chloroplast genome research

    African Journals Online (AJOL)

    USER

    2010-06-14

    Jun 14, 2010 ... organization and promising ability for transgenic expression. Therefore, cpDNA ... highly conserved in size, structure and gene content. (Olmstead and ... contain only 10% of the genes required for fully functional organelle whereas the ... concept (McFadden, 1999; Maliga, 2003; Howe, 2008). Currently 170 ...

  19. Complete Genome Sequence of Mycobacterium xenopi Type Strain RIVM700367

    KAUST Repository

    Abdallah, A. M.

    2012-05-24

    Mycobacterium xenopi is a slow-growing, thermophilic, water-related Mycobacterium species. Like other nontuberculous mycobacteria, M. xenopi more commonly infects humans with altered immune function, such as chronic obstructive pulmonary disease patients. It is considered clinically relevant in a significant proportion of the patients from whom it is isolated. We report here the whole genome sequence of M. xenopi type strain RIVM700367.

  20. Evolutionary insights from suffix array-based genome sequence analysis

    Indian Academy of Sciences (India)

    Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures, well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicate that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a ‘meaning’ for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv have been found to contain a repeating sequence of 300 amino acids.