WorldWideScience

Sample records for survey genomic sequence

  1. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Science.gov (United States)

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  2. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Background: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  3. Survey and analysis of simple sequence repeats (SSRs) present in the genomes of plant viroids.

    Science.gov (United States)

    Qin, Lü; Zhang, Zhixiang; Zhao, Xiangyan; Wu, Xiaolong; Chen, Yubao; Tan, Zhongyang; Li, Shifang

    2014-01-01

    Extensive simple sequence repeat (SSR) surveys have been performed for eukaryotic, prokaryotic and viral genomes, but information regarding SSRs in viroids is limited. We undertook a survey to examine the presence of SSRs in viroid genomes. Our results show that the distribution of SSRs in viroids may influence secondary structure, and that SSRs could play a role in generating genetic diversity. We also discuss the potential evolutionary role of repeated sequences in the viroid genome. This is the first report of SSR loci in viroids, and our study could be helpful in understanding the structure and evolution of viroid genomes.

  4. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...... on transcriptional evidence. Analysis of repetitive sequences suggests that they are underrepresented in the reference assembly, reflecting an enrichment of gene-rich regions in the current assembly. Characterization of Lotus natural variation by resequencing of L. japonicus accessions and diploid Lotus species...... is currently ongoing, facilitated by the MG20 reference sequence...

  5. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  6. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3,490 genes and a GC content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7,737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.
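    The N50 contig length quoted in the abstract above is a standard assembly statistic: the length of the shortest contig in the smallest set of longest contigs that together hold at least half of the assembled bases. A minimal sketch of that definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches half of the total assembly length; the length of
    the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not study data.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

    Because the statistic depends only on the sorted length distribution, the same function applies unchanged to scaffold lengths.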

  7. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3,490 genes and a GC content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7,737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  8. A survey of current practices for genomic sequencing test interpretation and reporting processes in US laboratories.

    Science.gov (United States)

    O'Daniel, Julianne M; McLaughlin, Heather M; Amendola, Laura M; Bale, Sherri J; Berg, Jonathan S; Bick, David; Bowling, Kevin M; Chao, Elizabeth C; Chung, Wendy K; Conlin, Laura K; Cooper, Gregory M; Das, Soma; Deignan, Joshua L; Dorschner, Michael O; Evans, James P; Ghazani, Arezou A; Goddard, Katrina A; Gornick, Michele; Farwell Hagman, Kelly D; Hambuch, Tina; Hegde, Madhuri; Hindorff, Lucia A; Holm, Ingrid A; Jarvik, Gail P; Knight Johnson, Amy; Mighion, Lindsey; Morra, Massimo; Plon, Sharon E; Punj, Sumit; Richards, C Sue; Santani, Avni; Shirts, Brian H; Spinner, Nancy B; Tang, Sha; Weck, Karen E; Wolf, Susan M; Yang, Yaping; Rehm, Heidi L

    2017-05-01

    While the diagnostic success of genomic sequencing expands, the complexity of this testing should not be overlooked. Numerous laboratory processes are required to support the identification, interpretation, and reporting of clinically significant variants. This study aimed to examine the workflow and reporting procedures among US laboratories to highlight shared practices and identify areas in need of standardization. Surveys and follow-up interviews were conducted with laboratories offering exome and/or genome sequencing to support a research program or for routine clinical services. The 73-item survey elicited multiple choice and free-text responses that were later clarified with phone interviews. Twenty-one laboratories participated. Practices highly concordant across all groups included consent documentation, multiperson case review, and enabling patient opt-out of incidental or secondary findings analysis. Noted divergence included use of phenotypic data to inform case analysis and interpretation and reporting of case-specific quality metrics and methods. Few laboratory policies detailed procedures for data reanalysis, data sharing, or patient access to data. This study provides an overview of practices and policies of experienced exome and genome sequencing laboratories. The results enable broader consideration of which practices are becoming standard approaches, where divergence remains, and areas of development in best practice guidelines that may be helpful. Genet Med advance online publication 03 November 2016.

  9. Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Jong-Sung Lim

    2012-03-01

    Recently, the technologies of DNA sequence variation and gene expression profiling have been used widely as approaches in genome biology and genetics. The application to genome study has been particularly developed with the introduction of the next-generation DNA sequencer (NGS) Roche/454 and Illumina/Solexa systems, along with bioinformatics analysis technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate paired-end sequencing data are important for constructing a de novo assembly of a novel genome. For effective de novo assembly, it is necessary to have multiplatform NGS sequence data with at least 2× and 30× depth of genome coverage from Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful to approach genome system biology with quantification of expressed RNAs from a whole-genome transcriptome, depending on the tissue samples. The hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. The 20× and 50× coverage of the estimated transcriptome sequences using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, only an average 30× coverage of a transcriptome with short read sequences of Illumina/Solexa is enough to check expression quantification, compared to the reference expressed sequence tag sequence.

  10. A genome-wide BAC-end sequence survey provides first insights into sweetpotato (Ipomoea batatas (L.) Lam.) genome composition.

    Science.gov (United States)

    Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong

    2016-11-21

    Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining more in-depth knowledge of the sweetpotato genome is therefore necessary. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82× coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8,310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with a cumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome was known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% non-LTR retrotransposons and 1.42% Class II DNA transposons, among others; 18.31% of the genome was identified as sweetpotato-unique repetitive DNA and 10.00% of the genome was predicted to be coding regions. In total, 3,846 simple sequence repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSR primers were designed and tested for length polymorphism using 20 sweetpotato accessions; 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed. The high quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome
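    The library figures in the abstract above (clone count, mean insert size, fold coverage, recovery probability) are linked by simple arithmetic. The sketch below is an illustration under stated assumptions: the abstract does not give the genome size, so it is back-calculated from the reported coverage range, and the recovery probability uses the widely cited Clarke and Carbon estimate P = 1 - (1 - I/G)^N, which may or may not be the exact method the authors applied. All function names are hypothetical.

```python
def library_coverage(n_clones, insert_bp, genome_bp):
    """Fold coverage of a clone library: total cloned bases / genome size."""
    return n_clones * insert_bp / genome_bp


def implied_genome_size(n_clones, insert_bp, coverage):
    """Genome size (bp) implied by a reported fold coverage."""
    return n_clones * insert_bp / coverage


def prob_single_copy(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon estimate of recovering a given single-copy sequence.

    Assumes clones sample the genome uniformly at random, which real
    libraries only approximate.
    """
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


# Values taken from the abstract: 240,384 clones, 101 kb mean insert.
N_CLONES, INSERT_BP = 240_384, 101_000
for reported_coverage in (7.93, 10.82):
    genome_bp = implied_genome_size(N_CLONES, INSERT_BP, reported_coverage)
    p = prob_single_copy(N_CLONES, INSERT_BP, genome_bp)
    print(f"{reported_coverage}x -> genome ~{genome_bp / 1e9:.2f} Gb, P = {p:.4f}")
```

    Under these assumptions both ends of the reported coverage range give a recovery probability above 0.99, in line with the figure stated in the abstract.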

  11. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human......-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more...... on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shot-gun sequences to the human-mouse alignment offers the investigator a rapid way of defining specific...

  12. Development of simple sequence repeat (SSR) markers of sesame (Sesamum indicum) from a genome survey.

    Science.gov (United States)

    Wei, Xin; Wang, Linhai; Zhang, Yanxin; Qi, Xiaoqiong; Wang, Xiaoling; Ding, Xia; Zhang, Jing; Zhang, Xiurong

    2014-04-22

    Sesame (Sesamum indicum), an important oil crop, is widely grown in tropical and subtropical regions. It provides part of the daily edible oil allowance for almost half of the world's population. A limited number of co-dominant markers has been developed and applied in sesame genetic diversity and germplasm identity studies. Here we report for the first time a whole genome survey used to develop simple sequence repeat (SSR) markers and to detect the genetic diversity of sesame germplasm. From the initial assembled sesame genome, 23,438 SSRs (≥5 repeats) were identified. The most common repeat motif was dinucleotide with a frequency of 84.24%, followed by 13.53% trinucleotide, 1.65% tetranucleotide, 0.3% pentanucleotide and 0.28% hexanucleotide motifs. From 1,500 designed and synthesised primer pairs, 218 polymorphic SSRs were developed and used to screen 31 sesame accessions from 12 countries. STRUCTURE and phylogenetic analyses indicated that all sesame accessions could be divided into two groups: one mainly from China and another from other countries. Cluster analysis classified Chinese major sesame varieties into three groups. These novel SSR markers are a useful tool for genetic linkage map construction, genetic diversity detection, and marker-assisted selective sesame breeding.
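    The SSR criterion in the abstract above (di- to hexanucleotide motifs with at least five repeats) can be illustrated with a regular-expression scan. This is a minimal sketch for perfect repeats only, not the tool used in the study (which the abstract does not name); the function names and the toy sequence are assumptions made for illustration.

```python
import re
from collections import Counter


def find_ssrs(seq, min_repeats=5, min_unit=2, max_unit=6):
    """Return (start, motif, repeat_count) for each perfect SSR in seq.

    The lazy quantifier makes the engine try the shortest motif first, so
    (AT)6 is reported as a dinucleotide repeat rather than as (ATAT)3.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run matched as e.g. (AA)n; not di- to hexa-
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def motif_class_counts(hits):
    """Count SSRs by motif length (2 = dinucleotide, 3 = trinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Hypothetical toy sequence, not sesame data.
toy = "GCGC" + "AT" * 6 + "CGGT" + "AGC" * 5 + "TTGA"
hits = find_ssrs(toy)
print(hits)
print(motif_class_counts(hits))
```

    Dividing each class count by the total number of hits gives motif-class frequencies of the kind reported in the abstract. Imperfect and compound repeats would need a dedicated tool.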

  13. Comparative genomic survey, exon-intron annotation and phylogenetic analysis of NAT-homologous sequences in archaea, protists, fungi, viruses, and invertebrates

    Science.gov (United States)

    We have previously published extensive genomic surveys [1-3], reporting NAT-homologous sequences in hundreds of sequenced bacterial, fungal and vertebrate genomes. We present here the results of our latest search of 2445 genomes, representing 1532 (70 archaeal, 1210 bacterial, 43 protist, 97 fungal,...

  14. Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don

    Science.gov (United States)

    Hirakawa, Hideki; Okada, Yoshihiro; Tabuchi, Hiroaki; Shirasawa, Kenta; Watanabe, Akiko; Tsuruoka, Hisano; Minami, Chiharu; Nakayama, Shinobu; Sasamoto, Shigemi; Kohara, Mitsuyo; Kishida, Yoshie; Fujishiro, Tsunakazu; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yoshinaga, Masaru; Takahata, Yasuhiro; Tanaka, Masaru; Tabata, Satoshi; Isobe, Sachiko N.

    2015-01-01

    Ipomoea trifida (H. B. K.) G. Don. is the most likely diploid ancestor of the hexaploid sweet potato, I. batatas (L.) Lam. To assist in analysis of the sweet potato genome, de novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1, using the Illumina HiSeq platform. We classified the sequences thus obtained as either ‘core candidates’ (common to the two lines) or ‘line specific’. The total length of the assembled sequences of Mx23Hm (ITR_r1.0) was 513 Mb, while that of 0431-1 (ITRk_r1.0) was 712 Mb. Of the assembled sequences, 240 Mb (Mx23Hm) and 353 Mb (0431-1) were classified into core candidate sequences. A total of 62,407 (62.4 Mb) and 109,449 (87.2 Mb) putative genes were identified, respectively, in the genomes of Mx23Hm and 0431-1, of which 11,823 were derived from core sequences of Mx23Hm, while 28,831 were from the core candidate sequence of 0431-1. There were a total of 1,464,173 single-nucleotide polymorphisms and 16,682 copy number variations (CNVs) in the two assembled genomic sequences (under the condition of log2 ratio of >1 and CNV size >1,000 bases). The results presented here are expected to contribute to the progress of genomic and genetic studies of I. trifida, as well as studies of the sweet potato and the genus Ipomoea in general. PMID:25805887

  15. Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don.

    Science.gov (United States)

    Hirakawa, Hideki; Okada, Yoshihiro; Tabuchi, Hiroaki; Shirasawa, Kenta; Watanabe, Akiko; Tsuruoka, Hisano; Minami, Chiharu; Nakayama, Shinobu; Sasamoto, Shigemi; Kohara, Mitsuyo; Kishida, Yoshie; Fujishiro, Tsunakazu; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yoshinaga, Masaru; Takahata, Yasuhiro; Tanaka, Masaru; Tabata, Satoshi; Isobe, Sachiko N

    2015-04-01

    Ipomoea trifida (H. B. K.) G. Don. is the most likely diploid ancestor of the hexaploid sweet potato, I. batatas (L.) Lam. To assist in analysis of the sweet potato genome, de novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1, using the Illumina HiSeq platform. We classified the sequences thus obtained as either 'core candidates' (common to the two lines) or 'line specific'. The total length of the assembled sequences of Mx23Hm (ITR_r1.0) was 513 Mb, while that of 0431-1 (ITRk_r1.0) was 712 Mb. Of the assembled sequences, 240 Mb (Mx23Hm) and 353 Mb (0431-1) were classified into core candidate sequences. A total of 62,407 (62.4 Mb) and 109,449 (87.2 Mb) putative genes were identified, respectively, in the genomes of Mx23Hm and 0431-1, of which 11,823 were derived from core sequences of Mx23Hm, while 28,831 were from the core candidate sequence of 0431-1. There were a total of 1,464,173 single-nucleotide polymorphisms and 16,682 copy number variations (CNVs) in the two assembled genomic sequences (under the condition of log2 ratio of >1 and CNV size >1,000 bases). The results presented here are expected to contribute to the progress of genomic and genetic studies of I. trifida, as well as studies of the sweet potato and the genus Ipomoea in general. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  16. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers

    Energy Technology Data Exchange (ETDEWEB)

    Labbe, Jessy L [ORNL; Murat, Claude [INRA, Nancy, France; Morin, Emmanuelle [INRA, Nancy, France; Le Tacon, F [UMR, France; Martin, Francis [INRA, Nancy, France

    2011-01-01

    It is becoming clear that simple sequence repeats (SSRs) play a significant role in fungal genome organization, and they are a large source of genetic markers for population genetics and meiotic maps. We identified SSRs in the Laccaria bicolor genome by in silico survey and analyzed their distribution in the different genomic regions. We also compared the abundance and distribution of SSRs in L. bicolor with those of the following fungal genomes: Phanerochaete chrysosporium, Coprinopsis cinerea, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Magnaporthe grisea, Neurospora crassa and Saccharomyces cerevisiae. Using the MISA computer program, we detected 277,062 SSRs in the L. bicolor genome representing 8% of the assembled genomic sequence. Among the analyzed basidiomycetes, L. bicolor exhibited the highest SSR density although no correlation between relative abundance and the genome sizes was observed. In most genomes the short motifs (mono- to trinucleotides) were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. In the L. bicolor genome, most of the SSRs were located in intergenic regions (73.3%) and the highest SSR density was observed in transposable elements (TEs; 6,706 SSRs/Mb). However, 81% of the protein-coding genes contained SSRs in their exons, suggesting that SSR polymorphism may alter gene phenotypes. Within a L. bicolor offspring, sequence polymorphism of 78 SSRs was mainly detected in non-TE intergenic regions. Unlike previously developed microsatellite markers, these new ones are spread throughout the genome; these markers could have immediate applications in population genetics.

  17. Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species.

    Science.gov (United States)

    Jia, Dongmei

    2016-06-15

    Simple sequence repeats (SSRs), or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have received continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than one third of the genome length in the three species. Microsatellites consisting only of the bases A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. Microsatellite lengths were concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than those in any of the genomes or chromosomes of the three species. In addition, the distance between any two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further study of the roles of microsatellites in the origin, organization and evolution of the genomes of Candida species. Copyright © 2016 Elsevier B.V. All rights reserved.
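    The two summary statistics quoted with units in the abstract above are commonly defined as follows: relative abundance is the number of SSRs per kbp of sequence analysed, and relative density is the total SSR length in bp per kbp of sequence analysed. The sketch below assumes those usual definitions (the abstract itself gives only the units); the function names and input values are hypothetical.

```python
def relative_abundance(n_ssrs, sequence_bp):
    """Number of SSRs per kbp of analysed sequence."""
    return n_ssrs / (sequence_bp / 1000)


def relative_density(total_ssr_bp, sequence_bp):
    """Total length of SSRs (bp) per kbp of analysed sequence."""
    return total_ssr_bp / (sequence_bp / 1000)


# Hypothetical example: 50 SSRs spanning 600 bp in a 10,000 bp sequence.
print(relative_abundance(50, 10_000))  # SSRs per kbp
print(relative_density(600, 10_000))   # SSR bp per kbp
```

    Normalising by sequence length is what allows a short mitochondrial sequence to be compared directly with whole chromosomes or genomes.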

  18. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Background: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shot-gun sequences to the human-mouse alignment offers the investigator a rapid way of defining specific regions for analysis and resequencing.
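    A rough sense of what 0.66X shotgun coverage means can be obtained from the idealised Lander-Waterman model, in which reads land uniformly at random and the expected fraction of bases covered by at least one read is 1 - e^(-c) for coverage c. This is a textbook approximation introduced here for illustration, not a calculation reported in the abstract; the function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of a genome hit by at least one read.

    Idealised Lander-Waterman/Poisson model: uniform random read placement,
    no cloning bias, no repeats. Real projects deviate from this.
    """
    return 1.0 - math.exp(-coverage)


# Coverage value taken from the abstract above.
print(f"{expected_fraction_covered(0.66):.3f}")
```

    Under this model a little under half of the genome would be expected to be sampled at least once at 0.66X.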

  19. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... that the minimum number of genes from each species that need to be compared to produce a reliable phylogeny is about 20. Yeast has also become an attractive model to study speciation in eukaryotes, especially to understand molecular mechanisms behind the establishment of reproductive isolation. Comparison...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  20. Complete genome sequence of a nonculturable Methanococcus maripaludis strain extracted in a metagenomic survey of petroleum reservoir fluids.

    Science.gov (United States)

    Wang, Xiaoyi; Greenfield, Paul; Li, Dongmei; Hendry, Philip; Volk, Herbert; Sutherland, Tara D

    2011-10-01

    Extraction of genome sequences from metagenomic data is crucial for reconstructing the metabolism of microbial communities that cannot be mimicked in the laboratory. A complete Methanococcus maripaludis genome was generated from metagenomic data derived from a thermophilic subsurface oil reservoir. M. maripaludis is a hydrogenotrophic methanogenic species that is common in mesophilic saline environments. Comparison of the genome from the thermophilic, subsurface environment with the genome of the type species will provide insight into the adaptation of a methanogenic genome to an oil reservoir environment.

  1. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
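    The draft and finished error rates quoted in the abstract above can be put on the Phred quality scale, Q = -10 log10(p), and turned into an expected error count for a genome of a given size. The sketch below is illustrative only; the function names and the 5 Mb genome size are assumptions, not values from the abstract.

```python
import math


def phred_quality(error_rate):
    """Phred-scaled quality for a per-base error probability."""
    return -10.0 * math.log10(error_rate)


def expected_errors(genome_bp, error_rate):
    """Expected number of erroneous bases in a consensus of genome_bp bases."""
    return genome_bp * error_rate


GENOME_BP = 5_000_000  # hypothetical genome size, for illustration only
for label, rate in (("draft", 1 / 2000), ("finished", 1 / 10_000)):
    q = phred_quality(rate)
    errors = expected_errors(GENOME_BP, rate)
    print(f"{label}: Q{q:.1f}, ~{errors:.0f} expected errors")
```

    On this scale the finished-genome standard of about 1 error per 10,000 bp corresponds to Q40, and the draft rate of 1 per 2,000 bp to roughly Q33.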

  2. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  3. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2009-11-01

    Full Text Available Abstract Background De novo sequencing of a large, complex plant genome like that of barley (Hordeum vulgare L.) is a major challenge both in terms of experimental feasibility and costs. The emergence and breathtaking progress of next generation sequencing technologies has put this goal into focus and a clone based strategy combined with the 454/Roche technology is conceivable. Results To test the feasibility, we sequenced 91 barcoded, pooled, gene containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb) and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of less than 10 contigs at 24-fold mean sequence coverage. Moreover we show that gene containing regions seem to assemble completely and uninterrupted thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies. Conclusion The described multiplex 454 sequencing of barcoded BACs leads to sequence consensi highly representative for the clones. Assemblies are correct for the majority of contigs. Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone based strategy of sequencing the barley genome.
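
    The N50, N80 and N90 figures reported in this abstract are contiguity statistics. A minimal sketch of the usual definition follows: Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the assembly. The function name and the toy contig lengths are assumptions for illustration; the study's own tooling is not described in the record.

```python
def nx(contig_lengths, percent):
    """Return the Nx statistic (e.g. percent=50 gives N50) for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until the target share is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Toy contig lengths (hypothetical, in kb), total 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_contigs, 50)
n80 = nx(toy_contigs, 80)
n90 = nx(toy_contigs, 90)
print(n50, n80, n90)
```

    Because higher percentages require adding shorter contigs, N90 is never larger than N80, which is never larger than N50; the ordering of the values reported above (~50 kb, ~31 kb, ~21 kb) follows the same pattern.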

  4. High occurrence of functional new chimeric genes in survey of rice chromosome 3 short arm genome sequences.

    Science.gov (United States)

    Zhang, Chengjun; Wang, Jun; Marowsky, Nicholas C; Long, Manyuan; Wing, Rod A; Fan, Chuanzhu

    2013-01-01

    In an effort to identify newly evolved genes in rice, we searched the genomes of Asian-cultivated rice Oryza sativa ssp. japonica and its wild progenitors, looking for lineage-specific genes. Using genome pairwise comparison of approximately 20-Mb DNA sequences from the chromosome 3 short arm (Chr3s) in six rice species, O. sativa, O. nivara, O. rufipogon, O. glaberrima, O. barthii, and O. punctata, combined with synonymous substitution rate tests and other evidence, we were able to identify potential recently duplicated genes, which evolved within the last 1 Myr. We identified 28 functional O. sativa genes, which likely originated after O. sativa diverged from O. glaberrima. These genes account for around 1% (28/3,176) of all annotated genes on O. sativa's Chr3s. Among the 28 new genes, two recently duplicated segments contained eight genes. Fourteen of the 28 new genes have a chimeric gene structure derived from one or multiple parental genes and flanking targeting sequences. Although the majority of these 28 new genes were formed by single or segmental DNA-based gene duplication and recombination, we found two genes that likely originated partially through exon shuffling. Sequence divergence tests between new genes and their putative progenitors indicated that new genes were most likely evolving under natural selection. We showed all 28 new genes appeared to be functional, as suggested by Ka/Ks analysis and the presence of RNA-seq, cDNA, expressed sequence tag, massively parallel signature sequencing, and/or small RNA data. The high rate of new gene origination and of chimeric gene formation in rice may demonstrate rice's broad diversification, domestication, its environmental adaptation, and the role of new genes in rice speciation.

  5. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    Science.gov (United States)

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Malaria Genome Sequencing Project

    Science.gov (United States)

    2004-01-01

    ...proteins in plastid segregation mutants of Toxoplasma gondii. J. Biol. Chem. 276, 28436-28442 (2001)... Parasitol. Today 11, 1-4 (1995)... 11. Su, X. et al... parasitophorous vacuole membrane29... gene mapping studies have shown that regions of gene synteny exist between species of rodent malaria9 and between...Carucci, D. J. Rodent models of malaria in the genomics era. Trends Parasitol. 18, ... selection of karyotype mutants and non-gametocyte producer mutants

  7. Survey and analysis of simple sequence repeats in the Ustilaginoidea virens genome and the development of microsatellite markers.

    Science.gov (United States)

    Yu, Mina; Yu, Junjie; Li, Huanhuan; Wang, Yahui; Yin, Xiaole; Bo, Huiwen; Ding, Hui; Zhou, Yuxin; Liu, Yongfeng

    2016-07-01

    Ustilaginoidea virens is the causal agent of rice false smut, causing quantitative and qualitative losses in the rice industry. However, the development and application of simple sequence repeat (SSR) markers for genetic diversity studies in U. virens have been limited. This study is the first to perform large-scale development of SSR markers of this pathogen at the genome level, to (1) compare these SSR markers with those of other fungi, (2) analyze the pattern of the SSRs, and (3) obtain more informative genetic markers. U. virens is rich in SSRs, and 13,778 SSRs were identified with a relative abundance of 349.7 SSRs/Mb. The most common motifs in the genome or in noncoding regions were mononucleotides, whereas trinucleotides were the most common in coding sequences. A total of 6 out of 127 primers were randomly selected to be used to analyze 115 isolates, and these 6 primers showed high polymorphism in U. virens. This study may serve as an important resource for molecular genetic studies in U. virens. Copyright © 2016 Elsevier B.V. All rights reserved.
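
    The relative abundance quoted in this abstract is a count of SSRs per megabase of sequence. The sketch below shows one simple way such a survey can be run on a sequence, using regular expressions to find perfect tandem repeats of 1 to 6 bp motifs. The minimum repeat thresholds, function names and toy sequence are assumptions for illustration and are not the settings used in the study; compound and imperfect repeats are ignored.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA' is not)."""
    return (motif * 2).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by start position."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # One motif of length k followed by at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_density_per_mb(ssr_count, length_bp):
    """SSRs per megabase of sequence."""
    return ssr_count / (length_bp / 1_000_000)


# Toy sequence with three planted repeats: (AC)7, (A)12 and (CAG)5.
toy = "GGCT" + "AC" * 7 + "TTGCG" + "A" * 12 + "CGT" + "CAG" * 5 + "TC"
ssrs = find_ssrs(toy)
density = ssr_density_per_mb(len(ssrs), len(toy))

# Sequence length implied by the abstract's own figures: 13,778 SSRs at 349.7 SSRs/Mb.
implied_mb = 13_778 / 349.7
print(ssrs, round(implied_mb, 1))
```

    Dividing the reported count by the reported density suggests that roughly 39.4 Mb of sequence was surveyed, which is an inference from the two published figures rather than a number stated in the record.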

  8. Proficiency Testing for Bacterial Whole Genome Sequencing: An End-User Survey of Current Capabilities, Requirements and Priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    The advent of next-generation sequencing (NGS) has revolutionised public health microbiology. Given the potential impact of NGS, it is paramount to ensure standardisation of 'wet' laboratory and bioinformatic protocols and promote comparability of methods employed by different laboratories...... and their outputs. Therefore, one of the ambitious goals of the Global Microbial Identifier (GMI) initiative (http://www.globalmicrobialidentifier.org/) has been to establish a mechanism for inter-laboratory NGS proficiency testing (PT). This report presents findings from the survey recently conducted...

  9. Proficiency testing for bacterial whole genome sequencing: an end-user survey of current capabilities, requirements and priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    The advent of next-generation sequencing (NGS) has revolutionised public health microbiology. Given the potential impact of NGS, it is paramount to ensure standardisation of 'wet' laboratory and bioinformatic protocols and promote comparability of methods employed by different laboratories...... and their outputs. Therefore, one of the ambitious goals of the Global Microbial Identifier (GMI) initiative (http://www.globalmicrobialidentifier.org/) has been to establish a mechanism for inter-laboratory NGS proficiency testing (PT). This report presents findings from the survey recently conducted by Working...

  10. Challenges to genome sequence dissection in sweetpotato

    Science.gov (United States)

    Isobe, Sachiko; Shirasawa, Kenta; Hirakawa, Hideki

    2017-01-01

    The development of next generation sequencing (NGS) technologies has enabled the determination of whole genome sequences in many non-model plant species. However, genome sequencing in sweetpotato (Ipomoea batatas (L.) Lam) is still difficult because of the hexaploid genome structure. Previous studies suggested that a diploid wild relative, I. trifida (H.B.K.) Don., is the most probable ancestor of sweetpotato. Therefore, the genetic and genomic features of I. trifida have been studied as a potential reference for sweetpotato. Meanwhile, several research groups have begun the challenging task of directly sequencing the sweetpotato genome. In this manuscript, we review the recent results and activities of large-scale genome and transcriptome analysis related to genome sequence dissection in sweetpotato under the following sections: I. trifida genome and transcript sequencing, genome sequences of I. nil (Japanese morning glory), transcript sequences in sweetpotato, chloroplast sequences, transposable elements and transfer DNA. The recent international activities of de novo whole genome sequencing in sweetpotato are also described. The large-scale publicly available genome and transcript sequence resources and the international genome sequencing streams are expected to promote the genome sequence dissection in sweetpotato. PMID:28465666

  11. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies...... of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  12. Elucidating population histories using genomic DNA sequences.

    Science.gov (United States)

    Vigilant, Linda

    2009-04-01

    In 1993, Cliff Jolly suggested that rather than debating species definitions and classifications, energy would be better spent investigating multidimensional patterns of variation and gene flow among populations. Until now, however, genetic studies of wild primate populations have been limited to very small portions of the genome. Access to complete genome sequences of humans, chimpanzees, macaques, and other primates makes it possible to design studies surveying substantial amounts of DNA sequence variation at multiple genetic loci in representatives of closely related but distinct wild primate populations. Such data can be analyzed with new approaches that estimate not only when populations diverged but also the relative amounts and directions of subsequent gene flow. These analyses will reemphasize the difficulty of achieving consistent species and subspecies definitions by revealing the extent of variation in the amount and duration of gene flow accompanying population divergences.

  13. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  14. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Full Text Available Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  15. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  16. Complete sequence of the mitochondrial genome of ...

    Indian Academy of Sciences (India)

    Supplementary data: Complete sequence of the mitochondrial genome of Odontamblyopus rubicundus (Perciformes: Gobiidae): genome characterization and phylogenetic analysis. Tianxing Liu, Xiaoxiao Jin, Rixin Wang and Tianjun Xu. J. Genet. 92, 423–432. Figure 1. Gene map of O. rubicundus mitochondrial genome.

  17. Towards a reference pecan genome sequence

    Science.gov (United States)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  18. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...... the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information....

  19. A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelay proteins of organisms with fully sequenced genomes

    Directory of Open Access Journals (Sweden)

    Baldiri Salvado

    2015-08-01

    Full Text Available Two Component Systems and Phosphorelays (TCS/PR) are environmental signal transduction cascades in prokaryotes and, less frequently, in eukaryotes. The internal domain organization of proteins and the topology of TCS/PR cascades play an important role in shaping the responses of the circuits. It is thus important to maintain updated censuses of TCS/PR proteins in order to identify the various topologies used by nature and enable a systematic study of the dynamics associated with those topologies. To create such a census, we analyzed the proteomes of 7,609 organisms from all domains of life with fully sequenced and annotated genomes. To begin, we survey each proteome searching for proteins containing domains that are associated with internal signal transmission within TCS/PR: Histidine Kinase (HK), Response Regulator (RR) and Histidine Phosphotransfer (HPt) domains, and analyze how these domains are arranged in the individual proteins. Then, we find all types of operon organization and calculate how much more likely proteins that contain TCS/PR domains are to be coded by neighboring genes than one would expect from the genome background of each organism. Finally, we analyze whether the fusion of domains into single TCS/PR proteins is more frequently observed than one might expect from the background of each proteome. We find 50 alternative ways in which the HK, HPt, and RR domains are observed to organize into single proteins. In prokaryotes, TCS/PR coding genes tend to be clustered in operons. 90% of all proteins identified in this study contain just one of the three domains, while 8% of the remaining proteins combine one copy of an HK, a RR, and/or an HPt domain. In eukaryotes, 25% of all TCS/PR proteins have more than one domain. These results might have implications for how signals are internally transmitted within TCS/PR cascades. These implications could explain the selection of the various designs in alternative circumstances.

  20. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  1. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  2. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  3. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics......% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group...

  4. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  5. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  6. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  7. Mycobacterium tuberculosis H37Ra genome sequencing

    Indian Academy of Sciences (India)

    2007-02-09

    Journal of Biosciences. Commentary: The value of comparative genomics in understanding mycobacterial virulence: Mycobacterium tuberculosis H37Ra genome sequencing – a worthwhile endeavour. Deepak Sharma, Jaya Sivaswami Tyagi. Volume 32 Issue 2 March ...

  8. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Prakash

    J. Biosci. 32(1), January 2007. [Table residue: list of microsatellite-rich and microsatellite-poor regions in the five mycobacterial genomes; columns: local GC%, repeat rich (+)/repeat poor (-), total ORFs, number of ...] ... heat shock protein (grpE) (15839737), heat shock protein (dnaJ) ...

  9. Refined Pichia pastoris reference genome sequence

    Science.gov (United States)

    Sturmberger, Lukas; Chappell, Thomas; Geier, Martina; Krainer, Florian; Day, Kasey J.; Vide, Ursa; Trstenjak, Sara; Schiefer, Anja; Richardson, Toby; Soriaga, Leah; Darnhofer, Barbara; Birner-Gruenberger, Ruth; Glick, Benjamin S.; Tolstorukov, Ilya; Cregg, James; Madden, Knut; Glieder, Anton

    2016-01-01

    Strains of the species Komagataella phaffii are the most frequently used “Pichia pastoris” strains employed for recombinant protein production as well as studies on peroxisome biogenesis, autophagy and secretory pathway analyses. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps from 20 to 6000 base pairs were closed and 5111 out of 5256 putative open reading frames were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new open reading frames (ORFs) and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76 kbp between two previous gaps on chromosome 1 and another 134 kbp fragment at the end of chromosome 4, as well as several shorter fragments needed re-orientation. In total more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5 kbp found in some strains of P. pastoris. PMID:27084056

  10. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola Cremonesi

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential, including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles, including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  11. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying it by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  12. Whole-genome sequencing of veterinary pathogens

    DEFF Research Database (Denmark)

    Ronco, Troels

    using whole-genome sequencing. The results showed that NELoc-1 and -3 and the two virulence genes netB and cnaA were significantly more associated with NE isolates from chickens compared to NE isolates from turkeys. Only NELoc-2 was associated with NE isolates from both turkeys and chickens. A putative......-electrophoresis and single-locus sequencing has been widely used to characterize such types of veterinary pathogens. However, DNA sequencing techniques have become fast and cost effective in recent years and whole-genome sequencing data provide a much higher discriminative power and reproducibility than any...... of the traditional molecular techniques. In this PhD project three important veterinary pathogens (Clostridium perfringens, Escherichia coli and Staphylococcus aureus) were investigated using whole-genome sequencing. This was done in five different scientific papers which all have been published. Paper I and II...

  13. Acetone utilization by sulfate-reducing bacteria: draft genome sequence of Desulfococcus biacutus and a proteomic survey of acetone-inducible proteins

    OpenAIRE

    Gutiérrez Acosta, Olga B; Schleheck, David; Schink, Bernhard

    2014-01-01

    Background The sulfate-reducing bacterium Desulfococcus biacutus is able to utilize acetone for growth by an inducible degradation pathway that involves a novel activation reaction for acetone with CO as a co-substrate. The mechanism, enzyme(s) and gene(s) involved in this acetone activation reaction are of great interest because they represent a novel and yet undefined type of activation reaction under strictly anoxic conditions. Results In this study, a draft genome sequence of D. biacutus ...

  14. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  15. Bioinformatic analysis of whole genome sequencing data

    OpenAIRE

    Maqbool, Khurram

    2014-01-01

    Evolution has shaped life forms for billions of years. Domestication is an accelerated process that can be used as a model for evolutionary changes. The aim of this thesis project has been to carry out extensive bioinformatic analyses of whole genome sequencing data to reveal SNPs, InDels and selective sweeps in the chicken, pig and dog genomes. Pig genome sequencing revealed loci under selection for elongation of back and increased number of vertebrae, associated with the NR6A1, PLAG1,...

  16. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  17. 10KP: A phylodiverse genome sequencing plan

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun

    2018-01-01

    Abstract Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here. PMID:29618049

  18. 10KP: A phylodiverse genome sequencing plan.

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu

    2018-03-01

    Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.

  19. Acetone utilization by sulfate-reducing bacteria: draft genome sequence of Desulfococcus biacutus and a proteomic survey of acetone-inducible proteins.

    Science.gov (United States)

    Gutiérrez Acosta, Olga B; Schleheck, David; Schink, Bernhard

    2014-07-11

    The sulfate-reducing bacterium Desulfococcus biacutus is able to utilize acetone for growth by an inducible degradation pathway that involves a novel activation reaction for acetone with CO as a co-substrate. The mechanism, enzyme(s) and gene(s) involved in this acetone activation reaction are of great interest because they represent a novel and yet undefined type of activation reaction under strictly anoxic conditions. In this study, a draft genome sequence of D. biacutus was established. Sequencing, assembly and annotation resulted in 159 contigs with 5,242,029 base pairs and 4773 predicted genes; 4708 were predicted protein-encoding genes, and 3520 of these had a functional prediction. Proteins and genes were identified that are specifically induced during growth with acetone. A thiamine diphosphate-requiring enzyme appeared to be highly induced during growth with acetone and is probably involved in the activation reaction. Moreover, a coenzyme B12- dependent enzyme and proteins that are involved in redox reactions were also induced during growth with acetone. We present for the first time the genome of a sulfate reducer that is able to grow with acetone. The genome information of this organism represents an important tool for the elucidation of a novel reaction mechanism that is employed by a sulfate reducer in acetone activation.

  20. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  1. Transforming clinical microbiology with bacterial genome sequencing

    OpenAIRE

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J.; Peto, Tim E. A.; Crook, Derrick W.

    2012-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pa...

  2. Escherichia Coli: From Genome Sequences to Consequence

    Directory of Open Access Journals (Sweden)

    Mark Pallen

    2006-01-01

    The present article summarizes a presentation given by Professor Mark Pallen of the School of Medicine at the University of Birmingham (Birmingham, United Kingdom) for the Fourth Stanier Lecture held in Regina, Saskatchewan, on November 9, 2004. Professor Pallen's lecture, entitled 'Escherichia coli: From genome sequences to consequences', provides a summary of the important discoveries of his team of research scientists in the area of genetic sequencing and variations in phenotypic expression.

  3. Genome sequencing and population genomics in non-model organisms.

    Science.gov (United States)

    Ellegren, Hans

    2014-01-01

    High-throughput sequencing technologies are revolutionizing the life sciences. The past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research. This has bearing on several aspects of evolutionary biology, and we are now beginning to see patterns emerging from these studies. These include significant heterogeneity in the rate of recombination that affects adaptive evolution and base composition, the role of population size in adaptive evolution, and the importance of expansion of gene families in lineage-specific adaptation. Moreover, resequencing of population samples (population genomics) has enabled the identification of the genetic basis of critical phenotypes and cast light on the landscape of genomic divergence during speciation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  4. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...
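The motif tally described in this record (how often repeats such as (AT), (AG) or (AAG) occur) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `find_ssrs` and `canonical_motif`, the thresholds of at least 6 dinucleotide and 5 trinucleotide repeats, and the toy sequence are all assumptions chosen for illustration. Motifs are grouped by their smallest rotation across both strands, so that (GA), (TC) and (CT) are all counted as (AG).

```python
import re
from collections import Counter

# Hypothetical thresholds: minimum number of repeat units per motif length.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return the alphabetically smallest rotation of the motif or of its
    reverse complement, so equivalent repeat classes share one label."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = []
    for m in (motif, revcomp):
        rotations.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(rotations)


def find_ssrs(seq, min_repeats=None):
    """Find perfect di- and trinucleotide repeats.

    Returns a sorted list of (start, motif, repeat_count) tuples.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (invented): one (GA)7 repeat and one (CTT)5 repeat.
toy = "CCG" + "GA" * 7 + "ACAC" + "CTT" * 5 + "GC"
ssrs = find_ssrs(toy)
class_counts = Counter(canonical_motif(motif) for _, motif, _ in ssrs)
```

Applied to a real library, `class_counts` would give the per-class frequencies that a statement such as "(AT) and (AG) appeared most often" summarizes; imperfect or compound repeats are not handled by this sketch.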

  5. Genome sequencing for obstetricians & gynaecologists | Kent ...

    African Journals Online (AJOL)

    The medical profession has been waiting for a decade to be invigorated by the sequencing of the human genome, arguably the greatest scientific project ever. The technology has been spectacular but the results of the project have yielded more unexpected results than definitive answers – many about the very nature of our ...

  6. Gene Fusion: A Genome Wide Survey

    Science.gov (United States)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger and more complex multimodular (composite or chimeric), mostly multi-functional proteins through fusion of two or more individual genes that have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and cellular functions, and thus may give the host organism advantages in adapting to its environment. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace the evolutionary processes from the ancient smaller, uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis exist due to the complexity of this biological and evolutionary process. In this study we report a new strategy for computationally identifying multimodular proteins from completed genome sequences, together with results from a survey of 22 organisms; data from over 40 organisms will be presented during the meeting. Additional information is contained in the original extended abstract.

  7. Automated genome sequence analysis and annotation.

    Science.gov (United States)

    Andrade, M A; Brown, N P; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C

    1999-05-01

    Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
The GeneQuiz system

  8. The first genome sequences of human bocaviruses from Vietnam.

    Science.gov (United States)

    Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier; Tan, Le Van

    2016-01-01

    As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the pathogen.

  9. Templated sequence insertion polymorphisms in the human genome

    Science.gov (United States)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  10. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise......, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA...... sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  11. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  12. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  13. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  14. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  15. Draft Genome Sequence of Mycobacterium chimaera Type ...

    Science.gov (United States)

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  16. Genomic signal processing for DNA sequence clustering

    Directory of Open Access Journals (Sweden)

    Gerardo Mendizabal-Ruiz

    2018-01-01

    Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed and offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

  17. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
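The general idea in this record, mapping DNA to numbers and then clustering with K-means, can be sketched as follows. The abstract does not say which numerical mapping or features the authors use, so everything specific here is an assumption: the sketch swaps in binary indicator sequences for each base (often called the Voss representation), a power spectrum sampled at a few normalized frequencies, and a plain K-means with deterministic farthest-first initialization. The names `spectrum_features` and `kmeans` are hypothetical.

```python
import cmath
import math

BASES = "ACGT"


def spectrum_features(seq, n_bins=6):
    """Map a DNA string to a short power-spectrum feature vector.

    Each base gets a 0/1 indicator signal; the summed spectral power of the
    four signals is sampled at n_bins normalized frequencies in (0, 0.5] and
    divided by length squared so sequences of different length are comparable.
    """
    seq = seq.upper()
    n = len(seq)
    feats = []
    for j in range(1, n_bins + 1):
        freq = j / (2 * n_bins)
        power = 0.0
        for base in BASES:
            total = sum(cmath.exp(-2j * math.pi * freq * pos)
                        for pos, ch in enumerate(seq) if ch == base)
            power += abs(total) ** 2
        feats.append(power / (n * n))
    return feats


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Deterministic farthest-first initialization (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (invented): period-3 and period-2 sequences of varying length.
seqs = ["ACG" * 20, "AT" * 30, "CAG" * 20, "GC" * 30, "TGA" * 16]
features = [spectrum_features(s) for s in seqs]
labels = kmeans(features, k=2)
```

With these toy inputs the period-3 sequences concentrate their power at frequency 1/3 and the period-2 sequences at 1/2, so the two groups separate cleanly. Real data would likely need a richer mapping and a tested K-means implementation from an established library.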

  18. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...... contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop....

  19. PennCNV in whole-genome sequencing data

    OpenAIRE

    de Araújo Lima, Leandro; Wang, Kai

    2017-01-01

    Background The use of high-throughput sequencing data has improved the results of genomic analysis due to the resolution of mapping algorithms. Although several tools for copy-number variation calling in whole genome sequencing have been published, the noisy nature of sequencing data is still a limitation for accuracy and concordance among such tools. To assess the performance of PennCNV original algorithm for array data in whole genome sequencing data, we processed mapping (BAM) files to ext...

  20. Building the sequence map of the human pan-genome

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng

    2010-01-01

    Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified approximately 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel...... analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...... to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly....

  1. Initial sequencing and comparative analysis of the mouse genome.

    Science.gov (United States)

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, 
Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  2. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla)

    DEFF Research Database (Denmark)

    Gallus, S; Kumar, V; Bertelsen, Mads Frost

    2015-01-01

    it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse...

  3. Reassociation kinetics-based approach for partial genome sequencing of the cattle tick, Rhipicephalus (Boophilus) microplus

    Directory of Open Access Journals (Sweden)

    Bellgard Matthew

    2010-06-01

    Background: The size and repetitive nature of the Rhipicephalus microplus genome make obtaining a full genome sequence fiscally and technically problematic. To selectively obtain gene-enriched regions of this tick's genome, Cot filtration was performed, and Cot-filtered DNA was sequenced via 454 FLX pyrosequencing. Results: The sequenced Cot-filtered genomic DNA was assembled with an EST-based gene index of 14,586 unique entries where each EST served as a potential "seed" for scaffold formation. The new sequence assembly extended the lengths of 3,913 of the 14,586 gene index entries. Over half of the extensions corresponded to extensions of over 30 amino acids. To survey the repetitive elements in the tick genome, the complete sequences of five BAC clones were determined. Both Class I and II transposable elements were found. Comparison of the BAC and Cot filtration data indicates that Cot filtration was highly successful in filtering repetitive DNA out of the genomic DNA used in 454 sequencing. Conclusion: Cot filtration is a very useful strategy to incorporate into genome sequencing projects on organisms with large genomes that contain high percentages of repetitive, difficult-to-assemble genomic DNA. Combining the Cot selection approach with 454 sequencing and assembly with a pre-existing EST database as seeds resulted in extensions of 27% of the members of the EST database.
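    The 27% figure in the conclusion follows from the two counts reported in the results. A quick check, using only those numbers (the variable names are illustrative):

```python
extended_entries = 3_913   # gene index entries whose lengths were extended
total_entries = 14_586     # unique entries in the EST-based gene index

fraction_extended = extended_entries / total_entries
percent_extended = round(fraction_extended * 100)  # whole-number percentage
```

    The ratio is a little under 0.27, which rounds to the 27% quoted in the abstract.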

  4. Draft Genome Sequence of the Fungus Trametes hirsuta 072.

    Science.gov (United States)

    Pavlov, Andrey R; Tyazhelova, Tatiana V; Moiseenko, Konstantin V; Vasina, Daria V; Mosunova, Olga V; Fedorova, Tatiana V; Maloshenok, Lilya G; Landesman, Elena O; Bruskin, Sergei A; Psurtseva, Nadezhda V; Slesarev, Alexei I; Kozyavkin, Sergei A; Koroleva, Olga V

    2015-11-19

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). Copyright © 2015 Pavlov et al.

  5. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small-insert genomic DNA library was constructed (by digesting olive genomic DNA with SmaI and cloning the digestion products into the pUC19 vector), and 83 randomly picked colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  6. Genome sequence and genetic diversity of European ash trees.

    Science.gov (United States)

    Sollars, Elizabeth S A; Harper, Andrea L; Kelly, Laura J; Sambles, Christine M; Ramirez-Gonzalez, Ricardo H; Swarbreck, David; Kaithakottil, Gemy; Cooper, Endymion D; Uauy, Cristobal; Havlickova, Lenka; Worswick, Gemma; Studholme, David J; Zohren, Jasmin; Salmon, Deborah L; Clavijo, Bernardo J; Li, Yi; He, Zhesi; Fellgett, Alison; McKinney, Lea Vig; Nielsen, Lene Rostgaard; Douglas, Gerry C; Kjær, Erik Dahl; Downie, J Allan; Boshier, David; Lee, Steve; Clark, Jo; Grant, Murray; Bancroft, Ian; Caccamo, Mario; Buggs, Richard J A

    2017-01-12

    Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low-heterozygosity Fraxinus excelsior tree from Gloucestershire, UK, annotating 38,852 protein-coding genes of which 25% appear ash specific when compared with the genomes of ten other plant species. Analyses of paralogous genes suggest a whole-genome duplication shared with olive (Olea europaea, Oleaceae). We also re-sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British populations suggest that reduced susceptibility to ash dieback may be more widespread in Great Britain than in Denmark. We also present evidence that susceptibility of trees to H. fraxineus is associated with their iridoid glycoside levels. This rapid, integrated, multidisciplinary research response to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic.

  7. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Next Generation Sequencing at the University of Chicago Genomics Core

    Energy Technology Data Exchange (ETDEWEB)

    Faber, Pieter [University of Chicago

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to state-of-the-art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  9. Draft Genome Sequence of Bacillus amyloliquefaciens B-1895

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Chistyakov, Vladimir A.

    2014-01-01

    In this report, we present a draft genome sequence of Bacillus amyloliquefaciens strain B-1895. Comparison with the genome of a reference strain demonstrated similar overall organization, as well as differences involving large gene clusters.

  10. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...... used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP...... identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J...

  11. Whole Genome Sequencing: Innovation Dream or Privacy Nightmare?

    OpenAIRE

    De Cristofaro, Emiliano

    2012-01-01

    Over the past several years, DNA sequencing has emerged as one of the driving forces in life-sciences, paving the way for affordable and accurate whole genome sequencing. As genomes represent the entirety of an organism's hereditary information, the availability of complete human genomes prompts a wide range of revolutionary applications. The hope for improving modern healthcare and better understanding the human genome propels many interesting and challenging research frontiers. Unfortunatel...

  12. Reconstructing cancer genomes from paired-end sequencing data

    Directory of Open Access Journals (Sweden)

    Oesper Layla

    2012-04-01

    Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data.
An implementation of the PREGO algorithm is

  13. Genome sequencing and annotation of Stenotrophomonas sp. SAM8

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    We report the draft genome sequence of Stenotrophomonas sp. strain SAM8, isolated from environmental water. The draft genome size is 3,665,538 bp with a G + C content of 67.2% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDAV00000000.
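    G + C content, as quoted in this and similar genome announcements, is the percentage of bases in the assembly that are guanine or cytosine. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative and are not taken from the SAM8 assembly, and ignoring ambiguous bases such as N is one possible convention rather than a stated one.

```python
def gc_content(seq):
    """Return the percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total

toy = "GCGCGATTCCGGNN"  # made-up sequence; the two N bases are ignored
toy_gc = gc_content(toy)
```

    For a real assembly the same function would be applied to the concatenated contigs read from a FASTA file.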

  14. Evolutionary insights from suffix array-based genome sequence ...

    Indian Academy of Sciences (India)

    2007-08-06

    Aug 6, 2007 ... Keywords. Biological language modelling toolkit (BLMT); genome sequence analysis; n-grams; pattern matching; suffix arrays; suffix trees; short peptide sequences genetic code bias ...
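    The keywords above mention suffix arrays and n-gram pattern matching. As a generic illustration (not the BLMT implementation, whose interface is not described here), the sketch below builds a suffix array by sorting suffix start positions and uses binary search to find every occurrence of an n-gram. The function names are hypothetical, and the simple construction is only suitable for short strings.

```python
def build_suffix_array(text):
    """Return suffix start positions, sorted by the suffix each one begins."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern, via binary search on sa."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

genome = "ATGATGCATG"  # toy sequence
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ATG")
```

    Because all suffixes sharing a prefix sit next to each other in the sorted array, each query costs two binary searches rather than a scan of the whole text.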

  15. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037

    OpenAIRE

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

    2013-01-01

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

  16. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  17. A plant pathology perspective of fungal genome sequencing.

    Science.gov (United States)

    Aylward, Janneke; Steenkamp, Emma T; Dreyer, Léanne L; Roets, Francois; Wingfield, Brenda D; Wingfield, Michael J

    2017-06-01

    The majority of plant pathogens are fungi and many of these adversely affect food security. This mini-review aims to provide an analysis of the plant pathogenic fungi for which genome sequences are publicly available, to assess their general genome characteristics, and to consider how genomics has impacted plant pathology. A list of sequenced fungal species was assembled, the taxonomy of all species verified, and the potential reason for sequencing each of the species considered. The genomes of 1090 fungal species are currently (October 2016) in the public domain and this number is rapidly rising. Pathogenic species comprised the largest category (35.5 %) and, amongst these, plant pathogens are predominant. Of the 191 plant pathogenic fungal species with available genomes, 61.3 % cause diseases on food crops, more than half of which are staple crops. The genomes of plant pathogens are slightly larger than those of other fungal species sequenced to date and they contain fewer coding sequences in relation to their genome size. Both of these factors can be attributed to the expansion of repeat elements. Sequenced genomes of plant pathogens provide blueprints from which potential virulence factors were identified and from which genes associated with different pathogenic strategies could be predicted. Genome sequences have also made it possible to evaluate adaptability of pathogen genomes and genomic regions that experience selection pressures. Some genomic patterns, however, remain poorly understood and plant pathogen genomes alone are not sufficient to unravel complex pathogen-host interactions. Genomes, therefore, cannot replace experimental studies that can be complex and tedious. Ultimately, the most promising application lies in using fungal plant pathogen genomics to inform disease management and risk assessment strategies. This will ultimately minimize the risks of future disease outbreaks and assist in preparation for emerging pathogen outbreaks.

  18. The genomic scrapheap challenge; extracting relevant data from unmapped whole genome sequencing reads, including strain specific genomic segments, in rats

    NARCIS (Netherlands)

    Van Der Weide, Robin H.; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences;

  19. Microbial genome sequencing using optical mapping and Illumina sequencing

    Science.gov (United States)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  20. An overview of wheat genome sequencing and its implications for ...

    Indian Academy of Sciences (India)

    Taken together, the availability of high-quality reference genome and transcriptome data would invariably expedite the crop improvement programmes. Using the existing genetic and genomic resources and tools, wheat researchers shall be able to integrate and apply the genome sequence information to explore the ...

  1. Why size really matters when sequencing plant genomes

    Czech Academy of Sciences Publication Activity Database

    Kelly, L.J.; Leitch, A.R.; Fay, M. F.; Renny-Byfield, S.; Pellicer, J.; Macas, Jiří; Leitch, I.J.

    2012-01-01

    Vol. 5, No. 4 (2012), pp. 415-425. ISSN 1755-0874. Institutional research plan: CEZ:AV0Z50510513. Institutional support: RVO:60077344. Keywords: C-value; genome assembly; genome size evolution; genome sequencing. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 0.924, year: 2012

  2. Genome sequencing and annotation of Cellulomonas sp. HZM

    OpenAIRE

    Chua, Patric; Har, Zi Mei; Austin, Christopher M.; Yule, Catherine M.; Dykes, Gary A.; Lee, Sui Mae

    2015-01-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA).

  3. Genome sequencing and annotation of Cellulomonas sp. HZM.

    Science.gov (United States)

    Chua, Patric; Har, Zi Mei; Austin, Christopher M; Yule, Catherine M; Dykes, Gary A; Lee, Sui Mae

    2015-09-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA).

  4. Genome sequencing and annotation of Cellulomonas sp. HZM

    Directory of Open Access Journals (Sweden)

    Patric Chua

    2015-09-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA).

  5. Draft Genome Sequence of NDM-1-Producing Leclercia adecarboxylata

    OpenAIRE

    Hoyos-Mallecot, Yannick; Rojo-Martín, María Dolores; Bonnin, Rémy A.; Creton, Elodie; Navarro Marí, Jose María; Naas, Thierry

    2017-01-01

    Here, we provide the first draft genome sequence of NDM-1-producing Leclercia adecarboxylata, a human-opportunistic pathogen. The draft genome sequence consists of a total length of 5.13 Mbp, with an average G+C content of 55.2%.

  6. A computational genomics pipeline for prokaryotic sequencing projects.

    Science.gov (United States)

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  7. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...
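    Searching for the most frequent short words in a genome, as this abstract describes for 9-10mers, comes down to counting every length-k substring. The sketch below shows that counting step on a made-up sequence; the function name, the motif, and the spacers are illustrative assumptions, and the authors' actual search procedure is not described in this excerpt.

```python
from collections import Counter

def most_frequent_kmers(seq, k, top=1):
    """Count every overlapping k-mer in seq and return the `top` most common."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(top)

# Toy sequence: a made-up 10-mer repeated three times between short spacers.
motif = "ACGTTGCAAC"
toy = "TT" + motif + "GG" + motif + "CA" + motif
top_kmer = most_frequent_kmers(toy, k=10)
```

    A fuller analysis would typically also count reverse-complement occurrences and compare the observed counts with those expected from base composition.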

  8. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    DEFF Research Database (Denmark)

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183(T) (= DSM 16839(T) = NCIMB 14029(T)), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host...

  9. On the current status of Phakopsora pachyrhizi genome sequencing.

    Science.gov (United States)

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the current status of Phakopsora pachyrhizi (Asian soybean rust) genome sequencing.

  10. Investigation of genome sequences within the family Pasteurellaceae

    DEFF Research Database (Denmark)

    Angen, Øystein; Ussery, David

    shows that the genomic diversity within species of Pasteurellaceae is a bit lower than the level which has been found for E. coli and B. cereus. This might reflect that most members of Pasteurellaceae have relatively small genomes and that the species of the family often are adapted to host specific......Introduction The bacterial genome sequences are now available for an increasing number of strains within the family Pasteurellaceae. At present, 24 Pasteurellaceae genomes are publicly available through internet databases, and another 40 genomes are being sequenced. This investigation will describe...... the core genome for both the family Pasteurellaceae and for the species Haemophilus influenzae. Methods Twenty genome sequences from the following species were included: Haemophilus influenzae (11 strains), Haemophilus ducreyi (1 strain), Histophilus somni (2 strains), Haemophilus parasuis (1 strain...

  11. Genome Sequence of Human Rhinovirus A22, Strain Lancaster/2015.

    Science.gov (United States)

    Atkinson, Kate V; Bishop, Lisa A; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R; Hegarty, Matthew J; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M; Pickup, Roger W; Wilkinson, Mark; Gatherer, Derek

    2017-03-23

    The genome of human rhinovirus A22 (HRV-A22) was assembled by deep sequencing RNA samples from nasopharyngeal swabs. The assembled genome is 8.7% divergent from the HRV-A22 reference strain over its full length, and it is only the second full-length genome sequence for HRV-A22. The new strain is designated strain HRV-A22/Lancaster/2015. Copyright © 2017 Atkinson et al.

  12. On the current status of Phakopsora pachyrhizi genome sequencing

    OpenAIRE

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the curr...

  13. Online Resources Genome survey on invasive veined rapa whelk ...

    Indian Academy of Sciences (India)

    Hao Song

    read alignment. Bioinformatics 25:1966-1967. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, and Kristiansen K. 2010. De novo assembly of human ... Zhou W, Hu Y, Sui Z, Fu F, Wang J, Chang L, Guo W, and Li B. 2013. Genome survey sequencing and genetic background characterization of ...

  14. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  15. Computational Profiling of Microbial Genomes using Short Sequences

    Science.gov (United States)

    Doering, Dale; Tsukuda, Toyoko

    2001-03-01

    The genomes of a number of microbial species have now been completely sequenced. We have developed a program for the statistical analysis of the appearance frequency and location of short DNA segments within an entire microbial genome. Using this program, the genomes of Methanococcus jannaschii (1.66 Mbase; 68% ... radiodurans (3.28 Mbase; 66% ... and compared to a randomly generated genomic pattern. The random sequence shows the expected statistical frequency distribution about the average that equals the genome size divided by the total number of N size short segments (4^N). In contrast, the microbial genomes are radically skewed with a large number of segments that rarely occur and a few that are highly represented in the genome. The specific distribution profile of the segments is strongly dependent on the overall bias in the organism. The biased appearance frequency allows us to develop a genome signature of each microbial species.
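
    The frequency analysis described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names (segment_counts, expected_count, absent_segments, random_genome) are hypothetical, segments are counted with overlap, and the input is assumed to contain only the letters A, C, G and T.

```python
import random
from collections import Counter


def segment_counts(genome: str, n: int) -> Counter:
    """Count every overlapping segment of length n in the sequence."""
    return Counter(genome[i:i + n] for i in range(len(genome) - n + 1))


def expected_count(genome_length: int, n: int) -> float:
    """Average count per segment as stated in the abstract: genome size / 4^N."""
    return genome_length / 4 ** n


def absent_segments(counts: Counter, n: int) -> int:
    """Number of the 4^N possible segments that never occur.

    Assumes the counted sequence used only the letters A, C, G, T.
    """
    return 4 ** n - len(counts)


def random_genome(length: int, seed: int = 0) -> str:
    """Uniform random sequence used as the unbiased reference pattern."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    n = 4
    reference = random_genome(100_000)
    counts = segment_counts(reference, n)
    # For a uniform random sequence the counts cluster around the average;
    # a real genome would show a much wider spread.
    print("expected per segment:", expected_count(len(reference), n))
    print("least common:", min(counts.values()))
    print("most common:", max(counts.values()))
    print("segments never seen:", absent_segments(counts, n))
```

    Running the same counts on a real genome sequence in place of the random reference would expose the skew the abstract describes: many segments far below the average and a few far above it.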

  16. Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Science.gov (United States)

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

  17. Real-time, portable genome sequencing for Ebola surveillance.

    Science.gov (United States)

    Quick, Joshua; Loman, Nicholas J; Duraffour, Sophie; Simpson, Jared T; Severi, Ettore; Cowley, Lauren; Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan Hj; Becker-Ziaja, Beate; Boettcher, Jan-Peter; Cabeza-Cabrerizo, Mar; Camino-Sanchez, Alvaro; Carter, Lisa L; Doerrbecker, Juiliane; Enkirch, Theresa; Dorival, Isabel Graciela García; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigail; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallash, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Yasmin; Sachse, Andrea; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Trina, Racine; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N'Faly; Williams, Cecelia V; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Franck; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, Jamie; Rachwal, Philip; Turner, Daniel; Pollakis, Georgios; Hiscox, Julian A; Matthews, David A; O'Shea, Matthew K; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Woelfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A; Koivogui, Lamine; Diallo, Boubacar; Keita, Sakoba; Rambaut, Andrew; Formenty, Pierre; Gunther, Stephan; Carroll, Miles W

    2016-02-11

    The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10(-3) and 1.42 × 10(-3) mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
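
    The step from a per-site substitution rate to "16-27 mutations in each genome" can be checked with a short calculation. This is a sketch under one assumption that the abstract does not state: an EBOV genome length of 18,959 nt (roughly 19 kb). The constant and function names are hypothetical, and the result is read as mutations per genome per year.

```python
# Assumed genome length in nucleotides; the abstract gives only the rates.
GENOME_LENGTH_NT = 18_959

# Substitution-rate bounds quoted in the abstract (mutations per site per year).
RATE_LOW = 0.87e-3
RATE_HIGH = 1.42e-3


def mutations_per_genome_per_year(rate_per_site: float,
                                  genome_length: int = GENOME_LENGTH_NT) -> float:
    """Scale a per-site yearly rate to a whole genome."""
    return rate_per_site * genome_length


if __name__ == "__main__":
    low = mutations_per_genome_per_year(RATE_LOW)
    high = mutations_per_genome_per_year(RATE_HIGH)
    print(f"{low:.1f} to {high:.1f} mutations per genome per year")
```

    Rounded to whole mutations, the two bounds reproduce the 16-27 range given in the abstract.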

  18. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  19. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  20. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications; the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since longer reads can span repetitive regions, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  2. Genome Sequence of Australian Indigenous Wine Yeast Torulaspora delbrueckii COFT1 Using Nanopore Sequencing.

    Science.gov (United States)

    Tondini, Federico; Jiranek, Vladimir; Grbin, Paul R; Onetto, Cristobal A

    2018-04-26

    Here, we report the first sequenced genome of an indigenous Australian wine isolate of Torulaspora delbrueckii using the Oxford Nanopore MinION and Illumina HiSeq sequencing platforms. The genome size is 9.4 Mb and contains 4,831 genes. Copyright © 2018 Tondini et al.

  3. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  4. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    2006-12-18

    Although prokaryotic genomes derive some plasticity due to microsatellite mutations they have in-built mechanisms to arrest undue expansions of microsatellites and one such mechanism is constituted by post-replicative DNA repair enzymes MutL, MutH and MutS. The mycobacterial genomes lack these ...

  5. Genome-scale validation of deep-sequencing libraries.

    Directory of Open Access Journals (Sweden)

    Dominic Schmidt

    Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used to maximize the efficiency of library creation and quickly generate reliable preliminary data on a chromosomal scale in advance of deep sequencing. We also exploit this technique to compare enriched regions identified using microarrays with those identified by sequencing, demonstrating that they agree on a core set of clearly identified enriched regions, while characterizing the additional enriched regions identifiable using HTP sequencing.

  6. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  7. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    Science.gov (United States)

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304688

  8. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    Energy Technology Data Exchange (ETDEWEB)

    Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Foster, Brian [U.S. Department of Energy, Joint Genome Institute]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute]; Sun, Hui [U.S. Department of Energy, Joint Genome Institute]; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. The complete chloroplast genome sequence of Zanthoxylum piperitum.

    Science.gov (United States)

    Lee, Jonghoon; Lee, Hyeon Ju; Kim, Kyunghee; Lee, Sang-Choon; Sung, Sang Hyun; Yang, Tae-Jin

    2016-09-01

    The complete chloroplast genome sequence of Zanthoxylum piperitum, a plant species with useful aromatic oils in family Rutaceae, was generated in this study by de novo assembly with whole-genome sequence data. The chloroplast genome was 158 154 bp in length with a typical quadripartite structure containing a pair of inverted repeats of 27 644 bp, separated by large single copy and small single copy of 85 340 bp and 17 526 bp, respectively. The chloroplast genome harbored 112 genes consisting of 78 protein-coding genes 30 tRNA genes and 4 rRNA genes. Phylogenetic analysis of the complete chloroplast genome sequences with those of known relatives revealed that Z. piperitum is most closely related to the Citrus species.
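
    The figures in this abstract can be cross-checked against the quadripartite structure it describes: one large single copy (LSC) region, one small single copy (SSC) region, and two copies of the inverted repeat (IR). The sketch below uses only the numbers quoted in the abstract; the variable and function names are hypothetical.

```python
# Region lengths in bp, as quoted in the abstract.
LSC_BP = 85_340       # large single copy region
SSC_BP = 17_526       # small single copy region
IR_BP = 27_644        # one inverted repeat; the genome carries a pair
REPORTED_TOTAL_BP = 158_154

# Gene counts, as quoted in the abstract.
PROTEIN_CODING = 78
TRNA = 30
RRNA = 4
REPORTED_GENES = 112


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    total = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
    print("computed length:", total, "reported:", REPORTED_TOTAL_BP)
    print("computed genes:", PROTEIN_CODING + TRNA + RRNA, "reported:", REPORTED_GENES)
```

    Both sums agree with the reported totals, so the region lengths and gene counts in the abstract are internally consistent.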

  10. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Directory of Open Access Journals (Sweden)

    Martijn Staats

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more

  11. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  12. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    Science.gov (United States)

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus.

  13. Draft Genome Sequence of Type Strain Streptococcus gordonii ATCC 10558

    DEFF Research Database (Denmark)

    Rasmussen, Louise Hesselbjerg; Dargis, Rimtas; Christensen, Jens Jørgen Elmer

    2016-01-01

    Streptococcus gordonii ATCC 10558T was isolated from a patient with infective endocarditis in 1946 and announced as a type strain in 1989. Here, we report the 2,154,510-bp draft genome sequence of S. gordonii ATCC 10558T. This sequence will contribute to knowledge about the pathogenesis of infect...

  14. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    International Nuclear Information System (INIS)

    Li, Lijin; Goedegebuure, Peter; Mardis, Elaine R.; Ellis, Matthew J.C.; Zhang, Xiuli; Herndon, John M.; Fleming, Timothy P.; Carreno, Beatriz M.; Hansen, Ted H.; Gillanders, William E.

    2011-01-01

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines

  15. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Directory of Open Access Journals (Sweden)

    William E. Gillanders

    2011-11-01

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  16. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Energy Technology Data Exchange (ETDEWEB)

    Li, Lijin [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Goedegebuure, Peter [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Mardis, Elaine R. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); The Genome Institute at Washington University School of Medicine, St. Louis, MO 63108 (United States); Ellis, Matthew J.C. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Zhang, Xiuli; Herndon, John M. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Fleming, Timothy P. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Carreno, Beatriz M. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Hansen, Ted H. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 (United States); Gillanders, William E., E-mail: gillandersw@wudosis.wustl.edu [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States)]

    2011-11-25

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  17. Complete Genome Sequence of Mycobacterium phlei Type Strain RIVM601174

    KAUST Repository

    Abdallah, A. M.

    2012-05-24

    Mycobacterium phlei is a rapidly growing nontuberculous Mycobacterium species that is typically nonpathogenic, with few reported cases of human disease. Here we report the whole genome sequence of M. phlei type strain RIVM601174.

  18. Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project.

    Science.gov (United States)

    Kong, Nguyet; Davis, Matthew; Arabyan, Narine; Huang, Bihua C; Weis, Allison M; Chen, Poyin; Thao, Kao; Ng, Whitney; Chin, Ning; Foutouhi, Soraya; Foutouhi, Azarene; Kaufman, James; Xie, Yi; Storey, Dylan B; Weimer, Bart C

    2017-07-13

    Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus. Copyright © 2017 Kong et al.

  19. A genome-wide analysis of FRT-like sequences in the human genome.

    Science.gov (United States)

    Shultz, Jeffry L; Voziyanova, Eugenia; Konieczka, Jay H; Voziyanov, Yuri

    2011-03-23

    Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal bases pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

  20. A genome-wide analysis of FRT-like sequences in the human genome.

    Directory of Open Access Journals (Sweden)

    Jeffry L Shultz

    2011-03-01

    Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal bases pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

  1. Bos taurus strain:dairy beef (cattle): 1000 Bull Genomes Run 2, Bovine Whole Genome Sequence

    NARCIS (Netherlands)

    Bouwman, A.C.; Daetwyler, H.D.; Chamberlain, Amanda J.; Ponce, Carla Hurtado; Sargolzaei, Mehdi; Schenkel, Flavio S.; Sahana, Goutam; Govignon-Gion, Armelle; Boitard, Simon; Dolezal, Marlies; Pausch, Hubert; Brøndum, Rasmus F.; Bowman, Phil J.; Thomsen, Bo; Guldbrandtsen, Bernt; Lund, Mogens S.; Servin, Bertrand; Garrick, Dorian J.; Reecy, James M.; Vilkki, Johanna; Bagnato, Alessandro; Wang, Min; Hoff, Jesse L.; Schnabel, Robert D.; Taylor, Jeremy F.; Vinkhuyzen, Anna A.E.; Panitz, Frank; Bendixen, Christian; Holm, Lars-Erik; Gredler, Birgit; Hozé, Chris; Boussaha, Mekki; Sanchez, Marie Pierre; Rocha, Dominique; Capitan, Aurelien; Tribout, Thierry; Barbat, Anne; Croiseau, Pascal; Drögemüller, Cord; Jagannathan, Vidhya; Vander Jagt, Christy; Crowley, John J.; Bieber, Anna; Purfield, Deirdre C.; Berry, Donagh P.; Emmerling, Reiner; Götz, Kay Uwe; Frischknecht, Mirjam; Russ, Ingolf; Sölkner, Johann; Tassell, van Curtis P.; Fries, Ruedi; Stothard, Paul; Veerkamp, R.F.; Boichard, Didier; Goddard, Mike E.; Hayes, Ben J.

    2014-01-01

    Whole genome sequence data (BAM format) of 234 bovine individuals aligned to UMD3.1. The aim of the study was to identify genetic variants (SNPs and indels) for downstream analysis such as imputation, GWAS, and detection of lethal recessives. Additional sequences for later 1000 bull genomes runs can

  2. Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113

    Science.gov (United States)

    Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martín, Marta

    2012-01-01

    Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

  3. Genome Sequence of the Milbemycin-Producing Bacterium Streptomyces bingchenggensis

    OpenAIRE

    Wang, Xiang-Jing; Yan, Yi-Jun; Zhang, Bo; An, Jing; Wang, Ji-Jia; Tian, Jun; Jiang, Ling; Chen, Yi-Hua; Huang, Sheng-Xiong; Yin, Min; Zhang, Ji; Gao, Ai-Li; Liu, Chong-Xi; Zhu, Zhao-Xiang; Xiang, Wen-Sheng

    2010-01-01

    Streptomyces bingchenggensis is a soil-dwelling bacterium producing the commercially important anthelmintic macrolide milbemycins. Besides milbemycins, the insecticidal polyether antibiotic nanchangmycin and some other antibiotics have also been isolated from this strain. Here we report the complete genome sequence of S. bingchenggensis. The availability of the genome sequence of S. bingchenggensis should enable us to understand the biosynthesis of these structurally intricate antibiotics bet...

  4. Genome sequence of Pantoea agglomerans strain IG1.

    Science.gov (United States)

    Matsuzawa, Tomohiko; Mori, Kazuki; Kadowaki, Takeshi; Shimada, Misato; Tashiro, Kosuke; Kuhara, Satoru; Inagawa, Hiroyuki; Soma, Gen-ichiro; Takegawa, Kaoru

    2012-03-01

    Pantoea agglomerans is a gram-negative bacterium that grows symbiotically with various plants. Here we report the 4.8-Mb genome sequence of P. agglomerans strain IG1. The lipopolysaccharides derived from P. agglomerans IG1 have been shown to be effective in the prevention of various diseases, such as bacterial or viral infections and lifestyle-related diseases. This genome sequence represents a substantial step toward the elucidation of pathways for production of lipopolysaccharides.

  5. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  6. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  7. Draft genome sequence of Penicillium marneffei strain PM1.

    Science.gov (United States)

    Woo, Patrick C Y; Lau, Susanna K P; Liu, Bin; Cai, James J; Chong, Ken T K; Tse, Herman; Kao, Richard Y T; Chan, Che-Man; Chow, Wang-Ngai; Yuen, Kwok-Yung

    2011-12-01

    Penicillium marneffei is the most important thermal dimorphic, pathogenic fungus endemic in China and Southeast Asia and is particularly important in HIV-positive patients. We report the 28,887,485-bp draft genome sequence of P. marneffei, which contains its complete mitochondrial genome, sexual cycle genes, a high diversity of Mp1p homologues, and polyketide synthase genes.

  8. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  9. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  10. Genome sequences of Listeria monocytogenes strains with resistance to arsenic

    Science.gov (United States)

    Listeria monocytogenes frequently exhibits resistance to arsenic. We report here the draft genome sequences of eight genetically diverse arsenic-resistant L. monocytogenes strains from human listeriosis and food-associated environments. Availability of these genomes would help to elucidate the role ...

  11. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  12. Complete Genome Sequence of Pediococcus pentosaceus Strain SL4

    DEFF Research Database (Denmark)

    Dantoft, Shruti Harnal; Bielak, Eliza Maria; Seo, Jae-Gu

    2013-01-01

    Pediococcus pentosaceus SL4 was isolated from a Korean fermented vegetable product, kimchi. We report here the whole-genome sequence (WGS) of P. pentosaceus SL4. The genome consists of a 1.79-Mb circular chromosome (G+C content of 37.3%) and seven distinct plasmids ranging in size from 4 kb to 50...

  13. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae)

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    The plant chloroplast (cp) genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp) in length, including a pair of short inverted repeat regions (IRa and IRb) of 139 bp each separated by a small single copy (SSC) region of 54,323 bp and a large single copy (LSC) region of 66,735 bp. It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSRs) have been detected in the A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp). The A. nephrolepis cp genome is identical to that of Abies koreana, a closely related taxon. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. The complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.
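The figures quoted in this record can be cross-checked with simple arithmetic. The sketch below is only an illustration, assuming the usual quadripartite layout (LSC + SSC + two inverted repeat copies); the function name and dictionary keys are hypothetical and do not come from the record.

```python
# Consistency check of the lengths and gene counts quoted in the abstract.
# Assumption: total length = LSC + SSC + 2 * IR (standard quadripartite layout).

def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Sum the two single-copy regions and both inverted repeat copies."""
    return lsc + ssc + 2 * ir

reported_total = 121_336  # bp, as stated in the abstract
computed_total = quadripartite_length(lsc=66_735, ssc=54_323, ir=139)

# Gene categories as listed in the abstract (labels are illustrative).
gene_categories = {
    "protein_coding": 68,
    "tRNA": 35,
    "rRNA": 4,
    "open_reading_frames": 6,
    "pseudogene": 1,
}

print(computed_total == reported_total)       # region lengths add up to the total
print(sum(gene_categories.values()) == 114)   # categories add up to 114 genes
```

Both checks agree with the totals stated in the abstract, which suggests the region sizes and gene counts were transcribed consistently.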

  14. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunct...

  15. Complete genome sequence of pronghorn virus, a pestivirus

    Science.gov (United States)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  16. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
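The scaffold N50 quoted in this record is a contiguity statistic: the length of the scaffold at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The sketch below is a minimal illustration on made-up scaffold lengths (the olive assembly's actual scaffold lengths are not given in the record); the function name `n50` and the toy data are assumptions.

```python
# Minimal N50 sketch on toy data; not the authors' pipeline.

def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
print(n50(toy_scaffolds))  # 80 + 70 = 150 covers half of 300, so N50 is 70

# The abstract's coverage figure: 1.31 Gb assembled out of an estimated 1.38 Gb.
coverage_percent = round(100 * 1.31 / 1.38)
print(coverage_percent)  # 95
```

A higher N50 at the same total length means the assembly is split into fewer, longer pieces, which is why it is commonly reported next to total assembly size.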

  17. Comparative sequence analyses of genome and transcriptome ...

    Indian Academy of Sciences (India)

    Keywords. Asian elephant; comparative genomics; gene prediction; transcriptome. Abstract. The Asian elephant Elephas maximus and the African elephant Loxodonta africana that diverged 5-7 million years ...

  18. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks

    OpenAIRE

    Rusconi Brigida; Sanjar Fatemeh; Koenig Sara SK; Mammel Mark K; Tarr Phillip I; Eppinger Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and m...

  19. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the 'tsunami' of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  20. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    Science.gov (United States)

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species.

  1. Genome sequence of the date palm Phoenix dactylifera L.

    Science.gov (United States)

    Al-Mssallem, Ibrahim S; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O; Jia, Shangang; Yin, An; Alhuzimi, Eman M; Alsaihati, Burair A; Al-Owayyed, Saad A; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A; Sun, Gaoyuan; Majrashi, Majed A; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

    2013-01-01

    Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm's unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants.

  2. Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Li, Yingrui; Lindgreen, Stinus

    2010-01-01

    We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome...... possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence...

  3. Complete genome sequence of Serratia plymuthica strain AS12

    Energy Technology Data Exchange (ETDEWEB)

    Neupane, Saraswoti [Uppsala University, Uppsala, Sweden; Finlay, Roger D. [Uppsala University, Uppsala, Sweden; Alstrom, Sadhna [Uppsala University, Uppsala, Sweden; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Peters, Lin [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Han, James [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Hogberg, Nils [Uppsala University, Uppsala, Sweden

    2012-01-01

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant-growth-promoting and plant-pathogen-inhibiting abilities. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which contains 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  4. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the worldwide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison.

  5. Evolutionary insights from suffix array-based genome sequence ...

    Indian Academy of Sciences (India)

    2007-08-06

    Aug 6, 2007 ... Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need to analyse whole genome sequences more efficiently and thoroughly has led to the emergence of new computational ...

  6. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Unknown

    , New Delhi 110 067, India. Abstract ... analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by .... genes for the TCA cycle, while in mitochondria only a subset of the ...

  7. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  8. Complete genome sequence of a novel pestivirus from sheep.

    Science.gov (United States)

    Becher, Paul; Schmeiser, Stefanie; Oguzoglu, Tuba Cigdem; Postel, Alexander

    2012-10-01

    We report here the complete genome sequence of pestivirus strain Aydin/04-TR, which is the prototype of a group of similar viruses currently present in sheep and goats in Turkey. Sequence data from this virus showed that it clusters separately from the established and previously proposed tentative pestivirus species.

  9. Complete Genome Sequence of a Novel Pestivirus from Sheep

    OpenAIRE

    Becher, Paul; Schmeiser, Stefanie; Oguzoglu, Tuba Cigdem; Postel, Alexander

    2012-01-01

    We report here the complete genome sequence of pestivirus strain Aydin/04-TR, which is the prototype of a group of similar viruses currently present in sheep and goats in Turkey. Sequence data from this virus showed that it clusters separately from the established and previously proposed tentative pestivirus species.

  10. The sheep genome reference sequence: a work in progress.

    Science.gov (United States)

    Archibald, A L; Cockett, N E; Dalrymple, B P; Faraut, T; Kijas, J W; Maddox, J F; McEwan, J C; Hutton Oddy, V; Raadsma, H W; Wade, C; Wang, J; Wang, W; Xun, X

    2010-10-01

    Until recently, the construction of a reference genome was performed using Sanger sequencing alone. The emergence of next-generation sequencing platforms now means reference genomes may incorporate sequence data generated from a range of sequencing platforms, each of which has different read lengths, systematic biases and mate-pair characteristics. The objective of this review is to inform the mammalian genomics community about the experimental strategy being pursued by the International Sheep Genomics Consortium (ISGC) to construct the draft reference genome of sheep (Ovis aries). Component activities such as data generation, sequence assembly and annotation are described, along with information concerning the key researchers performing the work. This aims to foster future participation from across the research community through the coordinated activities of the consortium. The review also serves as a 'marker paper' by providing information concerning the pre-publication release of the reference genome. This ensures the ISGC adheres to the framework for data sharing established at the recent Toronto International Data Release Workshop and provides guidelines for data users. © 2010 The Authors, Animal Genetics © 2010 Stichting International Foundation for Animal Genetics.

  11. Complete genome sequence of Shewanella putrefaciens. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Heidelberg, John F.

    2001-04-01

    Seventy percent of the costs for genome sequencing Shewanella putrefaciens (oneidensis) were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reduction that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

  12. Human genome and genetic sequencing research and informed consent

    International Nuclear Information System (INIS)

    Iwakawa, Mayumi

    2003-01-01

    On March 29, 2001, the Ethical Guidelines for Human Genome and Genetic Sequencing Research were established. They are intended to serve as ethical guidelines for all human genome and genetic sequencing research practice, for the purpose of upholding respect for human dignity and rights and enforcing the use of proper methods in the pursuit of human genome and genetic sequencing research, with the understanding and cooperation of the public. The RadGenomics Project has prepared a research protocol and informed consent document that follow these ethical guidelines. We have endeavored to protect the privacy of individual information, and have established a procedure for examination of research practices by an ethics committee. Here we report our procedure in order to offer this concept to the patients. (authors)

  13. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  14. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score of each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher-order variant, is developed and tested using the probabilistic logic programming language and machine learning system PRISM, a fast and efficient model prototyping environment, using bacterial gene finding performance as a benchmark of signal strength. The model is used to prune a set of gene predictions from an underlying gene finder...

  15. Complete genome sequence of Kribbella flavida type strain (IFO 14399).

    Science.gov (United States)

    Pukall, Rüdiger; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Labutti, Kurt; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pitluck, Sam; Bruce, David; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Brettin, Thomas

    2010-03-30

    The genus Kribbella consists of 15 species, with Kribbella flavida (Park et al. 1999) as the type species. The name Kribbella was formed from the acronym of the Korea Research Institute of Bioscience and Biotechnology, KRIBB. Strains of the various Kribbella species were originally isolated from soil, potato, alum slate mine, patinas of catacombs or from horse racecourses. Here we describe the features of K. flavida together with the complete genome sequence and annotation. In addition to the 5.3 Mbp genome of Nocardioides sp. JS614, this is only the second completed genome sequence of the family Nocardioidaceae. The 7,579,488 bp long genome with its 7,086 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    Science.gov (United States)

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-10-01

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
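
    The N50 length quoted in the abstract above is a standard assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembled bases. As a minimal sketch of that definition (not the authors' pipeline), the following Python function computes N50 from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the sweet cherry assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: seven scaffolds, 300 bases in total.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

    With the toy lengths, the two longest scaffolds (80 + 70) already reach half of the 300 assembled bases, so the reported value is the length of the second one.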

  17. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  18. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Prakash

    2006-12-18

    Dec 18, 2006 ... strains of M. tuberculosis (CDC1551 and H37Rv) were analysed for frequencies and abundance of SSRs. .... Figure 1. Tract density profiles of microsatellites in the five genomes: (a) M. avium (MA), (b) M. bovis (MB), (c) M. tuberculosis ...... Jackson A L, Chen R and Loeb L A 1998 Induction of microsatellite.

  19. Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

    Science.gov (United States)

    Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

    2017-10-01

    Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  20. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Science.gov (United States)

    Ping, Zheng; Siegal, Gene P.; Almeida, Jonas S.; Schnitt, Stuart J.; Shen, Dejun

    2014-01-01

    Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer. PMID:24672738

  1. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Directory of Open Access Journals (Sweden)

    Zheng Ping

    2014-01-01

    Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer.

  2. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  3. Pervasive sequence patents cover the entire human genome.

    Science.gov (United States)

    Rosenfeld, Jeffrey A; Mason, Christopher E

    2013-01-01

    The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association for Molecular Pathology v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
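
    The cross-gene matching described in the abstract above can be pictured as an index from each k-mer to the set of genes that contain it. The Python sketch below illustrates that idea only; it is not the authors' analysis. The function names, the toy sequences and the choice of k = 4 (instead of 15, so that the toy genes stay short) are assumptions made for illustration, and the sketch ignores reverse-complement matches.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cross_gene_matches(genes, k=15):
    """Map each gene name to the set of other genes sharing at least one k-mer.

    genes: dict of gene name -> nucleotide sequence (forward strand only).
    """
    owners = defaultdict(set)  # k-mer -> names of genes containing it
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    matches = {name: set() for name in genes}
    for names in owners.values():
        if len(names) > 1:  # k-mer occurs in more than one gene
            for name in names:
                matches[name] |= names - {name}
    return matches


# Hypothetical toy genes; k=4 keeps the example readable.
toy_genes = {
    "geneA": "ACGTACGT",
    "geneB": "TTACGTTT",
    "geneC": "GGGGCCCC",
}
for gene, others in sorted(cross_gene_matches(toy_genes, k=4).items()):
    print(gene, sorted(others))
```

    Here geneA and geneB share the 4-mers ACGT and TACG, so each lists the other, while geneC matches nothing. Counting the size of each set gives the "number of other genes matched" figure that the abstract reports per gene.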

  4. Draft genome sequence of the Tibetan antelope.

    Science.gov (United States)

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M; Ma, Jian; Lewin, Harris A; Xing, Jinchuan; Platt, Roy N; Ray, David A; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

  5. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Background: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  6. The diploid genome sequence of an individual human.

    Directory of Open Access Journals (Sweden)

    Samuel Levy

    2007-09-01

    Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
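
    The variant counts in the abstract above can be tallied to see how the "more than 4.1 million" total and the 22% non-SNP share arise. The short Python sketch below does only that arithmetic, using the five variant classes for which the abstract gives exact counts. Segmental duplications and copy number variation regions are described only as "numerous" and are therefore left out, which is an assumption of this sketch; the dictionary keys are illustrative labels.

```python
# Counts copied from the abstract; classes without an exact count are omitted.
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total = sum(variant_counts.values())
non_snp = total - variant_counts["SNPs"]
non_snp_share = non_snp / total  # fraction of events that are not SNPs

print(f"total variants counted: {total:,}")
print(f"non-SNP events: {non_snp:,}")
print(f"non-SNP share of events: {non_snp_share:.1%}")
```

    Under these assumptions the listed classes sum to 4,118,889 events, of which 905,488 (about 22%) are not SNPs, consistent with the figures stated in the abstract.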

  7. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi Hsu

    2011-12-01

    Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease (JD), infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium) strains isolated from various hosts and environments. Using next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77) showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed among the 6 M. ap strains. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6000 SNPs were detected among M. avium genomes, most of which were synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNP-based phylogenomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10) strain while the human isolate (M. ap 4B) is closely related to the environmental strains, indicating an environmental source of human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  8. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.)

    Directory of Open Access Journals (Sweden)

    Simon Philipp W

    2010-10-01

    Full Text Available Abstract Background Cucumber, Cucumis sativus L., is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes have shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced, including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber. Results A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although

  9. Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii

    Science.gov (United States)

    2011-01-01

    Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution, and mammalian reproduction and development. See research article: http://genomebiology.com/2011/12/8/R81 PMID:21861852

  10. Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida

    OpenAIRE

    Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.; Briggs, Robert E.

    2013-01-01

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P. multocida strain Pm70.

  11. Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida

    Science.gov (United States)

    Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.

    2013-01-01

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P. multocida strain Pm70. PMID:23405337

  12. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    Science.gov (United States)

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is the largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783

  13. GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.

    Directory of Open Access Journals (Sweden)

    Sebastian Jünemann

    Full Text Available De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also by the requirement on the computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler suits best.
The data sets, scripts and all additional information needed to replicate our results are freely

  14. 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects.

    Science.gov (United States)

    Kumar, Sujai; Schiffer, Philipp H; Blaxter, Mark

    2012-01-01

    Genome sequencing has been democratized by second-generation technologies, and even small labs can sequence metazoan genomes now. In this article, we describe '959 Nematode Genomes'--a community-curated semantic wiki to coordinate the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated with ease compared with a conventional database-driven web site. The wiki also provides a way to track and share preliminary data if those data are not polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. www.nematodegenomes.org.

  15. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    ADP

    2012-04-10

    Apr 10, 2012 ... 22,527,019 bp, which consisted of 3,085 contigs and 28,249 singletons. We assembled 61,958 reads into 3,085 contigs, of which 114 (3.70%) contained microsatellite repeats. The average G+C content was 39.3%. Functional annotation to known sequences yielded 14.7% unigenes in the 'Raon' cultivar.

  16. Understanding Cancer Genome and Its Evolution by Next Generation Sequencing

    DEFF Research Database (Denmark)

    Hou, Yong

    Cancer will cause 13 million deaths by the year 2030, ranking as the second leading cause of death worldwide. Previous studies indicate that most cancers originate from cells that acquired somatic mutations and evolved according to Darwinian theory. Ten biological insights of cancer have been summarized recently. Cutting-edge technologies like next generation sequencing (NGS) enable exploring cancer genome and evolution much more efficiently. However, integrated cancer genome sequencing studies showed great inter-/intra-tumoral heterogeneity (ITH) and complex evolution patterns beyond the cancer biological knowledge we previously know. There is very limited knowledge of East Asia lung cancer genome except enrichment of EGFR mutations and lack of KRAS mutations. We carried out integrated genomic, transcriptomic and methylomic analysis of 335 primary Chinese lung adenocarcinomas (LUAD) and 35 corresponding...

  17. Complete genome sequence of Truepera radiovictrix type strain (RQ-24).

    Science.gov (United States)

    Ivanova, Natalia; Rohde, Christine; Munk, Christine; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brambilla, Evelyne; Rohde, Manfred; Göker, Markus; Tindall, Brian J; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2011-02-22

    Truepera radiovictrix Albuquerque et al. 2005 is the type species of the genus Truepera within the phylum "Deinococcus/Thermus". T. radiovictrix is of special interest not only because of its isolated phylogenetic location in the order Deinococcales, but also because of its ability to grow under multiple extreme conditions in alkaline, moderately saline, and high temperature habitats. Of particular interest is the fact that T. radiovictrix is also remarkably resistant to ionizing radiation, a feature it shares with members of the genus Deinococcus. This is the first completed genome sequence of a member of the family Trueperaceae and the fourth type strain genome sequence from a member of the order Deinococcales. The 3,260,398 bp long genome with its 2,994 protein-coding and 52 RNA genes consists of one circular chromosome and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  19. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. 
Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33).

    Science.gov (United States)

    Munk, A Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2011-07-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The species is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Whole genome sequencing in clinical and public health microbiology

    Science.gov (United States)

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Summary Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  2. Draft genome sequence of the rubber tree Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Rahman Ahmad Yamin Abdul

    2013-02-01

    Full Text Available Abstract Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.

  3. The complete mitochondrial genome sequence of the Daweishan Mini chicken.

    Science.gov (United States)

    Yan, Ming-Li; Ding, Su-Ping; Ye, Shao-Hui; Wang, Chun-Guang; He, Bao-Li; Yuan, Zhi-Dong; Liu, Li-Li

    2016-01-01

    The Daweishan Mini chicken is a valuable chicken breed in China. In this study, the complete mitochondrial genome sequence of the Daweishan Mini chicken was obtained for the first time by PCR amplification, sequencing and assembly. The total length of the mitochondrial genome was 16,785 bp, with a base composition of 30.26% A, 23.73% T, 32.51% C and 13.51% G. It contained 37 genes (2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes) and a major non-coding control region (D-loop region). The protein start codons are ATG, except for COX1, which begins with GTG. The complete mitochondrial genome sequence of the Daweishan Mini chicken provides an important data set for further investigation of the phylogenetic relationships within Gallus gallus.
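
    The base composition reported above is a per-nucleotide percentage over the whole sequence. A minimal sketch of that calculation follows, assuming an uppercase or lowercase DNA string as input; the function name `base_composition` and the toy sequence are illustrative and not taken from the study.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T in seq (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

# Toy sequence for illustration only; this is not the chicken mitogenome.
toy = "ATGCCCATTACCGG"
composition = base_composition(toy)
for base, percent in composition.items():
    print(f"{base}: {percent:.2f}%")
```

    The four percentages in the record sum to 100.01%, which is consistent with independent rounding of each value to two decimals.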

  4. Genome sequence of the model medicinal mushroom Ganoderma lucidum

    Science.gov (United States)

    Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao

    2012-01-01

    Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi. PMID:22735441

  5. Epigenetics of obesity: beyond the genome sequence.

    Science.gov (United States)

    Cordero, Paul; Li, Jiawei; Oben, Jude A

    2015-07-01

    After the study of the gene code as a trigger for obesity, epigenetic code has appeared as a novel tool in the diagnosis, prognosis and treatment of obesity, and its related comorbidities. This review summarizes the status of the epigenetic field associated with obesity, and the current epigenetic-based approaches for obesity treatment. Thanks to technical advances, novel and key obesity-associated polymorphisms have been described by genome-wide association studies, but there are limitations with their predictive power. Epigenetics is also studied for disease association, which involves decoding of the genome information, transcriptional status and later phenotypes. Obesity could be induced during adult life by feeding and other environmental factors, and there is a strong association between obesity features and specific epigenetic patterns. These patterns could be established during early life stages, and programme the risk of obesity and its comorbidities during adult life. Furthermore, recent studies have shown that DNA methylation profiles could be applied as biomarkers of diet-induced weight loss treatment. High-throughput technologies, recently implemented for commercial genetic test panels, could soon lead to the creation of epigenetic test panels for obesity. Nonetheless, epigenetics is a modifiable risk factor, and different dietary patterns or environmental insights during distinct stages of life could lead to rewriting of the epigenetic profile.

  6. Complete Genome Sequence of Peste des Petits Ruminants Virus from Georgia, 2016

    OpenAIRE

    Rajko-Nenow, Paulina Z.; Cunliffe, Tabitha G.; Flannery, John T.; Ropiak, Honorata M.; Avaliani, Lasha; Donduashvili, Marina; Baron, Michael D.; Batten, Carrie A.

    2017-01-01

    ABSTRACT We report here the complete genome sequence of a peste des petits ruminants virus (PPRV) from the first outbreak of the disease in Georgia in January 2016. Genome sequencing was performed using Illumina next-generation sequencing technology in conjunction with Sanger sequencing. This PPRV/Georgia/Tbilisi/2016 genome sequence clustered within lineage IV PPRV viruses.

  7. Standardized metadata for human pathogen/vector genomic sequences.

    Directory of Open Access Journals (Sweden)

    Vivien G Dugan

    Full Text Available High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will

  8. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)
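
    The search for hidden periodicities mentioned above can be illustrated with a small Fourier power spectrum. The sketch below is not the authors' formulation: it assumes one common convention, in which each nucleotide is turned into a 0/1 indicator sequence, the discrete Fourier transform of each indicator is squared and divided by the sequence length M, and the four results are summed. The function name `power_spectrum` and the toy input are hypothetical.

```python
import cmath

def power_spectrum(seq):
    """Sum over A, C, G, T of |DFT of the 0/1 indicator sequence|^2 / M,
    evaluated for harmonics n = 1 .. M // 2 (direct O(M^2) transform)."""
    seq = seq.upper()
    m = len(seq)
    spectrum = []
    for n in range(1, m // 2 + 1):
        total = 0.0
        for base in "ACGT":
            # Fourier amplitude of the indicator sequence for this base.
            amplitude = sum(cmath.exp(-2j * cmath.pi * n * k / m)
                            for k, symbol in enumerate(seq) if symbol == base)
            total += abs(amplitude) ** 2 / m
        spectrum.append(total)
    return spectrum

# Toy input with a built-in period of 3 (illustration only, not real genomic data).
toy = "ATG" * 20
spec = power_spectrum(toy)
peak_harmonic = 1 + max(range(len(spec)), key=spec.__getitem__)
period = len(toy) / peak_harmonic
print(peak_harmonic, period)
```

    A peak at harmonic n corresponds to a period of M/n bases. To judge whether a peak is significant, the abstract's approach of comparing against random sequences of the same nucleotide composition would be needed; this sketch omits that step.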

  9. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  10. Transcription of densovirus endogenous sequences in the Myzus persicae genome.

    Science.gov (United States)

    Clavijo, Gabriel; van Munster, Manuella; Monsion, Baptiste; Bochet, Nicole; Brault, Véronique

    2016-04-01

    Integration of non-retroviral sequences in the genome of different organisms has been observed and, in some cases, a relationship of these integrations with immunity has been established. The genome of the green peach aphid, Myzus persicae (clone G006), was screened for densovirus-like sequence (DLS) integrations. A total of 21 DLSs localized on 10 scaffolds were retrieved that mostly shared sequence identity with two aphid-infecting viruses, Myzus persicae densovirus (MpDNV) and Dysaphis plantaginea densovirus (DplDNV). In some cases, uninterrupted potential ORFs corresponding to non-structural viral proteins or capsid proteins were found within DLSs identified in the aphid genome. In particular, one scaffold harboured a complete virus-like genome, while another scaffold contained two virus-like genomes in reverse orientation. Remarkably, transcription of some of these ORFs was observed in M. persicae, suggesting a biological effect of these viral integrations. In contrast to most of the other densoviruses identified so far that induce acute host infection, it has been reported previously that MpDNV has only a minor effect on M. persicae fitness, while DplDNV can even have a beneficial effect on its aphid host. This suggests that DLS integration in the M. persicae genome may be responsible for the latency of MpDNV infection in the aphid host.

  11. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected wi...

  12. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.
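
    The abstract above does not say which tool was used to identify SSRs, so the following is only a minimal sketch of one common approach: scanning a sequence for perfect tandem repeats with a regular expression. The minimum repeat counts in `MIN_REPEATS` and the function names are illustrative assumptions, not values from the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed thresholds,
# not taken from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. ATAT, AAA)."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect di-, tri- and tetranucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):
                continue  # avoid reporting (AT)n again as (ATAT)n
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


# Toy sequence: an (AT)8 dinucleotide repeat and an (AAG)5 trinucleotide repeat.
toy = "GGC" + "AT" * 8 + "CCT" + "AAG" * 5 + "TT"
ssrs = find_ssrs(toy)
```

    Primer design would then use the sequence flanking each reported start position; that step is left out here because it depends on external tools.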

  13. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    Science.gov (United States)

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that promoter-like sequences in the M. hyopneumoniae genome are more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569
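
    To make the scanning step concrete, here is a minimal sketch of building a log-odds position-specific scoring matrix from aligned sites and sliding it along a sequence. The toy sites, the uniform background of 0.25, the pseudocount and the threshold are all assumptions for illustration; they are not the mapped M. hyopneumoniae promoters or the parameters of the study, and an AT-rich genome would call for a non-uniform background.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position-specific scoring matrix from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def scan(seq, pssm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


# Made-up -10-like sites, used only to exercise the functions.
toy_sites = ["TATAAT", "TAAAAT", "TATACT", "TATAAT"]
toy_pssm = build_pssm(toy_sites)
toy_hits = scan("GGCGCTATAATGCGCC", toy_pssm, threshold=5.0)
```

    Comparing the number of hits with the number obtained on shuffled sequence is one way to ask whether signals occur more often than expected by chance.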

  14. The complete mitochondrial genome sequence of Emperor Penguins (Aptenodytes forsteri).

    Science.gov (United States)

    Xu, Qiwu; Xia, Yan; Dang, Xiao; Chen, Xiaoli

    2016-09-01

    The emperor penguin (Aptenodytes forsteri) is the largest living species of penguin. Herein, we report the complete mitochondrial genome of the emperor penguin for the first time. The mitochondrial genome is a circular molecule of 17 301 bp in length, consisting of 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and one control region. To verify the accuracy and utility of the newly determined mitogenome sequence, we constructed a species phylogenetic tree of the emperor penguin together with 10 other closely related species. This is the second complete mitochondrial genome of a penguin, and it will be an important resource for studying the mitochondrial evolution of birds.

  15. The complete chloroplast genome sequence of Curcuma flaviflora (Curcuma).

    Science.gov (United States)

    Zhang, Yan; Deng, Jiabin; Li, Yangyi; Gao, Gang; Ding, Chunbang; Zhang, Li; Zhou, Yonghong; Yang, Ruiwu

    2016-09-01

    The complete chloroplast (cp) genome of Curcuma flaviflora, a medicinal plant in Southeast Asia, was sequenced. The genome was 160 478 bp in length, with 36.3% GC content. A pair of inverted repeats (IRs) of 26 946 bp each was separated by a large single-copy (LSC) region of 88 008 bp and a small single-copy (SSC) region of 18 578 bp. The cp genome contained 132 annotated genes, including 79 protein-coding genes, 30 tRNA genes, and four rRNA genes. Nineteen of these genes were duplicated in the inverted repeat regions.
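
    The region sizes quoted above can be checked against the reported total with a few lines of arithmetic. The sketch below assumes the usual quadripartite layout (one LSC, one SSC, two IR copies) and, for the gene tally, assumes that the 79, 30 and four genes are unique counts to which the 19 duplicated copies are added; the abstract does not state this explicitly.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length in bp for one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Values quoted in the abstract (bp).
total_bp = quadripartite_length(lsc=88_008, ssc=18_578, ir=26_946)

# Gene tally under the assumption stated above.
unique_genes = 79 + 30 + 4            # protein-coding + tRNA + rRNA
annotated_genes = unique_genes + 19   # plus copies duplicated in the IRs

print(total_bp, annotated_genes)
```

    Both results agree with the figures reported in the abstract (160 478 bp and 132 annotated genes).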

  16. Complete Genome Sequence of Escherichia coli Strain WG5

    DEFF Research Database (Denmark)

    Imamovic, Lejla; Misiakou, Maria-Anna; van der Helm, Eric

    2018-01-01

    Escherichia coli strain WG5 is a widely used host for phage detection, including somatic coliphages employed as standard ISO method 10705-1 (2000). Here, we present the complete genome sequence of a commercial E. coli WG5 strain.

  17. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  18. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  19. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida.

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp) in size; it contains a large single-copy region (LSC, 76,598 bp) and a small single-copy region (SSC, 42,977 bp), separated by two inverted repeats (IRa and IRb: 5,404 bp). The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome.

  20. Chemical rationale for selection of isolates for genome sequencing

    DEFF Research Database (Denmark)

    Rank, Christian; Larsen, Thomas Ostenfeld; Frisvad, Jens Christian

    The advances in gene sequencing will in the near future enable researchers to affordably acquire the full genomes of handpicked isolates. We here present a method to evaluate the chemical potential of an entire species and select representatives for genome sequencing. The selection criteria for new...... strains to be sequenced can be manifold, but for studying the functional phenotype, using a metabolome based approach offers a cheap and rapid assessment of critical strains to cover the chemical diversity. We have applied this methodology on the complex A. flavus/A. oryzae group. Though these two species...... are in principle identical, they represent two different phenotypes. This is clearly presented through a correspondence analysis of selected extrolites, in which the subtle chemical differences are visually dispersed. The results point to a handful of strains, which, if sequenced, will likely enhance our...

  1. Next is now: new technologies for sequencing of genomes, transcriptomes and beyond

    Science.gov (United States)

    Lister, Ryan; Gregory, Brian D.; Ecker, Joseph R.

    2009-01-01

    Summary The sudden availability of DNA sequencing technologies that rapidly produce vast amounts of sequence information has triggered a paradigm shift in genomics, enabling massively parallel surveying of complex nucleic acid populations. The diversity of applications to which these technologies have already been applied demonstrates the immense range of cellular processes and properties that can now be studied at the single-base resolution. These include genome resequencing and polymorphism discovery, mutation mapping, DNA methylation, histone modifications, transcriptome sequencing, gene discovery, alternative splicing identification, small RNA profiling, DNA-protein and possibly even protein-protein interactions. Thus, these deep sequencing technologies offer plant biologists unprecedented opportunities to increase the understanding of the functions and dynamics of plant cells and populations. PMID:19157957

  2. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    sample). The GBSeq data can be used directly in genomic models in the form of individual SNP allele-frequency estimates (e.g., reference reads/total reads per polymorphic site per individual), but is subject to measurement error due to the low sequencing depth per individual. Due to technical reasons....... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...
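
    As a rough illustration of why low depth matters, the sketch below computes the per-site allele-frequency estimate mentioned above (reference reads divided by total reads) and the binomial sampling standard deviation of that estimate at a given depth. This is only an assumed, simplified view of the measurement error; it is not the Bayesian random regression correction proposed in the work, and the function names are hypothetical.

```python
import math


def allele_frequency(ref_reads, total_reads):
    """Reference-allele frequency estimate for one site in one individual."""
    if total_reads == 0:
        return None  # no reads, no estimate
    return ref_reads / total_reads


def sampling_sd(true_freq, depth):
    """Binomial standard deviation of the estimate at a given read depth."""
    return math.sqrt(true_freq * (1.0 - true_freq) / depth)


# A true heterozygote (expected frequency 0.5) observed at low and high depth.
estimate = allele_frequency(ref_reads=3, total_reads=4)
sd_low_depth = sampling_sd(0.5, depth=4)
sd_high_depth = sampling_sd(0.5, depth=100)
```

    At a depth of 4 the estimate for a heterozygote is far noisier than at a depth of 100, which is the kind of error a per-SNP correction has to account for.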

  3. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Abstract Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  4. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing.

    Science.gov (United States)

    Vrancken, Bram; Trovão, Nídia Sequeira; Baele, Guy; van Wijngaerden, Eric; Vandamme, Anne-Mieke; van Laethem, Kristel; Lemey, Philippe

    2016-01-07

    Genetic analyses play a central role in infectious disease research. Massively parallelized "mechanical cloning" and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure--from nucleic acid extraction to sequencing--should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  5. Complete genome sequence of a street rabies virus from Mexico.

    Science.gov (United States)

    Zhang, Guoqing; Fu, Zhen F

    2012-10-01

    A canine rabies virus (RABV) has been used as a street rabies virus in laboratory investigations. Its entire genome was sequenced and found to be closely related to that of canine RABV circulating in Mexico. Sequence comparison indicates that the virus is closely related to those in the "cosmopolitan" group, with high homology (89 to 93%) to clade I of rabies viruses. The virus is now termed dog rabies virus-Mexico (DRV-Mexico).

  6. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Abstract Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species

  7. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first

  8. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Science.gov (United States)

    2011-01-01

    Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives.

  9. Brucella microti: the genome sequence of an emerging pathogen

    Directory of Open Access Journals (Sweden)

    Scholz Holger C

    2009-08-01

    Abstract Background Using a combination of pyrosequencing and conventional Sanger sequencing, the complete genome sequence of the recently described novel Brucella species, Brucella microti, was determined. B. microti is a member of the genus Brucella within the Alphaproteobacteria, which consists of medically important highly pathogenic facultative intracellular bacteria. In contrast to all other Brucella species, B. microti is a fast growing and biochemically very active microorganism with a phenotype more similar to that of Ochrobactrum, a facultative human pathogen. The atypical phenotype of B. microti prompted us to look for genomic differences compared to other Brucella species and to look for similarities with Ochrobactrum. Results The genome is composed of two circular chromosomes of 2,117,050 and 1,220,319 base pairs. Unexpectedly, we found that the genome sequence of B. microti is almost identical to that of Brucella suis 1330 with an overall sequence identity of 99.84% in aligned regions. The most significant structural difference between the two genomes is a bacteriophage-related 11,742 base pairs insert only present in B. microti. However, this insert is unlikely to have any phenotypical consequence. Only four protein coding genes are shared between B. microti and Ochrobactrum anthropi but impaired in other sequenced Brucella. The most noticeable difference between B. microti and other Brucella species was found in the sequence of the 23S ribosomal RNA gene. This unusual variation could have pleiotropic effects and explain the fast growth of B. microti. Conclusion Contrary to expectations from the phenotypic analysis, the genome sequence of B. microti is highly similar to that of known Brucella species, and is remotely related to the one of O. anthropi. How the few differences in gene content between B. microti and B. suis 1330 could result in vastly different phenotypes remains to be elucidated. This unexpected finding will

  10. Brucella microti: the genome sequence of an emerging pathogen.

    Science.gov (United States)

    Audic, Stéphane; Lescot, Magali; Claverie, Jean-Michel; Scholz, Holger C

    2009-08-04

    Using a combination of pyrosequencing and conventional Sanger sequencing, the complete genome sequence of the recently described novel Brucella species, Brucella microti, was determined. B. microti is a member of the genus Brucella within the Alphaproteobacteria, which consists of medically important highly pathogenic facultative intracellular bacteria. In contrast to all other Brucella species, B. microti is a fast growing and biochemically very active microorganism with a phenotype more similar to that of Ochrobactrum, a facultative human pathogen. The atypical phenotype of B. microti prompted us to look for genomic differences compared to other Brucella species and to look for similarities with Ochrobactrum. The genome is composed of two circular chromosomes of 2,117,050 and 1,220,319 base pairs. Unexpectedly, we found that the genome sequence of B. microti is almost identical to that of Brucella suis 1330 with an overall sequence identity of 99.84% in aligned regions. The most significant structural difference between the two genomes is a bacteriophage-related 11,742 base pairs insert only present in B. microti. However, this insert is unlikely to have any phenotypical consequence. Only four protein coding genes are shared between B. microti and Ochrobactrum anthropi but impaired in other sequenced Brucella. The most noticeable difference between B. microti and other Brucella species was found in the sequence of the 23S ribosomal RNA gene. This unusual variation could have pleiotropic effects and explain the fast growth of B. microti. Contrary to expectations from the phenotypic analysis, the genome sequence of B. microti is highly similar to that of known Brucella species, and is remotely related to the one of O. anthropi. How the few differences in gene content between B. microti and B. suis 1330 could result in vastly different phenotypes remains to be elucidated. This unexpected finding will complicate the task of identifying virulence determinants in the

  11. A Pan-HIV Strategy for Complete Genome Sequencing

    Science.gov (United States)

    Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.

    2015-01-01

    Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702

  12. Microbial diversity and activity in the Nematostella vectensis holobiont: insights from 16S rRNA gene sequencing, isolate genomes, and a pilot-scale survey of gene expression

    Directory of Open Access Journals (Sweden)

    Jia Yi Har

    2015-09-01

    We have characterized the molecular and genomic diversity of the microbiota of the starlet sea anemone Nematostella vectensis, a cnidarian model for comparative developmental and functional biology and a year-round inhabitant of temperate salt marshes. Molecular phylogenetic analysis of 16S rRNA gene clone libraries revealed four ribotypes associated with N. vectensis at multiple locations and times. These associates include two novel ribotypes within the ε-Proteobacterial order Campylobacterales and the Spirochetes, respectively, each sharing 99% 16S rRNA identity with Endozoicomonas elysicola and Pseudomonas oleovorans, respectively. Species-specific PCR revealed that these populations persisted in N. vectensis asexually propagated under laboratory conditions. cDNA indicated expression of the Campylobacterales and Endozoicomonas 16S rRNA in anemones from Sippewissett Marsh, MA. A collection of bacteria from laboratory raised N. vectensis was dominated by isolates from P. oleovorans and Rhizobium radiobacter. Isolates from field-collected anemones revealed an association with Limnobacter and Stappia isolates. Genomic DNA sequencing was carried out on 10 cultured bacterial isolates representing field- and laboratory-associates, i.e. Limnobacter spp., Stappia spp., P. oleovorans and R. radiobacter. Genomes contained multiple genes identified as virulence (host-association) factors while S. stellulata and L. thiooxidans genomes revealed pathways for mixotrophic sulfur oxidation. A pilot metatranscriptome of laboratory-raised N. vectensis was compared to the isolate genomes and indicated expression of ORFs from L. thiooxidans with predicted functions of motility, nutrient scavenging (Fe and P), polyhydroxyalkanoate synthesis for carbon storage, and selective permeability (porins). We hypothesize that such activities may mediate acclimation and persistence of bacteria in N. vectensis.

  13. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Seongmun Jeong

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.

  14. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    Science.gov (United States)

    Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

    2017-01-01

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
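
    The hit-counting idea in the abstract above can be illustrated with a minimal sketch. This is not the SEXCMD code: the marker sequences, the function names and the fixed `min_fraction` threshold are all hypothetical (the abstract says the real pipeline learns its decision rule from a training set), and exact substring matching stands in for whatever matching the pipeline performs.

```python
from collections import Counter

# Hypothetical toy markers; real markers would be derived from syntenic
# regions of the reference sex chromosomes.
MARKERS = {"X": ["ACGTTGCA"], "Y": ["TTGACCGA"]}


def count_marker_hits(reads, markers):
    """Count, per chromosome, how many marker sequences occur in the reads."""
    hits = Counter({chrom: 0 for chrom in markers})
    for read in reads:
        for chrom, seqs in markers.items():
            hits[chrom] += sum(seq in read for seq in seqs)
    return hits


def call_sex(hits, system="XY", min_fraction=0.1):
    """Call sex from marker hits; min_fraction is an assumed fixed cutoff."""
    specific = "Y" if system == "XY" else "W"   # chromosome absent in XX/ZZ
    shared = "X" if system == "XY" else "Z"
    total = hits[specific] + hits[shared]
    if total == 0:
        return "undetermined"
    has_specific = hits[specific] / total >= min_fraction
    if system == "XY":
        return "male" if has_specific else "female"
    return "female" if has_specific else "male"


male_reads = ["GGACGTTGCAGG", "CCTTGACCGACC"]
female_reads = ["GGACGTTGCAGG", "AAACGTTGCATT"]
print(call_sex(count_marker_hits(male_reads, MARKERS)))
print(call_sex(count_marker_hits(female_reads, MARKERS)))
```

    The sketch only shows why no alignment to a full reference is needed: the decision depends on the presence or absence of Y (or W) marker hits alone.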

  15. Detection of inter-spread repeat sequence in genomic DNA sequence.

    Science.gov (United States)

    Murakami, Hiroo; Sugaya, Nobuyoshi; Sato, Makihiko; Imaizumi, Akira; Aburatani, Sachiyo; Horimoto, Katsuhisa

    2004-01-01

    Various types of periodic patterns in nucleotide sequences are known to be very abundant in a genomic DNA sequence, and to play important biological roles such as gene expression, genome structural stabilization, and recombination. We present a new method, named "STEPSTONE", to find a specific periodic pattern of repeat sequence, inter-spread repeat, in which the tandem repeats of the conserved and the not-conserved regions appear periodically. In our method, at first, the data on periods of short repeat sequences found in a target sequence are stored as a hash data, and then are selected by application of an auto-correlation test in time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by usual alignment procedures through two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. As a result, our method exactly detected the repeats in the two sequences, being useful for identifying systematically the inter-spread repeats in DNA sequence.
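
    The first two steps described above (hashing the periods of short repeats, then screening them by autocorrelation) might look roughly like the following sketch. It is an illustration under stated assumptions, not the STEPSTONE implementation: the function names, the choice of k and the simple match-fraction score are hypothetical, and the score replaces the statistical auto-correlation test used by the authors.

```python
from collections import Counter


def kmer_periods(seq, k=4):
    """Tally distances between consecutive occurrences of each k-mer."""
    last_seen = {}          # hash: k-mer -> most recent start position
    periods = Counter()     # hash: distance -> number of times observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            periods[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return periods


def match_autocorrelation(seq, lag):
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    n = len(seq) - lag
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n


toy = "ACGTTGCA" * 4                      # toy sequence with period 8
candidate = kmer_periods(toy).most_common(1)[0][0]
print(candidate, match_autocorrelation(toy, candidate))
```

    Candidate periods with a high score would then be passed to the alignment steps mentioned in the abstract.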

  16. Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.; Boore, Jeffrey L.

    2007-01-01

    The complete sequences of the mitochondrial genomes of the oomycetes Phytophthora ramorum and P. sojae were determined during the course of their complete nuclear genome sequencing (Tyler et al. 2006). Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bp for P. sojae. Each contains a total of 37 identifiable protein-encoding genes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively) specifying 19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12 for P. sojae) which are potentially additional functional genes. Non-coding regions comprise approximately 11.5 percent and 18.4 percent of the genomes of P. ramorum and P. sojae, respectively. Relative to P. sojae, there is an inverted repeat of 1,150 bp in P. ramorum that includes an unassigned unique ORF, a tRNA gene, and adjacent non-coding sequences, but otherwise the gene order in both species is identical. Comparisons of these genomes with published sequences of the P. infestans mitochondrial genome reveal a number of similarities, but the gene order in P. infestans differs in two adjacent locations due to inversions. Sequence alignments of the three genomes indicated sequence conservation ranging from 75 to 85 percent and that specific regions were more variable than others.

  17. Genome sequence and characterization of the Tsukamurella bacteriophage TPA2.

    Science.gov (United States)

    Petrovski, Steve; Seviour, Robert J; Tillett, Daniel

    2011-02-01

    The formation of stable foam in activated sludge plants is a global problem for which control is difficult. These foams are often stabilized by hydrophobic mycolic acid-synthesizing Actinobacteria, among which are Tsukamurella spp. This paper describes the isolation from activated sludge of the novel double-stranded DNA phage TPA2. This polyvalent Siphoviridae family phage is lytic for most Tsukamurella species. Whole-genome sequencing reveals that the TPA2 genome is circularly permuted (61,440 bp) and that 70% of its sequence is novel. We have identified 78 putative open reading frames, 95 pairs of inverted repeats, and 6 palindromes. The TPA2 genome has a modular gene structure that shares some similarity to those of Mycobacterium phages. A number of the genes display a mosaic architecture, suggesting that the TPA2 genome has evolved at least in part from genetic recombination events. The genome sequence reveals many novel genes that should inform any future discussion on Tsukamurella phage evolution.

  18. Analysis of B-genome derived simple sequence repeat (SSR ...

    African Journals Online (AJOL)

    A study was conducted to investigate the genetic variability between 40 Musa genotypes maintained at the Musa germplasm collection of the International Institute for Tropical Agriculture, Ibadan using nine B-genome derived simple sequence repeat (SSR) markers. The nine primers produced reproducible and discrete ...

  19. Complete Genome Sequence of Mycobacterium vaccae Type Strain ATCC 25954

    KAUST Repository

    Ho, Y. S.

    2012-10-26

    Mycobacterium vaccae is a rapidly growing, nontuberculous Mycobacterium species that is generally not considered a human pathogen and is of major pharmaceutical interest as an immunotherapeutic agent. We report here the annotated genome sequence of the M. vaccae type strain, ATCC 25954.

  20. Draft Genome Sequence of Mycobacterium chimaera Type Strain Fl-0169

    Science.gov (United States)

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-016...

  1. Genome Sequence of Gordonia Phage BetterKatz

    Science.gov (United States)

    Berryman, Emily N.; Forrest, Kaitlyn M.; McHale, Lilliana; Wertz, Anthony T.; Zhuang, Zenas; Kasturiarachi, Naomi S.; Pressimone, Catherine A.; Schiebel, Johnathon G.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    BetterKatz is a bacteriophage isolated from a soil sample collected in Pittsburgh, Pennsylvania using the host Gordonia terrae 3612. BetterKatz’s genome is 50,636 bp long and contains 75 predicted protein-coding genes, 35 of which have been assigned putative functions. BetterKatz is not closely related to other sequenced Gordonia phages. PMID:27516497

  2. Complete genome sequence of the myxobacterium Sorangium cellulosum

    DEFF Research Database (Denmark)

    Schneiker, S; Perlova, O; Kaiser, O

    2007-01-01

    The genus Sorangium synthesizes approximately half of the secondary metabolites isolated from myxobacteria, including the anti-cancer metabolite epothilone. We report the complete genome sequence of the model Sorangium strain S. cellulosum Soce56, which produces several natural products and has...

  3. Genome Sequences of 19 Novel Erwinia amylovora Bacteriophages

    Science.gov (United States)

    Esplin, Ian N. D.; Berg, Jordan A.; Sharma, Ruchira; Allen, Robert C.; Arens, Daniel K.; Ashcroft, Cody R.; Bairett, Shannon R.; Beatty, Nolan J.; Bickmore, Madeline; Bloomfield, Travis J.; Brady, T. Scott; Bybee, Rachel N.; Carter, John L.; Choi, Minsey C.; Duncan, Steven; Fajardo, Christopher P.; Foy, Brayden B.; Fuhriman, David A.; Gibby, Paul D.; Grossarth, Savannah E.; Harbaugh, Kala; Harris, Natalie; Hilton, Jared A.; Hurst, Emily; Hyde, Jonathan R.; Ingersoll, Kayleigh; Jacobson, Caitlin M.; James, Brady D.; Jarvis, Todd M.; Jaen-Anieves, Daniella; Jensen, Garrett L.; Knabe, Bradley K.; Kruger, Jared L.; Merrill, Bryan D.; Pape, Jenny A.; Payne Anderson, Ashley M.; Payne, David E.; Peck, Malia D.; Pollock, Samuel V.; Putnam, Micah J.; Ransom, Ethan K.; Ririe, Devin B.; Robinson, David M.; Rogers, Spencer L.; Russell, Kerri A.; Schoenhals, Jonathan E.; Shurtleff, Christopher A.; Simister, Austin R.; Smith, Hunter G.; Stephenson, Michael B.; Staley, Lyndsay A.; Stettler, Jason M.; Stratton, Mallorie L.; Tateoka, Olivia B.; Tatlow, P. J.; Taylor, Alexander S.; Thompson, Suzanne E.; Townsend, Michelle H.; Thurgood, Trever L.; Usher, Brittian K.; Whitley, Kiara V.; Ward, Andrew T.; Ward, Megan E. H.; Webb, Charles J.; Wienclaw, Trevor M.; Williamson, Taryn L.; Wells, Michael J.; Wright, Cole K.; Breakwell, Donald P.; Hope, Sandra

    2017-01-01

    ABSTRACT Erwinia amylovora is the causal agent of fire blight, a devastating disease affecting some plants of the Rosaceae family. We isolated bacteriophages from samples collected from infected apple and pear trees along the Wasatch Front in Utah. We announce 19 high-quality complete genome sequences of E. amylovora bacteriophages. PMID:29146842

  4. Genome Sequences of 19 Novel Erwinia amylovora Bacteriophages

    OpenAIRE

    Esplin, Ian N. D.; Berg, Jordan A.; Sharma, Ruchira; Allen, Robert C.; Arens, Daniel K.; Ashcroft, Cody R.; Bairett, Shannon R.; Beatty, Nolan J.; Bickmore, Madeline; Bloomfield, Travis J.; Brady, T. Scott; Bybee, Rachel N.; Carter, John L.; Choi, Minsey C.; Duncan, Steven

    2017-01-01

    ABSTRACT Erwinia amylovora is the causal agent of fire blight, a devastating disease affecting some plants of the Rosaceae family. We isolated bacteriophages from samples collected from infected apple and pear trees along the Wasatch Front in Utah. We announce 19 high-quality complete genome sequences of E. amylovora bacteriophages.

  5. Genome Sequence of Ureaplasma diversum Strain ATCC 49782.

    Science.gov (United States)

    Marques, Lucas M; Guimarães, Ana M S; Martins, Hellen B; Rezende, Izadora S; Barbosa, Maysa S; Campos, Guilherme B; do Nascimento, Naíla C; Dos Santos, Andrea P; Amorim, Aline T; Santos, Verena M; Messick, Joanne B; Timenetsky, Jorge

    2015-04-16

    Here, we report the complete genome sequence of Ureaplasma diversum strain ATCC 49782. This species is of bovine origin, having an association with reproductive disorders in cattle, including placentitis, fetal alveolitis, abortion, and birth of weak calves. It has a small circular chromosome of 975,425 bp. Copyright © 2015 Marques et al.

  6. Targeted enrichment of genomic DNA regions for next generation sequencing

    NARCIS (Netherlands)

    Mertens, F.; El-Sharawy, A.; Sauer, S.; Van Helvoort, J.; Van der Zaag, P.J.; Franke, A.; Nilsson, M.; Lehrach, H.; Brookes, A.

    2011-01-01

    In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a

  7. Draft Genome Sequence of Rhodococcus rhodochrous Strain ATCC 21198

    Energy Technology Data Exchange (ETDEWEB)

    Shields-Menard, Sara A. [Mississippi State University (MSU)]; Brown, Steven D [ORNL]; Klingeman, Dawn Marie [ORNL]; Indest, Karl [University of Tennessee (UTK) and Oak Ridge National Laboratory (ORNL)]; Hancock, Dawn [U.S. Army Engineer Research and Development Center]; Wewalwela, Jayani [Mississippi State University (MSU)]; French, Todd [Mississippi State University (MSU)]; Donaldson, Janet [Mississippi State University]

    2014-01-01

    Rhodococcus rhodochrous is a Gram-positive red-pigmented bacterium commonly found in the soil. The draft genome sequence for R. rhodochrous strain ATCC 21198 is presented here to provide genetic data for a better understanding of its lipid-accumulating capabilities.

  8. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    RPS16 of eukaryote is a component of the 40S small ribosomal subunit encoded by RPS16 gene and is also a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 was cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  9. PennCNV in whole-genome sequencing data.

    Science.gov (United States)

    de Araújo Lima, Leandro; Wang, Kai

    2017-10-03

    The use of high-throughput sequencing data has improved the results of genomic analysis due to the resolution of mapping algorithms. Although several tools for copy-number variation calling in whole genome sequencing have been published, the noisy nature of sequencing data is still a limitation for accuracy and concordance among such tools. To assess the performance of PennCNV original algorithm for array data in whole genome sequencing data, we processed mapping (BAM) files to extract coverage, representing log R ratio (LRR) of signal intensity, and B allele frequency (BAF). We used high quality sample NA12878 from the recently reported NIST database and created 10 artificial samples with several CNVs spread along all chromosomes. We compared PennCNV-Seq with other tools with general deletions and duplications, as well as for different number of copies and copy-neutral loss-of-heterozygosity (LOH). PennCNV-Seq was able to find correct CNVs and can be integrated in existing CNV calling pipelines to report accurately the number of copies in specific genomic regions.
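
    The two per-site quantities named above can be sketched as follows. This is only an illustration of the general definitions, not PennCNV-Seq code: the function names are hypothetical, and the normalization (observed coverage divided by an expected diploid coverage) is an assumption, since the abstract does not describe how coverage is scaled.

```python
import math


def log_r_ratio(coverage, expected_coverage):
    """log2 of observed over expected coverage (0.0 means no copy change)."""
    if coverage <= 0 or expected_coverage <= 0:
        return float("nan")
    return math.log2(coverage / expected_coverage)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the B (alternate) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        return float("nan")
    return alt_reads / total


# Assumed toy numbers: a heterozygous site inside a one-copy deletion
# loses roughly half of its coverage and one of its two alleles.
print(log_r_ratio(15, 30), b_allele_frequency(15, 0))
```

    Under these assumptions, a one-copy deletion shifts LRR toward -1 and pushes BAF to 0 or 1, while copy-neutral LOH leaves LRR near 0 and changes only BAF, which is why both signals are extracted.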

  10. Draft genome sequence of the mulberry tree Morus notabilis

    Science.gov (United States)

    He, Ningjia; Zhang, Chi; Qi, Xiwu; Zhao, Shancen; Tao, Yong; Yang, Guojun; Lee, Tae-Ho; Wang, Xiyin; Cai, Qingle; Li, Dong; Lu, Mengzhu; Liao, Sentai; Luo, Guoqing; He, Rongjun; Tan, Xu; Xu, Yunmin; Li, Tian; Zhao, Aichun; Jia, Ling; Fu, Qiang; Zeng, Qiwei; Gao, Chuan; Ma, Bi; Liang, Jiubo; Wang, Xiling; Shang, Jingzhe; Song, Penghua; Wu, Haiyang; Fan, Li; Wang, Qing; Shuai, Qin; Zhu, Juanjuan; Wei, Congjin; Zhu-Salzman, Keyan; Jin, Dianchuan; Wang, Jinpeng; Liu, Tao; Yu, Maode; Tang, Cuiming; Wang, Zhenjiang; Dai, Fanwei; Chen, Jiafei; Liu, Yan; Zhao, Shutang; Lin, Tianbao; Zhang, Shougong; Wang, Junyi; Wang, Jian; Yang, Huanming; Yang, Guangwei; Wang, Jun; Paterson, Andrew H.; Xia, Qingyou; Ji, Dongfeng; Xiang, Zhonghuai

    2013-01-01

    Human utilization of the mulberry–silkworm interaction started at least 5,000 years ago and greatly influenced world history through the Silk Road. Complementing the silkworm genome sequence, here we describe the genome of a mulberry species Morus notabilis. In the 330-Mb genome assembly, we identify 128 Mb of repetitive sequences and 29,338 genes, 60.8% of which are supported by transcriptome sequencing. Mulberry gene sequences appear to evolve ~3 times faster than other Rosales, perhaps facilitating the species’ spread worldwide. The mulberry tree is among a few eudicots but several Rosales that have not preserved genome duplications in more than 100 million years; however, a neopolyploid series found in the mulberry tree and several others suggest that new duplications may confer benefits. Five predicted mulberry miRNAs are found in the haemolymph and silk glands of the silkworm, suggesting interactions at molecular levels in the plant–herbivore relationship. The identification and analyses of mulberry genes involved in diversifying selection, resistance and protease inhibitor expressed in the laticifers will accelerate the improvement of mulberry plants. PMID:24048436

  11. Genome sequence of the human pathogen Vibrio cholerae Amazonia.

    NARCIS (Netherlands)

    Thompson, C.C.; Marin, M.A.; Dias, G.M.; Dutilh, B.E.; Edwards, R.A.; Iida, T.; Thompson, F.L.; Vicente, A.C.

    2011-01-01

    Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. Based on multilocus sequence analysis, this lineage belongs to a distinct profile compared to strains from El Tor and classical biotypes. The genomic analysis

  12. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes...

  13. Complete Genome Sequence of Beijerinckia indica subsp. indica

    Science.gov (United States)

    Tamas, Ivica; Dedysh, Svetlana N.; Liesack, Werner; Stott, Matthew B.; Alam, Maqsudul; Murrell, J. Colin; Dunfield, Peter F.

    2010-01-01

    Beijerinckia indica subsp. indica is an aerobic, acidophilic, exopolysaccharide-producing, N2-fixing soil bacterium. It is a generalist chemoorganotroph that is phylogenetically closely related to facultative and obligate methanotrophs of the genera Methylocella and Methylocapsa. Here we report the full genome sequence of this bacterium. PMID:20601475

  14. Complete genome sequence of Beijerinckia indica subsp. indica.

    Science.gov (United States)

    Tamas, Ivica; Dedysh, Svetlana N; Liesack, Werner; Stott, Matthew B; Alam, Maqsudul; Murrell, J Colin; Dunfield, Peter F

    2010-09-01

    Beijerinckia indica subsp. indica is an aerobic, acidophilic, exopolysaccharide-producing, N(2)-fixing soil bacterium. It is a generalist chemoorganotroph that is phylogenetically closely related to facultative and obligate methanotrophs of the genera Methylocella and Methylocapsa. Here we report the full genome sequence of this bacterium.

  15. Genome Sequence of Novel Human Parechovirus Type 17

    OpenAIRE

    Böttcher, Sindy; Obermeier, Patrick E.; Diedrich, Sabine; Kaboré, Yolande; D'Alfonso, Rossella; Pfister, Herbert; Kaiser, Rolf; Di Cristanziano, Veronica

    2017-01-01

    ABSTRACT Human parechoviruses (HPeV) circulate worldwide, causing a broad variety of symptoms, preferentially in early childhood. We report here the nearly complete genome sequence of a novel HPeV type, consisting of 7,062 nucleotides and encoding 2,179 amino acids. M36/CI/2014 was taxonomically classified as HPeV-17 by the picornavirus study group.

  16. Comparison of the genome sequences and the phylogenetic ...

    Indian Academy of Sciences (India)

    Home; Journals; Journal of Biosciences; Volume 25; Issue 3. Comparison of the genome sequences and the phylogenetic analyses of the GP78 and the Vellore P20778 isolates of Japanese encephalitis virus from India. Sudhanshu Vrati. Articles Volume 25 Issue 3 September 2000 pp 257-262 ...

  17. Complete Genome Sequence of Enterococcus faecium Commensal Isolate E1002

    NARCIS (Netherlands)

    Tytgat, Hanne L P; Douillard, François P; Laine, Pia K; Paulin, Lars; Willems, Rob J L; de Vos, Willem M

    2016-01-01

    The emergence of vancomycin-resistant enterococci (VRE) has been associated with an increase in multidrug-resistant nosocomial infections. Here, we report the 2.614-Mb genome sequence of the Enterococcus faecium commensal isolate E1002, which will be instrumental in further understanding the

  18. Complete Genome Sequence of Prevotella intermedia Strain 17-2

    OpenAIRE

    Nambu, Takayuki; Yamane, Kazuyoshi; Maruyama, Hugo; Mashimo, Chiho; Yamanaka, Takeshi

    2015-01-01

    Prevotella intermedia, a Gram-negative black-pigmented anaerobic rod, is frequently isolated from not only periodontal pockets but also purulent infections. We report here the complete genome sequence of P. intermedia strain 17-2, which is a non-exopolysaccharide-producing variant obtained from exopolysaccharide (EPS)-producing P. intermedia strain 17 stock culture.

  19. A Bibliometric Analysis of Global Research on Genome Sequencing ...

    African Journals Online (AJOL)

    YSHo

    This study was carried out to evaluate the global scientific production of genome sequencing research to assess the characteristics of the research performances and the research tendencies. Data were obtained from Science Citation Index Expanded database during 1991-2010. Conventional methods including document ...

  20. Characterization of genomic sequence of a drought-resistant gene ...

    Indian Academy of Sciences (India)

    Characterization of genomic sequence of a drought-resistant gene TaSnRK2.7 in wheat species. HONG YING ZHANG1,2, WEI LI3, XIN GUO MAO1 and RUI LIAN JING1∗. 1The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, ...

  1. The white spot syndrome virus DNA genome sequence

    NARCIS (Netherlands)

    Hulten, van M.C.W.; Witteveldt, J.; Peters, S.; Kloosterboer, N.; Tarchini, R.; Fiers, M.; Sandbrink, H.; Klein Lankhorst, R.; Vlak, J.M.

    2001-01-01

    White spot syndrome virus (WSSV) is at present a major scourge to worldwide shrimp cultivation. We have determined the entire sequence of the double-stranded, circular DNA genome of WSSV, which contains 292,967 nucleotides encompassing 184 major open reading frames (ORFs). Only 6% of the WSSV ORFs

  2. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  3. Complete Genomic Sequence of Rabies Virus from an Ethiopian Wolf

    Science.gov (United States)

    Wise, Emma L.; Ellis, Richard J.; McElhinney, Lorraine M.; Banyard, Ashley C.; Johnson, Nicholas; Deressa, Asefa; Regassa, Fekede; de Lamballerie, Xavier; Fooks, Anthony R.; Sillero-Zubiri, Claudio

    2015-01-01

    Ethiopian wolves are the rarest canid in the world, with only 500 found in the Ethiopian highlands. Rabies poses the most immediate threat to their survival, causing epizootic cycles of mass mortality. The complete genome sequence of a rabies virus (RABV) derived from an Ethiopian wolf during the most recent epizootic is reported here. PMID:25814597

  4. Complete genome sequence of Oceanithermus profundus type strain (506T)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Zhang, Xiaojing [Los Alamos National Laboratory (LANL)]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Hauser, Loren John [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Ruhl, Alina [U.S. Department of Energy, Joint Genome Institute]; Mwirichia, Romano [University of Munster, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Land, Miriam L [ORNL]

    2011-01-01

    Oceanithermus profundus Miroshnichenko et al. 2003 is the type species of the genus Oceanithermus, which belongs to the family Thermaceae. The genus currently comprises two species whose members are thermophilic and are able to reduce sulfur compounds and nitrite. The organism is adapted to the salinity of sea water, is able to utilize a broad range of carbohydrates, some proteinaceous substrates, organic acids and alcohols. This is the first completed genome sequence of a member of the genus Oceanithermus and the fourth sequence from the family Thermaceae. The 2,439,291 bp long genome with its 2,391 protein-coding and 54 RNA genes consists of one chromosome and a 135,351 bp long plasmid, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    Energy Technology Data Exchange (ETDEWEB)

    Lamour, Kurt H [ORNL]; McDonald, W Hayes [ORNL]; Savidor, Alon [ORNL]

    2006-01-01

    Genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, suggest a photosynthetic past and reveal recent massive expansion and diversification of potential pathogenicity gene families. Abstract: Draft genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors and, in particular, a superfamily of 700 proteins with similarity to known oomycete avirulence genes.

  6. Whole exome and whole genome sequencing with dried blood spot DNA without whole genome amplification.

    Science.gov (United States)

    Bassaganyas, Laia; Freedman, George; Vaka, Dedeepya; Wan, Eunice; Lao, Richard; Chen, Flavia; Kvale, Mark; Currier, Robert J; Puck, Jennifer M; Kwok, Pui-Yan

    2018-01-01

    Newborn screening (NBS) for rare conditions is performed in all 50 states in the USA. We have partnered with the California Department of Public Health Genetic Disease Laboratory to determine whether sufficient DNA can be extracted from archived dried blood spots (DBS) for next-generation sequencing in the hopes that next-generation sequencing can play a role in NBS. We optimized the DNA extraction and sequencing library preparation protocols for residual infant DBS archived over 20 years ago and successfully obtained acceptable whole exome and whole genome sequencing data. This sequencing study using DBS DNA without whole genome amplification prior to sequencing library preparation provides evidence that properly stored residual newborn DBS are a satisfactory source of DNA for genetic studies. © 2017 Wiley Periodicals, Inc.

  7. Genome sequence and genetic diversity of European ash trees

    DEFF Research Database (Denmark)

    Sollars, Elizabeth S A; Harper, Andrea L; Kelly, Laura J

    2017-01-01

    -sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British...... to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic....

  8. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Bram Vrancken

    2016-01-01

    Full Text Available Genetic analyses play a central role in infectious disease research. Massively parallelized “mechanical cloning” and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes, makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure, from nucleic acid extraction to sequencing, should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  9. Complete genome sequence of 'Thermobaculum terrenum' type strain (YNP1).

    Science.gov (United States)

    Kiss, Hajnalka; Cleland, David; Lapidus, Alla; Lucas, Susan; Del Rio, Tijana Glavina; Nolan, Matt; Tice, Hope; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Lu, Megan; Brettin, Thomas; Detter, John C; Göker, Markus; Tindall, Brian J; Beck, Brian; McDermott, Timothy R; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Cheng, Jan-Fang

    2010-10-27

    'Thermobaculum terrenum' Botero et al. 2004 is the sole species within the proposed genus 'Thermobaculum'. Strain YNP1(T) is the only cultivated member of an acid tolerant, extremely thermophilic species belonging to a phylogenetically isolated environmental clone group within the phylum Chloroflexi. At present, the name 'Thermobaculum terrenum' is not yet validly published as it contravenes Rule 30 (3a) of the Bacteriological Code. The bacterium was isolated from a slightly acidic extreme thermal soil in Yellowstone National Park, Wyoming (USA). Depending on its final taxonomic allocation, this is likely to be the third completed genome sequence of a member of the class Thermomicrobia and the seventh type strain genome from the phylum Chloroflexi. The 3,101,581 bp long genome with its 2,872 protein-coding and 58 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.

  11. Secure distributed genome analysis for GWAS and sequence comparison computation

    Science.gov (United States)

    2015-01-01

    Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
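    The two statistics named in the abstract above can be illustrated in plaintext. The sketch below is not the secret-sharing protocol from the paper; it only shows what is being computed, assuming a biallelic site and a 2x2 table of allele counts for cases and controls. The function names and the table layout are illustrative assumptions.

```python
def minor_allele_frequency(count_a: int, count_b: int) -> float:
    """Frequency of the rarer allele at a biallelic site."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


def chi_squared_2x2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> float:
    """Pearson chi-squared statistic for a 2x2 allele-count table.

    Rows are cases/controls, columns are allele A/allele B (assumed layout).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denominator = row_case * row_ctrl * col_a * col_b
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denominator


# Hypothetical counts, for illustration only.
maf = minor_allele_frequency(30, 70)
chi2 = chi_squared_2x2(10, 20, 20, 10)
print(maf, chi2)
```

    A secure version would compute the same quantities over secret-shared counts rather than raw integers.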

  12. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  13. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity
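    The read frequency cut-off described above can be sketched as a simple filter. This is a minimal illustration, not the authors' in-house pipeline: the `VariantCall` record, its field names, and the choice to treat the 30% threshold as inclusive are all assumptions.

```python
from typing import NamedTuple


class VariantCall(NamedTuple):
    """Hypothetical per-site record: read counts supporting a variant."""
    position: int
    alt_reads: int
    total_reads: int


def passes_cutoff(call: VariantCall, cutoff: float = 0.30) -> bool:
    """Keep a variant when its read frequency reaches the cut-off (assumed inclusive)."""
    if call.total_reads == 0:
        return False
    return call.alt_reads / call.total_reads >= cutoff


# Hypothetical calls, for illustration only.
calls = [
    VariantCall(position=761155, alt_reads=45, total_reads=100),
    VariantCall(position=2155168, alt_reads=12, total_reads=100),
    VariantCall(position=7570, alt_reads=0, total_reads=0),
]
kept = [c.position for c in calls if passes_cutoff(c)]
print(kept)
```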

  14. Complete genome sequence of Actinosynnema mirum type strain (101T)

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam; Chen, Feng; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia; Brettin, Thomas; Detter, John C.; Han, Cliff; Chain, Patrick; Tindall, Brian; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actinosynnemataceae, a rapidly growing family within the actinobacterial suborder Pseudonocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. The impact of next-generation sequencing on genomics.

    Science.gov (United States)

    Zhang, Jun; Chiodini, Rod; Badr, Ahmed; Zhang, Genfa

    2011-03-20

    This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also presents a significant challenge for data storage, analyses, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle the unsolved challenges unconquered by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come. Copyright © 2011. Published by Elsevier Ltd.

  16. The complete chloroplast genome sequence of Dioscorea zingiberensis (Dioscoreceae).

    Science.gov (United States)

    Zhou, Wen; Chen, Chen; Hua, Wen-Ping; Wang, Zhe-Zhi

    2016-07-01

    Dioscorea zingiberensis (Dioscoreceae) is an important medicinal plant endemic to China. Here, its chloroplast genome sequence is reconstructed from the whole-genome Illumina sequencing data. The circular genome is 153,970 bp in length, and comprises a pair of inverted repeat (IR) regions of 25,491 bp each, a large single-copy (LSC) region of 83,950 bp and a small single-copy (SSC) region of 19,038 bp. The chloroplast genome contains 132 genes, including 86 protein-coding genes (79 PCG species), 8 ribosomal RNA genes (four rRNA species) and 38 transfer RNA genes (30 tRNA species). Out of these genes, 10 harbor a single intron, and 7 contain a couple of introns. The overall A + T content of the whole genome is 62.8%, while the corresponding values of the LSC, SSC and IR regions are 64.9%, 68.8% and 57.0%, respectively.
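    The region sizes reported above can be checked against the total genome length, since a quadripartite chloroplast genome consists of the LSC, the SSC and two copies of the IR. The short sketch below does that arithmetic with the figures from the abstract; the function name is an illustrative assumption.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) and gene counts as reported in the abstract above.
LSC, SSC, IR = 83_950, 19_038, 25_491
REPORTED_TOTAL = 153_970

total = quadripartite_length(LSC, SSC, IR)
gene_total = 86 + 8 + 38  # protein-coding + rRNA + tRNA genes
print(total == REPORTED_TOTAL, gene_total)
```

    Both checks agree with the abstract: the regions sum to 153,970 bp and the gene categories sum to 132.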

  17. Rapid identification and mapping of insertion sequences in Escherichia coli genomes using vectorette PCR

    Directory of Open Access Journals (Sweden)

    Dean Antony M

    2004-07-01

    Full Text Available Abstract Background Insertion sequences (IS) are small DNA segments capable of transposing within and between prokaryotic genomes, often causing insertional mutations and chromosomal rearrangements. Although several methods are available for locating ISs in microbial genomes, they are either labor-intensive or inefficient. Here, we use vectorette PCR to identify and map the genomic positions of the eight insertion sequences (IS1, 2, 3, 4, 5, 30, 150, and 186) found in E. coli strain CGSC6300, a close relative of MG1655 whose genome has been sequenced. Results Genomic DNA from strain CGSC6300 was digested with a four-base cutter Rsa I and the resulting restriction fragments ligated onto vectorette units. Using IS-specific primers directed outward from the extreme ends of each IS and a vectorette primer, flanking DNA fragments were amplified from all but one of the 37 IS elements identified in the genomic sequence of MG1655. Purification and sequencing of the PCR products confirmed that they are IS-associated flanking DNA fragments corresponding to the known IS locations in the MG1655 genome. Seven additional insertions were found in strain CGSC6300 indicating that very closely related isolates of the same laboratory strain (the K12 isolate) may differ in their IS complement. Two other E. coli K12 derivatives, TD2 and TD10, were also analyzed by vectorette PCR. They share 36 of the MG1655 IS sites as well as having 16 and 18 additional insertions, respectively. Conclusion This study shows that vectorette PCR is a swift, efficient, reliable method for typing microbial strains and identifying and mapping IS insertion sites present in microbial genomes. Unlike Southern hybridization and inverse PCR, our approach involves only one genomic digest and one ligation step. Vectorette PCR is then used to simultaneously amplify all IS elements of a given type, making it a rapid and sensitive means to survey IS elements in genomes. The ability to rapidly

  18. Next-Generation Sequencing and Genome Editing in Plant Virology

    Directory of Open Access Journals (Sweden)

    Ahmed Hadidi

    2016-08-01

    Full Text Available Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome engineering editing method, clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system, has been successfully used recently to engineer resistance to DNA geminiviruses (family Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology. Keywords: Next-generation sequencing, NGS, plant virology, plant viruses, viroids, resistance to plant viruses by CRISPR-Cas9

  19. Recent progress in the methods of genome sequencing

    Directory of Open Access Journals (Sweden)

    Zhao Ning-wei

    2010-04-01

    Full Text Available Genome sequencing is a very important tool for the development of genetic diagnosis, genetically engineered drugs, pharmacogenetics, etc. As the Human Genome Project (HGP) has become widely known, the need for genome sequencing has grown. Two traditional strategies have been available for this purpose: shotgun sequencing and hierarchical sequencing. Beyond these, many efforts are pursuing new ideas to enable fast and cost-effective genome sequencing, including the 454 GS system, polony sequencing, single-molecule arrays and nanopore sequencing; each has its own characteristics, but all remain to be fully developed.

  20. The complete chloroplast genome sequence of Hibiscus syriacus.

    Science.gov (United States)

    Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin

    2016-09-01

    The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp each, separated by a large single-copy region and a small single-copy region of 89 698 bp and 19 831 bp, respectively. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.

  1. From Genome Sequence to Taxonomy - A Skeptic’s View

    DEFF Research Database (Denmark)

    Özen, Asli Ismihan; Vesth, Tammi Camilla; Ussery, David

    2012-01-01

    genome sequences to classify bacteria, and the method of choice, as always, depends on the question asked and the particular need. For example, 16S rRNA can define a bacterial species, and relate species, genera, and higher orders into groups consistent with their known biological properties. However...... many commonly used methods and also describes potential pitfalls if used inappropriately, as well as which questions are best addressed by particular methods. After a brief introduction to the classical methods of taxonomy, a description of the bacterial genomes currently available is given...

  2. Draft genome sequence of Rhodococcus rhodochrous strain ATCC 17895

    Science.gov (United States)

    Chen, Bi-Shuang; Otten, Linda G.; Resch, Verena; Muyzer, Gerard; Hanefeld, Ulf

    2013-01-01

    Rhodococcus rhodochrous ATCC 17895 possesses an array of mono- and dioxygenases, as well as hydratases, which makes it an interesting organism for biocatalysis. R. rhodochrous is a Gram-positive aerobic bacterium with a rod-like morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,869,887 bp long genome contains 6,609 protein-coding genes and 53 RNA genes. Based on small subunit rRNA analysis, the strain is more likely to be a strain of Rhodococcus erythropolis rather than Rhodococcus rhodochrous. PMID:24501654

  3. Whole-Genome Sequencing: Automated, Indexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing an indexed Illumina DNA library. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Double-stranded DNA (dsDNA) will fragment when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  4. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    Energy Technology Data Exchange (ETDEWEB)

    Tyler, Brett M.; Tripathy, Sucheta; Zhang, Xuemin; Dehal, Paramvir; Jiang, Rays H. Y.; Aerts, Andrea; Arredondo, Felipe D.; Baxter, Laura; Bensasson, Douda; Beynon, Jim L.; Chapman, Jarrod; Damasceno, Cynthia M. B.; Dorrance, Anne E.; Dou, Daolong; Dickerman, Allan W.; Dubchak, Inna L.; Garbelotto, Matteo; Gijzen, Mark; Gordon, Stuart G.; Govers, Francine; Grunwald, Niklaus J.; Huang, Wayne; Ivors, Kelly L.; Jones, Richard W.; Kamoun, Sophien; Krampis, Konstantinos; Lamour, Kurt H.; Lee, Mi-Kyung; McDonald, W. Hayes; Medina, Monica; Meijer, Harold J. G.; Nordberg, Erik K.; Maclean, Donald J.; Ospina-Giraldo, Manuel D.; Morris, Paul F.; Phuntumart, Vipaporn; Putnam, Nicholas J.; Rash, Sam; Rose, Jocelyn K. C.; Sakihama, Yasuko; Salamov, Asaf A.; Savidor, Alon; Scheuring, Chantel F.; Smith, Brian M.; Sobral, Bruno W. S.; Terry, Astrid; Torto-Alalibo, Trudy A.; Win, Joe; Xu, Zhanyou; Zhang, Hongbin; Grigoriev, Igor V.; Rokhsar, Daniel S.; Boore, Jeffrey L.

    2006-04-17

    Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oömycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oömycete avirulence genes.

  5. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regards to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major sources of foodborne diseases for humans. Backward compatibility of WGS to former methods can be maintained by extraction of the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars and national reference laboratories still perform serotyping and PCR-based serogrouping as a first level classification method for Listeria monocytogenes surveillance. Whole genome sequence based core genome MLST analysis of a L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and it was possible to group them either into the IIa, IIc, IVb or IIb clusters, respectively, which were generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  6. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)

    DEFF Research Database (Denmark)

    Miller, Webb; Drautz, Daniela I; Janecka, Jan E

    2009-01-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating...... at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine...

  7. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification.

    Science.gov (United States)

    Oyola, Samuel O; Ariani, Cristina V; Hamilton, William L; Kekre, Mihir; Amenga-Etego, Lucas N; Ghansah, Anita; Rutledge, Gavin G; Redmond, Seth; Manske, Magnus; Jyothi, Dushyanth; Jacob, Chris G; Otto, Thomas D; Rockett, Kirk; Newbold, Chris I; Berriman, Matthew; Kwiatkowski, Dominic P

    2016-12-20

    Translating genomic technologies into healthcare applications for the malaria parasite Plasmodium falciparum has been limited by the technical and logistical difficulties of obtaining high quality clinical samples from the field. Sampling by dried blood spot (DBS) finger-pricks can be performed safely and efficiently with minimal resource and storage requirements compared with venous blood (VB). Here, the use of selective whole genome amplification (sWGA) to sequence the P. falciparum genome from clinical DBS samples was evaluated, and the results compared with current methods that use leucodepleted VB. Parasite DNA with high (>95%) human DNA contamination was selectively amplified by Phi29 polymerase using short oligonucleotide probes of 8-12 mers as primers. These primers were selected on the basis of their differential frequency of binding the desired (P. falciparum DNA) and contaminating (human) genomes. Using the sWGA method, clinical samples from 156 malaria patients, including 120 paired samples for head-to-head comparison of DBS and leucodepleted VB, were sequenced. Greater than 18-fold enrichment of P. falciparum DNA was achieved from DBS extracts. The parasitaemia threshold to achieve >5× coverage for 50% of the genome was 0.03% (40 parasites per 200 white blood cells). Over 99% SNP concordance between VB and DBS samples was achieved after excluding missing calls. The sWGA methods described here provide a reliable and scalable way of generating P. falciparum genome sequence data from DBS samples. The current data indicate that it will be possible to get good quality sequence on most if not all drug resistance loci from the majority of symptomatic malaria patients. This technique overcomes a major limiting factor in P. falciparum genome sequencing from field samples, and paves the way for large-scale epidemiological applications.
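    The idea of choosing primers by their differential binding frequency in the target and contaminating genomes can be sketched with simple k-mer counting. This is a toy illustration under stated assumptions, not the primer-design procedure used in the study: the scoring ratio, the pseudocount, and the short example sequences are all hypothetical, and real design would also consider reverse complements, melting temperature and binding-site spacing.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer on the forward strand."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rank_primers(target: str, background: str, k: int = 8) -> list[tuple[str, float]]:
    """Rank k-mers by how much more often they occur in the target genome.

    Score = (target rate) / (background rate), with a pseudocount of 1 added
    to the background count so absent k-mers do not divide by zero.
    """
    target_counts = kmer_counts(target, k)
    background_counts = kmer_counts(background, k)
    target_total = max(sum(target_counts.values()), 1)
    background_total = max(sum(background_counts.values()), 1)
    scores = {
        kmer: (count / target_total)
        / ((background_counts[kmer] + 1) / background_total)
        for kmer, count in target_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences standing in for an AT-rich target and a GC-rich background.
ranked = rank_primers("ATATATATAT", "GCGCGCATGC", k=4)
print(ranked[0][0])
```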

  8. Human genetics and genomics a decade after the release of the draft sequence of the human genome.

    Science.gov (United States)

    Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

    2011-10-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

  9. Human genetics and genomics a decade after the release of the draft sequence of the human genome

    Science.gov (United States)

    2011-01-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605

  10. Sequence analysis of Schmallenberg virus genomes detected in Hungary.

    Science.gov (United States)

    Fehér, Enikő; Marton, Szilvia; Tóth, Ádám György; Ursu, Krisztina; Wernike, Kerstin; Beer, Martin; Dán, Ádám; Bányai, Krisztián

    2017-12-01

    Since its emergence near the German-Dutch border in 2011, Schmallenberg virus (SBV) has been identified in many European countries. In this study, we determined the complete coding sequence of seven Hungarian SBV genomes to expand our knowledge about the genetic diversity of circulating field strains. The samples originated from the first case, an aborted cattle fetus without malformation collected in 2012, and from the blood samples of six adult cattle in 2014. The Hungarian SBV sequences shared ≥99.3% nucleotide (nt) and ≥97.8% amino acid (aa) identity with each other, and ≥98.9% nt and ≥96.7% aa identity with reference strains. Although phylogenetic analyses showed low resolution in general, the M sequences of cattle and sheep origin SBV strains seemed to cluster on different branches. Both common and unique mutation sites were observed in different groups of sequences, which might help in understanding the evolution of emerging SBV strains.
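
    The "≥ x% identity" figures quoted in the abstract above are minimum pairwise identities within a set of aligned sequences. A minimal sketch of that calculation follows, assuming the sequences are already aligned to equal length and that columns containing a gap ("-") are skipped; both conventions are assumptions, since the abstract does not state how identity was computed, and the toy sequences are invented.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def min_pairwise_identity(aligned: dict) -> float:
    """Lowest identity over all pairs, i.e. the 'x' in 'shared >= x%'."""
    return min(
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    )


# Invented toy alignment, for illustration only.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGTAC-TAC",
}
lowest = min_pairwise_identity(toy_alignment)
```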

  11. Draft genome sequence of Cellulomonas carbonis T26T and comparative analysis of six Cellulomonas genomes

    OpenAIRE

    Zhuang, Weiping; Zhang, Shengzhe; Xia, Xian; Wang, Gejiao

    2015-01-01

    Most Cellulomonas strains are cellulolytic and this feature may be applied in straw degradation and bioremediation. In this study, Cellulomonas carbonis T26T, Cellulomonas bogoriensis DSM 16987T and Cellulomonas cellasea 20108T were sequenced. Here we describe the draft genomic information of C. carbonis T26T and compare it to the related Cellulomonas genomes. Strain T26T has a 3,990,666 bp genome size with a G + C content of 73.4 %, containing 3418 protein-coding genes and 59 RNA genes. Th...

  12. A model of the statistical power of comparative genome sequence analysis.

    OpenAIRE

    Sean R Eddy

    2005-01-01

    Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identi...

  13. Permanent draft genome sequence of Comamonas testosteroni KF-1.

    Science.gov (United States)

    Weiss, Michael; Kesberg, Anna I; Labutti, Kurt M; Pitluck, Sam; Bruce, David; Hauser, Loren; Copeland, Alex; Woyke, Tanja; Lowry, Stephen; Lucas, Susan; Land, Miriam; Goodwin, Lynne; Kjelleberg, Staffan; Cook, Alasdair M; Buhmann, Matthias; Thomas, Torsten; Schleheck, David

    2013-01-01

    Comamonas testosteroni KF-1 is a model organism for the elucidation of the novel biochemical degradation pathways for xenobiotic 4-sulfophenylcarboxylates (SPC) formed during biodegradation of synthetic 4-sulfophenylalkane surfactants (linear alkylbenzenesulfonates, LAS) by bacterial communities. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,026,527 bp long chromosome (one sequencing gap) exhibits an average G+C content of 61.79% and is predicted to encode 5,492 protein-coding genes and 114 RNA genes.

  14. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  15. Genome sequence and description of Anaerosalibacter massiliensis sp. nov.

    Directory of Open Access Journals (Sweden)

    N. Dione

    2016-03-01

    Anaerosalibacter massiliensis sp. nov. strain ND1T (= CSUR P762 = DSM 27308) is the type strain of A. massiliensis sp. nov., a new species within the genus Anaerosalibacter. This strain, the genome of which is described here, was isolated from the faecal flora of a 49-year-old healthy Brazilian man. Anaerosalibacter massiliensis is a Gram-positive, obligate anaerobic rod and member of the family Clostridiaceae. With the complete genome sequence and annotation, we describe here the features of this organism. The 3 197 911 bp long genome (one chromosome but no plasmid) contains 3271 protein-coding and 62 RNA genes, including six rRNA genes.

  16. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    , copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications....... Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed...... differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies....

  17. Genome sequence resource for the oomycete taro pathogen Phytophthora colocasiae.

    Science.gov (United States)

    Vetukuri, Ramesh Raju; Kushwaha, Sandeep; Sen, Diya; Whisson, Stephen C; Lamour, Kurt; Grenville-Briggs, Laura

    2018-03-20

    Phytophthora colocasiae is a phytopathogenic oomycete that causes leaf blight and corm rot on taro (Colocasia esculenta), an important staple crop in the tropics. The impact of P. colocasiae is a serious concern for food security in Asian and Oceanic regions. Vietnamese strain 7290 of P. colocasiae was sequenced (Illumina) to assemble a draft genome of 56.6 Mb, comprised of 19853 scaffolds and 19984 predicted protein-coding genes. As in other Phytophthora species, P. colocasiae possesses numerous pathogenicity-related genes, such as the RxLR class of effectors. This draft genome sequence of P. colocasiae provides a resource for underpinning the first steps into determining the molecular mechanisms of disease development in this pathosystem.

  18. The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).

    Science.gov (United States)

    Choi, Kyoung Su; Park, SeonJoo

    2016-09-01

    The complete chloroplast (cp) genome sequence of Euonymus japonicus, the first to be sequenced in the genus Euonymus, is reported in this study. The total length was 157 637 bp, containing a pair of 26 678 bp inverted repeat (IR) regions, which were separated by a small single copy (SSC) region and a large single copy (LSC) region of 18 340 bp and 85 941 bp, respectively. This genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes of E. japonicus contain introns, of which three (clpP, ycf3, and rps12) contain two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.
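
    The region lengths reported in the abstract above can be checked against the stated total: a quadripartite chloroplast genome consists of one LSC, one SSC and two copies of the IR. The short sketch below performs that arithmetic with the figures from the abstract; the function name is hypothetical.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported for Euonymus japonicus.
e_japonicus_total = quadripartite_length(lsc=85_941, ssc=18_340, ir=26_678)
print(e_japonicus_total)  # 157637, matching the reported 157 637 bp
```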

  19. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other

  20. A Genome Sequencing Program for Novel Undiagnosed Diseases

    OpenAIRE

    Bloss, Cinnamon S.; Scott-Van Zeeland, Ashley A.; Topol, Sarah E.; Darst, Burcu F.; Boeldt, Debra L.; Erikson, Galina A.; Bethel, Kelly J.; Bjork, Robert L.; Friedman, Jennifer R.; Hwynn, Nelson; Patay, Bradley A.; Pockros, Paul J.; Scott, Erick R.; Simon, Ronald A.; Williams, Gary W.

    2015-01-01

    Purpose The Scripps Idiopathic Diseases of huMan (IDIOM) study aims to discover novel gene-disease relationships and provide molecular genetic diagnosis and treatment guidance for individuals with novel diseases using genome sequencing integrated with clinical assessment and multidisciplinary case review. Methods Here we describe the IDIOM study operational protocol and initial results. Results 121 cases underwent first tier review by the principal investigators to determine if the primary in...

  1. Whole Genome Sequencing of a Healthy Aging Cohort

    OpenAIRE

    Erikson, Galina A.; Bodian, Dale L.; Rueda, Manuel; Molparia, Bhuvan; Scott, Erick R.; Scott-Van Zeeland, Ashley A.; Topol, Sarah E.; Wineinger, Nathan E.; Niederhuber, John E.; Topol, Eric J.; Torkamani, Ali

    2016-01-01

    Studies of long-lived individuals have revealed few genetic mechanisms for protection against age-associated disease. Therefore, we pursued genome sequencing of a related phenotype – healthy aging – to understand the genetics of disease-free aging without medical intervention. In contrast with studies of exceptional longevity, usually focused on centenarians, healthy aging is not associated with known longevity variants but is associated with reduced genetic susceptibility to Alzheimer and co...

  2. Complete Genome Sequence of Mycobacterium xenopi Type Strain RIVM700367

    KAUST Repository

    Abdallah, A. M.

    2012-05-24

    Mycobacterium xenopi is a slow-growing, thermophilic, water-related Mycobacterium species. Like other nontuberculous mycobacteria, M. xenopi more commonly infects humans with altered immune function, such as chronic obstructive pulmonary disease patients. It is considered clinically relevant in a significant proportion of the patients from whom it is isolated. We report here the whole genome sequence of M. xenopi type strain RIVM700367.

  3. Genome Sequence of the Human Pathogen Vibrio cholerae Amazonia

    Science.gov (United States)

    Thompson, Cristiane C.; Marin, Michel A.; Dias, Graciela M.; Dutilh, Bas E.; Edwards, Robert A.; Iida, Tetsuya; Thompson, Fabiano L.; Vicente, Ana Carolina P.

    2011-01-01

    Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. Based on multilocus sequence analysis, this lineage belongs to a distinct profile compared to strains from El Tor and classical biotypes. The genomic analysis revealed that it contains Vibrio pathogenicity island 2 and a set of genes related to pathogenesis and fitness, such as the type VI secretion system, present in choleragenic V. cholerae strains. PMID:21952545

  4. Genome Sequence of Streptomyces sp. Strain Tü6071

    OpenAIRE

    Erxleben, Anika; Wunsch-Palasis, Julia; Grüning, Björn A.; Luzhetska, Marta; Bechthold, Andreas; Günther, Stefan

    2011-01-01

    Streptomyces sp. Tü6071 is a soil-dwelling bacterium which has a highly active isoprenoid biosynthesis. Isoprenoids are important precursors for biopharmaceutical molecules such as antibiotics or anticancer agents, e.g., landomycin. Streptomyces sp. Tü6071 produces the industrially important terpene glycosides phenalinolactones, which have antibacterial activity against several Gram-positive bacteria. The availability of the genome sequence of Streptomyces sp. Tü6071 allows for understanding ...

  5. Complete Genome Sequence of Prevotella intermedia Strain 17-2.

    Science.gov (United States)

    Nambu, Takayuki; Yamane, Kazuyoshi; Maruyama, Hugo; Mashimo, Chiho; Yamanaka, Takeshi

    2015-08-20

    Prevotella intermedia, a Gram-negative black-pigmented anaerobic rod, is frequently isolated from not only periodontal pockets but also purulent infections. We report here the complete genome sequence of P. intermedia strain 17-2, which is a non-exopolysaccharide-producing variant obtained from exopolysaccharide (EPS)-producing P. intermedia strain 17 stock culture. Copyright © 2015 Nambu et al.

  6. Registered Report: Melanoma genome sequencing reveals frequent PREX2 mutations

    OpenAIRE

    sprotocols

    2015-01-01

    Authors: Denise Chroscinski, Darryl Sampey, Alex Hewitt, The Reproducibility Project: Cancer Biology. Abstract: The Reproducibility Project: Cancer Biology (https://osf.io/e81xl/wiki/home/) seeks to address growing concerns about reproducibility in scientific research by conducting replications of 50 papers in the field of cancer biology published between 2010 and 2012. This Registered Report describes the proposed replication plan of key experiments from “Melanoma genome sequenci...

  7. The complete chloroplast genome sequence of Chrysanthemum indicum.

    Science.gov (United States)

    Xia, Ye; Hu, Zhigang; Li, Xiwen; Wang, Ping; Zhang, Xiuqiao; Li, Qing; Lu, Chaolong

    2016-11-01

    Chrysanthemum indicum, an important medicinal plant of the Asteraceae, has a long history of medicinal use in China. In this study, the complete chloroplast genome of C. indicum was sequenced on a 454 sequencing platform, and the structure of the obtained chloroplast genome was analyzed. The complete chloroplast genome of C. indicum was 150 972 bp in length and had a pair of inverted repeats (IR, 24 956 bp) separated by large (LSC, 82 741 bp) and small (SSC, 18 319 bp) single copy regions. Its total GC content was 37.48%. A total of 126 chloroplast genes, including 83 protein-coding genes, 35 tRNAs and eight rRNAs, were successfully annotated. Sixteen genes contained one or two introns. Phylogenetic analyses indicated that the chloroplast genome could distinguish C. indicum from its closely related species and might become a potential super barcode for the identification of these species.

  8. Genome sequence of Yersinia pestis, the causative agent of plague.

    Science.gov (United States)

    Parkhill, J; Wren, B W; Thomson, N R; Titball, R W; Holden, M T; Prentice, M B; Sebaihia, M; James, K D; Churcher, C; Mungall, K L; Baker, S; Basham, D; Bentley, S D; Brooks, K; Cerdeño-Tárraga, A M; Chillingworth, T; Cronin, A; Davies, R M; Davis, P; Dougan, G; Feltwell, T; Hamlin, N; Holroyd, S; Jagels, K; Karlyshev, A V; Leather, S; Moule, S; Oyston, P C; Quail, M; Rutherford, K; Simmonds, M; Skelton, J; Stevens, K; Whitehead, S; Barrell, B G

    2001-10-04

    The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.

  9. Complete genome sequence of Kytococcus sedentarius type strain (541).

    Science.gov (United States)

    Sims, David; Brettin, Thomas; Detter, John C; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Schneider, Susanne; Göker, Markus; Pukall, Rüdiger; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-07-20

    Kytococcus sedentarius (ZoBell and Upham 1944) Stackebrandt et al. 1995 is the type strain of the species, and is of phylogenetic interest because of its location in the Dermacoccaceae, a poorly studied family within the actinobacterial suborder Micrococcineae. Kytococcus sedentarius is known for the production of oligoketide antibiotics as well as for its role as an opportunistic pathogen causing valve endocarditis, hemorrhagic pneumonia, and pitted keratolysis. It is strictly aerobic and can only grow when several amino acids are provided in the medium. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from a marine environment. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Dermacoccaceae and the 2,785,024 bp long single replicon genome with its 2639 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Complete genome sequence of Kytococcus sedentarius type strain (541T)

    Science.gov (United States)

    Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Schneider, Susanne; Göker, Markus; Pukall, Rüdiger; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-01-01

    Kytococcus sedentarius (ZoBell and Upham 1944) Stackebrandt et al. 1995 is the type strain of the species, and is of phylogenetic interest because of its location in the Dermacoccaceae, a poorly studied family within the actinobacterial suborder Micrococcineae. Kytococcus sedentarius is known for the production of oligoketide antibiotics as well as for its role as an opportunistic pathogen causing valve endocarditis, hemorrhagic pneumonia, and pitted keratolysis. It is strictly aerobic and can only grow when several amino acids are provided in the medium. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from a marine environment. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Dermacoccaceae and the 2,785,024 bp long single replicon genome with its 2639 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304632

  11. Complete genome sequence of Desulfomicrobium baculatum type strain (XT)

    Energy Technology Data Exchange (ETDEWEB)

    Copeland, Alex; Spring, Stefan; Goker, Markus; Schneider, Susanne; Lapidus, Alla; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C; Meincke, Linda; Sims, David; Brettin, Thomas; Detter, John C; Han, Cliff; Chain, Patrick; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C; Lucas, Susan

    2009-05-20

    Desulfomicrobium baculatum is the type species of the genus Desulfomicrobium, which is the type genus of the family Desulfomicrobiaceae. It is of phylogenetic interest because of the isolated location of the family Desulfomicrobiaceae within the order Desulfovibrionales. D. baculatum strain XT is a Gram-negative, motile, sulfate-reducing bacterium isolated from water-saturated manganese carbonate ore. It is strictly anaerobic and does not require NaCl for growth, although NaCl concentrations up to 6% (w/v) are tolerated. The metabolism is respiratory or fermentative. In the presence of sulfate, pyruvate and lactate are incompletely oxidized to acetate and CO2. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the deltaproteobacterial family Desulfomicrobiaceae, and this 3,942,657 bp long single replicon genome with its 3494 protein-coding and 72 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  12. Genome sequence of Arthrobacter antarcticus strain W2, isolated from a slaughterhouse

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng

    2016-01-01

    We report the draft genome sequence of Arthrobacter antarcticus strain W2, which was isolated from a wall of a small slaughterhouse in Denmark. The 4.43-Mb genome sequence was assembled into 170 contigs.

  13. Genome sequence of Psychrobacter cibarius strain W1

    DEFF Research Database (Denmark)

    Raghupathi, Prem Krishnan; Herschend, Jakob; Røder, Henriette Lyng

    2016-01-01

    Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs.

  14. Genome sequence of carboxylesterase, carboxylase and xylose isomerase producing alkaliphilic haloarchaeon Haloterrigena turkmenica WANU15

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2016-03-01

    We report the draft genome sequence of Haloterrigena turkmenica strain WANU15, isolated from Soda Lake. The draft genome size is 2,950,899 bp with a G + C content of 64% and contains 49 RNA sequences. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LKCV00000000. Keywords: Soda Lake, Haloterrigena turkmenica, Carboxylesterase, Carboxylase, Xylose isomerase, Whole genome sequencing

  15. Ancient human genome sequence of an extinct Palaeo-Eskimo

    Science.gov (United States)

    Rasmussen, Morten; Li, Yingrui; Lindgreen, Stinus; Pedersen, Jakob Skou; Albrechtsen, Anders; Moltke, Ida; Metspalu, Mait; Metspalu, Ene; Kivisild, Toomas; Gupta, Ramneek; Bertalan, Marcelo; Nielsen, Kasper; Gilbert, M. Thomas P.; Wang, Yong; Raghavan, Maanasa; Campos, Paula F.; Kamp, Hanne Munkholm; Wilson, Andrew S.; Gledhill, Andrew; Tridico, Silvana; Bunce, Michael; Lorenzen, Eline D.; Binladen, Jonas; Guo, Xiaosen; Zhao, Jing; Zhang, Xiuqing; Zhang, Hao; Li, Zhuo; Chen, Minfeng; Orlando, Ludovic; Kristiansen, Karsten; Bak, Mads; Tommerup, Niels; Bendixen, Christian; Pierre, Tracey L.; Grønnow, Bjarne; Meldgaard, Morten; Andreasen, Claus; Fedorova, Sardana A.; Osipova, Ludmila P.; Higham, Thomas F. G.; Ramsey, Christopher Bronk; Hansen, Thomas v. O.; Nielsen, Finn C.; Crawford, Michael H.; Brunak, Søren; Sicheritz-Pontén, Thomas; Villems, Richard; Nielsen, Rasmus; Krogh, Anders; Wang, Jun; Willerslev, Eske

    2013-01-01

    We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit. PMID:20148029

  16. Sequencing and characterization of the FVB/NJ mouse genome.

    Science.gov (United States)

    Wong, Kim; Bumpstead, Suzannah; Van Der Weyden, Louise; Reinholdt, Laura G; Wilming, Laurens G; Adams, David J; Keane, Thomas M

    2012-08-23

    The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters that make it highly desirable for transgenic strain production and general purpose use. Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identify several previously unknown candidate causal variants. The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.

  17. Complete genome sequence analysis of chicken astrovirus isolate from India.

    Science.gov (United States)

    Patel, Amrutlal K; Pandit, Ramesh J; Thakkar, Jalpa R; Hinsu, Ankit T; Pandey, Vinod C; Pal, Joy K; Prajapati, Kantilal S; Jakhesara, Subhash J; Joshi, Chaitanya G

    2017-03-01

    Chicken astroviruses have been known to cause severe disease in chickens leading to increased mortality and "white chicks" condition. Here we aim to characterize the causative agent of visceral gout suspected for astrovirus infection in broiler breeder chickens. Total RNA isolated from allantoic fluid of SPF embryo passaged with infected chicken sample was sequenced by whole genome shotgun sequencing using the Ion Torrent PGM platform. The sequence was analysed for the presence of coding and non-coding features, its similarity with reported isolates and epitope analysis of capsid structural protein. The consensus length of 7513 bp genome sequence of Indian isolate of chicken astrovirus was obtained after assembly of 14,121 high quality reads. The genome was comprised of 13 bp 5'-UTR, three open reading frames (ORFs) including ORF1a encoding serine protease, ORF1b encoding RNA dependent RNA polymerase (RdRp) and ORF2 encoding capsid protein, and 298 bp of 3'-UTR which harboured two coronavirus stem loop II like "s2m" motifs and a poly A stretch of 19 nucleotides. The genetic analysis of CAstV/INDIA/ANAND/2016 suggested highest sequence similarity of 86.94% with the chicken astrovirus isolate CAstV/GA2011 followed by 84.76% with CAstV/4175 and 74.48% with CAstV/Poland/G059/2014 isolates. The capsid structural protein of CAstV/INDIA/ANAND/2016 showed 84.67% similarity with chicken astrovirus isolate CAstV/GA2011, 81.06% with CAstV/4175 and 41.18% with CAstV/Poland/G059/2014 isolates. However, the capsid protein sequence showed high degree of sequence identity at nucleotide level (98.64-99.32%) and at amino acids level (97.74-98.69%) with reported sequences of Indian isolates suggesting their common origin and limited sequence divergence. The epitope analysis by SVMTriP identified two unique epitopes in our isolate, seven shared epitopes among Indian isolates and two shared epitopes among all isolates except Poland isolate which carried all distinct epitopes.

  18. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis

    Science.gov (United States)

    Simple sequence repeats (SSR) or microsatellite markers are one of the most informative and versatile DNA-based markers. The use of next-generation sequencing technologies allow whole genome sequencing and make it possible to develop large numbers of SSRs through bioinformatic analysis of genome da...

  19. The Douglas-fir genome sequence reveals specialization of the photosynthetic apparatus in Pinaceae

    Science.gov (United States)

    David B. Neale; Patrick E. McGuire; Nicholas C. Wheeler; Kristian A. Stevens; Marc W. Crepeau; Charis Cardeno; Aleksey V. Zimin; Daniela Puiu; Geo M. Pertea; U. Uzay Sezen; Claudio Casola; Tomasz E. Koralewski; Robin Paul; Daniel Gonzalez-Ibeas; Sumaira Zaman; Richard Cronn; Mark Yandell; Carson Holt; Charles H. Langley; James A. Yorke; Steven L. Salzberg; Jill L. Wegrzyn

    2017-01-01

    A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50...
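
    The contig and scaffold N50 values quoted in this record are the standard assembly contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions, not data from the Douglas-fir assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (total length 300), not real conifer data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```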

  20. Draft Genome Sequence of "Terrisporobacter othiniensis" Isolated from a Blood Culture from a Human Patient

    DEFF Research Database (Denmark)

    Lund, Lars Christian; Sydenham, Thomas Vognbjerg; Høgh, Silje Vermedal

    2015-01-01

    "Terrisporobacter othiniensis" (proposed species) was isolated from a blood culture. Genomic DNA was sequenced using a MiSeq benchtop sequencer (Illumina) and assembled using the SPAdes genome assembler. This resulted in a draft genome sequence comprising 3,980,019 bp in 167 contigs containing 3...

  1. Draft genome sequences of seven isolates of Phytophthora ramorum EU2 from Northern Ireland

    Directory of Open Access Journals (Sweden)

    Lourdes de la Mata Saez

    2015-12-01

    Full Text Available Here we present draft-quality genome sequence assemblies for the oomycete Phytophthora ramorum genetic lineage EU2. We sequenced genomes of seven isolates collected in Northern Ireland between 2010 and 2012. Multiple genome sequences from P. ramorum EU2 will be valuable for identifying genetic variation within the clonal lineage that can be useful for tracking its spread.

  2. Comparison of three methods of parasitoid polydnavirus genomic DNA isolation to facilitate polydnavirus genomic sequencing.

    Science.gov (United States)

    Rodríguez-Pérez, Mario A; Beckage, Nancy E

    2008-04-01

    A major long-term goal of polydnavirus (PDV) genome research is to identify novel virally encoded molecules that may serve as biopesticides to target insect pests that threaten agriculture and human health. As PDV viral replication in cell culture in vitro has not yet been achieved, several thousands of wasps must be dissected to yield enough viral DNA from the adult ovaries to carry out PDV genomic sequencing. This study compares three methods of PDV genomic DNA isolation for the PDV of Cotesia flavipes, which parasitizes the sugarcane borer, Diatraea saccharalis, preparatory to sequencing the C. flavipes bracovirus genome. Two of these protocols incorporate phenol-chloroform DNA extraction steps in the procedure and the third protocol uses a modified Qiagen DNA kit method to extract viral DNA. The latter method proved significantly less time-consuming and more cost-effective. Efforts are currently underway to bioengineer insect pathogenic viruses with PDV genes, so that their gene products will enhance baculovirus virulence for agricultural insect pests, either via suppression of the immune system of the host or by PDV-mediated induction of its developmental arrest. Sequencing a growing number of complete PDV genomes will enhance those efforts, which will be facilitated by the study reported here. (c) 2008 Wiley-Liss, Inc.

  3. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Science.gov (United States)

    Shannon C.K. Straub; Mark Fishbein; Tatyana Livshult; Zachary Foster; Matthew Parks; Kevin Weitemier; Richard C. Cronn; Aaron Liston

    2011-01-01

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in...

  4. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Lincoln D Stein

    2003-11-01

    Full Text Available The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C

  5. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    Science.gov (United States)

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  6. Complete genome sequence of an attenuated Sparfloxacin-resistant Streptococcus agalactiae strain 138spar

    Science.gov (United States)

    The complete genome of a sparfloxacin-resistant Streptococcus agalactiae vaccine strain 138spar is 1,838,126 bp in size. The genome has 1892 coding sequences and 82 RNAs. The annotation of the genome is added by the NCBI Prokaryotic Genome Annotation Pipeline. The publishing of this genome will allo...

  7. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    Science.gov (United States)

    Hand, Brian K.; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.
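
    The abstract treats a locus as species-diagnostic when it is fixed between species, that is, each species carries a single allele and the two alleles differ. A minimal sketch of that test follows; the function name, locus names and allele strings are hypothetical illustrations, and the code is not the authors' RAD pipeline.

```python
def is_diagnostic(alleles_a, alleles_b):
    """True if each group is fixed for one allele and the two alleles differ."""
    set_a, set_b = set(alleles_a), set(alleles_b)
    return len(set_a) == 1 and len(set_b) == 1 and set_a != set_b


# Hypothetical allele calls per locus: (rainbow trout sample, cutthroat sample).
loci = {
    "locus_1": ("AAAA", "GGGG"),  # fixed difference between the groups
    "locus_2": ("AAAG", "GGGG"),  # first group is polymorphic
    "locus_3": ("CCCC", "CCCC"),  # same allele in both groups
}

diagnostic = [name for name, (rbt, wct) in loci.items() if is_diagnostic(rbt, wct)]
print(diagnostic)  # ['locus_1']
```

    Widening the sample can only remove loci from the diagnostic set under this definition, which is consistent with the 325 of 4,914 previously reported loci that lost diagnostic status in the larger survey.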

  8. Second generation sequencing of the mesothelioma tumor genome.

    Directory of Open Access Journals (Sweden)

    Raphael Bueno

    2010-05-01

    Full Text Available The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

  9. Complete genome sequence of Borrelia afzelii K78 and comparative genome analysis.

    Directory of Open Access Journals (Sweden)

    Wolfgang Schüler

    Full Text Available The main Borrelia species causing Lyme borreliosis in Europe and Asia are Borrelia afzelii, B. garinii, B. burgdorferi and B. bavariensis. This is in contrast to the United States, where infections are exclusively caused by B. burgdorferi. To date, the genome sequences of four B. afzelii strains, of which only two include the numerous plasmids, are available. In order to further assess the genetic diversity of B. afzelii, the most common species in Europe, responsible for the large variety of clinical manifestations of Lyme borreliosis, we have determined the full genome sequence of the B. afzelii strain K78, a clinical isolate from Austria. The K78 genome contains a linear chromosome (905,949 bp) and 13 plasmids (8 linear and 5 circular) together presenting 1,309 open reading frames of which 496 are located on plasmids. With the exception of lp28-8, all linear replicons in their full length including their telomeres have been sequenced. The comparison with the genomes of the four other B. afzelii strains, ACA-1, PKo, HLJ01 and Tom3107, as well as the one of B. burgdorferi strain B31, confirmed a high degree of conservation within the linear chromosome of B. afzelii, whereas plasmid-encoded genes showed a much larger diversity. Since some plasmids present in B. burgdorferi are missing in the B. afzelii genomes, the corresponding virulence factors of B. burgdorferi are found in B. afzelii on other unrelated plasmids. In addition, we have identified a species-specific region in the circular plasmid, cp26, which could be used for species determination. Different non-coding RNAs have been located on the B. afzelii K78 genome, which have not previously been annotated in any of the published Borrelia genomes.

  10. Complete Genome Sequence of Borrelia afzelii K78 and Comparative Genome Analysis

    Science.gov (United States)

    Schüler, Wolfgang; Bunikis, Ignas; Weber-Lehman, Jacqueline; Comstedt, Pär; Kutschan-Bunikis, Sabrina; Stanek, Gerold; Huber, Jutta; Meinke, Andreas; Bergström, Sven; Lundberg, Urban

    2015-01-01

    The main Borrelia species causing Lyme borreliosis in Europe and Asia are Borrelia afzelii, B. garinii, B. burgdorferi and B. bavariensis. This is in contrast to the United States, where infections are exclusively caused by B. burgdorferi. To date, the genome sequences of four B. afzelii strains, of which only two include the numerous plasmids, are available. In order to further assess the genetic diversity of B. afzelii, the most common species in Europe, responsible for the large variety of clinical manifestations of Lyme borreliosis, we have determined the full genome sequence of the B. afzelii strain K78, a clinical isolate from Austria. The K78 genome contains a linear chromosome (905,949 bp) and 13 plasmids (8 linear and 5 circular) together presenting 1,309 open reading frames of which 496 are located on plasmids. With the exception of lp28-8, all linear replicons in their full length including their telomeres have been sequenced. The comparison with the genomes of the four other B. afzelii strains, ACA-1, PKo, HLJ01 and Tom3107, as well as the one of B. burgdorferi strain B31, confirmed a high degree of conservation within the linear chromosome of B. afzelii, whereas plasmid-encoded genes showed a much larger diversity. Since some plasmids present in B. burgdorferi are missing in the B. afzelii genomes, the corresponding virulence factors of B. burgdorferi are found in B. afzelii on other unrelated plasmids. In addition, we have identified a species-specific region in the circular plasmid, cp26, which could be used for species determination. Different non-coding RNAs have been located on the B. afzelii K78 genome, which have not previously been annotated in any of the published Borrelia genomes. PMID:25798594

  11. De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome

    Directory of Open Access Journals (Sweden)

    Iorizzo Massimo

    2012-05-01

    Full Text Available Abstract Background Sequence analysis of organelle genomes has revealed important aspects of plant cell evolution. The scope of this study was to develop an approach for de novo assembly of the carrot mitochondrial genome using next generation sequence data from total genomic DNA. Results Sequencing data from a carrot 454 whole genome library were used to develop a de novo assembly of the mitochondrial genome. Development of a new bioinformatic tool allowed visualizing contig connections and elucidation of the de novo assembly. Southern hybridization demonstrated recombination across two large repeats. Genome annotation allowed identification of 44 protein coding genes, three rRNA and 17 tRNA. Identification of the plastid genome sequence allowed organelle genome comparison. Mitochondrial intergenic sequence analysis allowed detection of a fragment of DNA specific to the carrot plastid genome. PCR amplification and sequence analysis across different Apiaceae species revealed consistent conservation of this fragment in the mitochondrial genomes and an insertion in Daucus plastid genomes, giving evidence of a mitochondrial to plastid transfer of DNA. Sequence similarity with a retrotransposon element suggests a possibility that a transposon-like event transferred this sequence into the plastid genome. Conclusions This study confirmed that whole genome sequencing is a practical approach for de novo assembly of higher plant mitochondrial genomes. In addition, a new aspect of intercompartmental genome interaction was reported providing the first evidence for DNA transfer into an angiosperm plastid genome. The approach used here could be used more broadly to sequence and assemble mitochondrial genomes of diverse species. This information will allow us to better understand intercompartmental interactions and cell evolution.

  12. Human genome sequencing with direct x-ray holographic imaging

    International Nuclear Information System (INIS)

    Rhodes, C.K.

    1993-01-01

    Direct holographic imaging of biological materials is widely applicable to the study of the structure, properties and action of genetic material. This particular application involves the sequencing of the human genome, where the prospective genomic imaging technology is composed of three subtechnologies, namely an x-ray holographic camera, suitable chemistry and enzymology for the preparation of tagged DNA samples, and the illuminator in the form of an x-ray laser. We report that an appropriate x-ray camera, embodied by the instrument developed by MCR, is available and that suitable chemical and enzymatic procedures exist for the preparation of the necessary tagged DNA strands. Concerning the future development of the x-ray illuminator, we find that a practical small scale x-ray light source is indeed feasible. This outcome requires the use of unconventional physical processes in order to achieve the necessary power-compression in the amplifying medium. Importantly, although the x-ray source does not currently exist, the understanding of these new physical mechanisms is developing rapidly and the research has established the basic scaling laws that will determine the properties of the x-ray illuminator. When this x-ray source becomes available, an extremely rapid and cost effective instrument for 3-D imaging of biological materials can be applied to a wide range of biological structural assays, including the base-pair sequencing of the human genome and many questions regarding its higher levels of organization.

  13. Whole genome sequencing of matched primary and metastatic acral melanomas.

    Science.gov (United States)

    Turajlic, Samra; Furney, Simon J; Lambros, Maryou B; Mitsopoulos, Costas; Kozarewa, Iwanka; Geyer, Felipe C; Mackay, Alan; Hakas, Jarle; Zvelebil, Marketa; Lord, Christopher J; Ashworth, Alan; Thomas, Meirion; Stamp, Gordon; Larkin, James; Reis-Filho, Jorge S; Marais, Richard

    2012-02-01

    Next generation sequencing has enabled systematic discovery of mutational spectra in cancer samples. Here, we used whole genome sequencing to characterize somatic mutations and structural variation in a primary acral melanoma and its lymph node metastasis. Our data show that the somatic mutational rates in this acral melanoma sample pair were more comparable to the rates reported in cancer genomes not associated with mutagenic exposure than in the genome of a melanoma cell line or the transcriptome of melanoma short-term cultures. Despite the perception that acral skin is sun-protected, the dominant mutational signature in these samples is compatible with damage due to ultraviolet light exposure. A nonsense mutation in ERCC5 discovered in both the primary and metastatic tumors could also have contributed to the mutational signature through accumulation of unrepaired dipyrimidine lesions. However, evidence of transcription-coupled repair was suggested by the lower mutational rate in the transcribed regions and expressed genes. The primary and the metastasis are highly similar at the level of global gene copy number alterations, loss of heterozygosity and single nucleotide variation (SNV). Furthermore, the majority of the SNVs in the primary tumor were propagated in the metastasis and one nonsynonymous coding SNV and one splice site mutation appeared to arise de novo in the metastatic lesion.

  14. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes.

    Science.gov (United States)

    Huotari, Tea; Korpelainen, Helena

    2012-10-15

    Elodea canadensis is an aquatic angiosperm native to North America. It has attracted great attention due to its invasive nature when transported to new areas in its non-native range. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of Elodea. Taxonomically Elodea is a basal monocot, and only a few monocot cp genomes representing early lineages of monocots have been sequenced so far. The genome is a circular double-stranded DNA molecule 156,700 bp in length, and has a typical structure with large (LSC 86,194 bp) and small (SSC 17,810 bp) single-copy regions separated by a pair of inverted repeats (IRs 26,348 bp each). The Elodea cp genome contains 113 unique genes and 16 duplicated genes in the IR regions. A comparative analysis showed that the gene order and organization of the Elodea cp genome is almost identical to that of Amborella trichopoda, a basal angiosperm. The structure of IRs in Elodea is unique among monocot species with the whole cp genome sequenced. In Elodea and another monocot, Lemna minor, the borders between IRs and LSC are located upstream of the rps19 gene and downstream of the trnH-GUG gene, while in most monocots, the IR has extended to include both the trnH and rps19 genes. A phylogenetic analysis conducted using a Bayesian method, based on the DNA sequences of 81 chloroplast genes from 17 monocot taxa, provided support for the placement of Elodea together with Lemna as a basal monocot and the next diverging lineage of monocots after Acorales. In comparison with other monocots, the Elodea cp genome has gone through only a few rearrangements or gene losses. The IR of Elodea has a unique structure among the monocot species studied so far as its structure is similar to that of a basal angiosperm Amborella. This result together with phylogenetic analyses supports the placement of Elodea as a basal monocot to the next diverging lineage of monocots after Acorales.
So far, only few cp genomes representing early lineages of monocots have been
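
    The region sizes reported for the Elodea chloroplast genome can be cross-checked with simple arithmetic, since a quadripartite plastid genome consists of one large single-copy region, one small single-copy region and two copies of the inverted repeat. The sketch below uses only the figures quoted in the abstract; the variable and function names are illustrative assumptions.

```python
# Region lengths in bp, as quoted in the abstract.
LSC = 86_194   # large single-copy region
SSC = 17_810   # small single-copy region
IR = 26_348    # one inverted repeat copy; the genome carries two
REPORTED_TOTAL = 156_700


def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular plastid genome with two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # 156700 True
```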

  15. Exploiting long read sequencing technologies to establish high quality highly contiguous pig reference genome assemblies

    Science.gov (United States)

    The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...

  16. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  17. Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes

    Science.gov (United States)

    2012-01-01

    Background Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references. Results In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade) and are evolutionally considerably more closely related to each other by phylogenetic and gene content similarity analyses than to isolates in the community-associated (CA) clade with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. 
Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported

  18. Pan-Cancer Analysis of Genomic Sequencing Among the Elderly.

    Science.gov (United States)

    Wahl, Daniel R; Nguyen, Paul L; Santiago, Maria; Yousefi, Kasra; Davicioni, Elai; Shumway, Dean A; Speers, Corey; Mehra, Rohit; Feng, Felix Y; Osborne, Joseph R; Spratt, Daniel E

    2017-07-15

    We hypothesized that elderly patients might have age-specific genetic abnormalities yet be underrepresented in currently available sequencing repositories, which could limit the effect of sequencing efforts for this population. Leveraging The Cancer Genome Atlas (TCGA) data portal, 9 tumor types were analyzed. The frequency distribution of cancer by age was determined and compared with Surveillance, Epidemiology, and End Results data. Using the estimated median somatic mutational frequency of each tumor type, the samples needed beyond TCGA to detect a 10% mutational frequency were calculated. Microarray data from a separate prospective cohort were obtained from primary prostatectomy samples to determine whether elderly-specific transcriptomic alterations could be identified. Of the 5236 TCGA samples, 73% were from patients aged elderly patients with cancer were likely to harbor age-specific molecular abnormalities, we accessed transcriptomic data from a separate, larger database of >2000 prostate cancer samples. That analysis revealed significant differences in the expression of 10 genes in patients aged ≥70 years compared with those Elderly patients have been underrepresented in genomic sequencing studies. Our data suggest the presence of elderly-specific molecular alterations. Further dedicated efforts to understand the biology of cancer among the elderly will be important moving forward. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Complete nucleotide sequences of avian metapneumovirus subtype B genome.

    Science.gov (United States)

    Sugiyama, Miki; Ito, Hiroshi; Hata, Yusuke; Ono, Eriko; Ito, Toshihiro

    2010-12-01

    Complete nucleotide sequences were determined for subtype B avian metapneumovirus (aMPV), the attenuated vaccine strain VCO3/50 and its parental pathogenic strain VCO3/60616. The genomes of both strains comprised 13,508 nucleotides (nt), with a 42-nt leader at the 3'-end and a 46-nt trailer at the 5'-end. The genome contains eight genes in the order 3'-N-P-M-F-M2-SH-G-L-5', which is the same order shown in the other metapneumoviruses. The genes are flanked on either side by conserved transcriptional start and stop signals and have intergenic sequences varying in length from 1 to 88 nt. Comparison of nt and predicted amino acid (aa) sequences of VCO3/60616 with those of other metapneumoviruses revealed higher homology with aMPV subtype A virus than with other metapneumoviruses. A total of 18 nt and 10 deduced aa differences were seen between the strains, and one or a combination of several differences could be associated with attenuation of VCO3/50.

  20. Genotyping of Mycobacterium tuberculosis using whole genome sequencing.

    Science.gov (United States)

    Amlerova, Jana; Bitar, Ibrahim; Hrabak, Jaroslav

    2018-03-17

    Tuberculosis (TB) is considered one of the most serious infectious diseases worldwide. Effective control of tuberculosis infection involves multiple steps, such as reliable detection, treatment, epidemiological control as a part of case management, and further surveillance and monitoring of TB spread in the human population. Owing to accelerating advances in molecular biology over the past decade, especially in DNA sequencing, the application of these methods has become crucial for studying TB evolution, differentiating Mycobacterium tuberculosis genotypes, and tracking their distribution. Currently, several molecular genetic methods are available. The oldest typing methods (e.g., IS6110-RFLP, spoligotyping, and MIRU-VNTR) can discover the chain of transmission to the patient. Whole genome sequencing is furthermore able to identify the source of infection and the transmission routes among individuals sharing the same isolate, and to determine TB evolution and resistance to antituberculotic agents. This technique is likely to become a new gold standard among genotyping methods in molecular epidemiological studies of tuberculosis. In this article, molecular genetic typing methods are reviewed, with a special focus on whole genome sequencing and data management.

  1. HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

    Science.gov (United States)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.

    2017-10-01

    PanDA, the Production and Distributed Analysis Workload Management System, was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects to use PanDA beyond HEP and Grid has drawn attention from other compute-intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available to bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month even when running on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. To run the pipeline we split the input files into chunks, which are processed separately on different nodes as separate inputs for PALEOMIX, and finally merge the output files; this is very similar to what ATLAS does to process and simulate data. We dramatically decreased the total walltime because of the automation of job (re)submission and brokering within PanDA. Using software tools developed initially for HEP and Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
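
The split, process and merge pattern described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is taken to be FASTQ text with four lines per record, `split_fastq`, `process_chunk` and `merge_outputs` are hypothetical names, `process_chunk` is a trivial stand-in for a real PALEOMIX job, and a local thread pool takes the place of PanDA brokering. It does not reflect the actual PanDA or PALEOMIX interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fastq(lines, records_per_chunk):
    """Split FASTQ lines into chunks that hold whole 4-line records."""
    step = 4 * records_per_chunk
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def process_chunk(chunk):
    """Hypothetical stand-in for one pipeline job: report each read's length."""
    return [f"{chunk[i][1:]}\t{len(chunk[i + 1])}" for i in range(0, len(chunk), 4)]


def merge_outputs(outputs):
    """Concatenate per-chunk outputs in submission order."""
    return [row for out in outputs for row in out]


def run(lines, records_per_chunk=2, workers=4):
    """Split the input, process chunks concurrently, then merge the results."""
    chunks = split_fastq(lines, records_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(process_chunk, chunks))  # map keeps input order
    return merge_outputs(outputs)


if __name__ == "__main__":
    # Made-up reads, purely for illustration.
    reads = []
    for n, seq in enumerate(["ACGT", "ACGTAC", "AC", "ACGTACGT", "A"]):
        reads += [f"@read{n}", seq, "+", "I" * len(seq)]
    print("\n".join(run(reads)))
```

Chunks are cut on record boundaries so that no read is split between jobs, and the merge step relies on the outputs coming back in submission order.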

  2. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta

    Science.gov (United States)

    Kanost, Michael R.; Arrese, Estela L.; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G.; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L.; Vogel, Heiko; Walters, James; Waterhouse, Robert M.; Ahn, Seung-Joon; Almeida, Francisca C.; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B.; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M.; Clarke, David F.; Dittmer, Neal T.; Ferguson, Laura C.F.; Garavelou, Spyridoula; Gordon, Karl H.J.; Gunaratna, Ramesh T.; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R.; Kuwar, Suyog S.; Lee, Sandy L.; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H.; McCulloch, Kyle J.; Mathew, Tittu; Morton, Brian; Muzny, Donna M.; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K.; Worley, Kim C.; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D.; Burmester, Thorsten; Clem, Rollie J.; Feyereisen, René; Grimmelikhuijzen, Cornelis J.P; Hamodrakas, Stavros J.; Hansson, Bill S.; Huguet, Elisabeth; Jermiin, Lars S.; Lan, Que; Lehman, Herman K.; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B.; Muthukrishnan, Subbaratnam; Oakeshott, John G.; Palmer, Will; Park, Yoonseong; Passarelli, A. Lorena; Rozas, Julio; Schwartz, Lawrence M.; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J.; Scherer, Steven E.; Richards, Stephen; Blissard, Gary W.

    2016-01-01

    Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922

  3. The zebrafish reference genome sequence and its relationship to the human genome.

    Science.gov (United States)

    Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L

    2013-04-25

    Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

  4. Complete genome sequence of Paenibacillus sp. strain JDR-2

    Science.gov (United States)

    Chow, Virginia; Nong, Guang; St. John, Franz J.; Rice, John D.; Dickstein, Ellen; Chertkov, Olga; Bruce, David; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Pitluck, Sam; Nolan, Matt; Pati, Amrita; Martin, Joel; Copeland, Alex; Land, Miriam L.; Goodwin, Lynne; Jones, Jeffrey B.; Ingram, Lonnie O.; Shanmugam, Keelnathan T.; Preston, James F.

    2012-01-01

    Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by the identification of genes and characterization of encoded enzymes and has been further defined by the sequencing and annotation of the complete genome, which we describe. In addition to genes implicated in the utilization of β-1,4-xylan, genes have also been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single replicon with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for bioprocessing of hemicellulose fractions derived from lignocellulosic resources. PMID:22675593

  5. Complete genome sequence of Paenibacillus sp. strain JDR-2

    Energy Technology Data Exchange (ETDEWEB)

    Chow, Virginia [University of Florida; Nong, Guang [University of Florida; St. John, Franz J. [US Forest Service, Forest Products Laboratory, Madison, Wisconsin, USA; Dickstein, Ellen [University of Florida; Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Han, James [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Martin, Joel [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Jones, Jeffrey B. [University of Florida; Ingram, Lonnie O. [University of Florida; Shanmugam, Keelnathan T. [University of Florida; Preston, James F. [University of Florida

    2012-01-01

    Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by the identification of genes and characterization of encoded enzymes and has been further defined by the sequencing and annotation of the complete genome, which we describe. In addition to genes implicated in the utilization of β-1,4-xylan, genes have also been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single replicon with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for bioprocessing of hemicellulose fractions derived from lignocellulosic resources.

  6. Complete genome sequence of Paenibacillus sp. strain JDR-2.

    Science.gov (United States)

    Chow, Virginia; Nong, Guang; St John, Franz J; Rice, John D; Dickstein, Ellen; Chertkov, Olga; Bruce, David; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Pitluck, Sam; Nolan, Matt; Pati, Amrita; Martin, Joel; Copeland, Alex; Land, Miriam L; Goodwin, Lynne; Jones, Jeffrey B; Ingram, Lonnie O; Shanmugam, Keelnathan T; Preston, James F

    2012-03-19

    Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by the identification of genes and characterization of encoded enzymes and has been further defined by the sequencing and annotation of the complete genome, which we describe. In addition to genes implicated in the utilization of β-1,4-xylan, genes have also been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single replicon with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for bioprocessing of hemicellulose fractions derived from lignocellulosic resources.

  7. Whole-Genome Sequencing: Automated, Nonindexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing a nonindexed Illumina DNA library and relies on the use of a CyBi-SELMA automated pipetting machine, the Covaris E210 shearing instrument, and the epMotion 5075. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Here, double-stranded DNA is fragmented when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  8. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  9. Clinical genomics information management software linking cancer genome sequence and clinical decisions.

    Science.gov (United States)

    Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

    2013-09-01

    Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes

    NARCIS (Netherlands)

    Gori, F.; Tringe, S.G.; Folino, G.; Van Hijum, S.A.F.T.; Op den Camp, H.J.M.; Jetten, M.S.M.; Marchiori, E.

    2013-01-01

    Background Sequencing technologies have different biases, in single-genome sequencing and metagenomic sequencing; these can significantly affect ORFs recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related

  11. Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes

    NARCIS (Netherlands)

    Gori, F.; Tringe, S.G.; Folino, G.; Hijum, S.A.F.T. van; Camp, H.J. Op den; Jetten, M.S.; Marchiori, E.

    2013-01-01

    BACKGROUND: Sequencing technologies have different biases, in single-genome sequencing and metagenomic sequencing; these can significantly affect ORFs recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related

  12. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    The development in sequencing methods has made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a beginning of generating the whole genome sequence for these species. The continuation of establishing the genome...

  13. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    Full Text Available The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.
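
The idea of keeping only reads whose index sequence is shared by many reads can be sketched as a simple frequency filter. This is a minimal illustration, not the authors' procedure: the function name `reads_from_frequent_barcodes`, the `(barcode, sequence)` input layout and the `min_reads` threshold are all hypothetical, and the abstract does not state the actual cut-off or tooling.

```python
from collections import Counter


def reads_from_frequent_barcodes(reads, min_reads):
    """Keep reads whose barcode is shared by at least `min_reads` reads.

    `reads` is an iterable of (barcode, sequence) pairs.
    """
    reads = list(reads)
    counts = Counter(barcode for barcode, _ in reads)
    return [(bc, seq) for bc, seq in reads if counts[bc] >= min_reads]


if __name__ == "__main__":
    # Made-up barcodes and sequences, purely for illustration.
    example = [
        ("AAAC", "ACGT"),
        ("AAAC", "GGTA"),
        ("AAAC", "TTGA"),
        ("CCGT", "ACCA"),
    ]
    for barcode, seq in reads_from_frequent_barcodes(example, min_reads=3):
        print(barcode, seq)
```

In practice the threshold would have to be chosen from the observed distribution of reads per barcode; the value used here is arbitrary.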

  14. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    Directory of Open Access Journals (Sweden)

    Peng-Fei Ma

    Full Text Available BACKGROUND: Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the small number of completed genomes, essentially all of which were Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. METHODOLOGY/PRINCIPAL FINDINGS: We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, Illumina reads at 39.5-fold coverage were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from the chloroplast genome, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae in both synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. CONCLUSIONS/SIGNIFICANCE: Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using
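
For readers unfamiliar with the term, an absolute substitution rate on a time-calibrated tree is commonly taken as a branch length in substitutions per site divided by the time the branch spans. The sketch below shows that arithmetic and how a fold change between two periods would follow from it. The function names and every number are hypothetical and are not taken from the study.

```python
def absolute_rate(subs_per_site, elapsed_myr):
    """Substitutions per site per million years along one branch."""
    if elapsed_myr <= 0:
        raise ValueError("elapsed time must be positive")
    return subs_per_site / elapsed_myr


def fold_change(rate_a, rate_b):
    """Ratio of the faster rate to the slower one."""
    fast, slow = max(rate_a, rate_b), min(rate_a, rate_b)
    if slow <= 0:
        raise ValueError("rates must be positive")
    return fast / slow


if __name__ == "__main__":
    # Hypothetical branch lengths and durations, for illustration only.
    earlier = absolute_rate(subs_per_site=0.008, elapsed_myr=40)
    later = absolute_rate(subs_per_site=0.004, elapsed_myr=80)
    print(f"fold change: {fold_change(earlier, later):.2f}")
```

The same calculation would be run separately on synonymous and nonsynonymous branch lengths to compare the two kinds of change.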

  15. A genomic survey of the fish parasite Spironucleus salmonicida indicates genomic plasticity among diplomonads and significant lateral gene transfer in eukaryote genome evolution

    Directory of Open Access Journals (Sweden)

    Logsdon John M

    2007-02-01

    Full Text Available Abstract Background Comparative genomic studies of the mitochondrion-lacking protist group Diplomonadida (diplomonads) have been lacking, although Giardia lamblia has been intensively studied. We have performed a sequence survey project resulting in 2341 expressed sequence tags (ESTs) corresponding to 853 unique clones, 5275 genome survey sequences (GSSs), and eleven finished contigs from the diplomonad fish parasite Spironucleus salmonicida (previously described as S. barkhanus). Results The analyses revealed a compact genome with few, if any, introns and very short 3' untranslated regions. Strikingly different patterns of codon usage were observed in genes corresponding to frequently sampled ESTs versus genes poorly sampled, indicating that translational selection is influencing the codon usage of highly expressed genes. Rigorous phylogenomic analyses identified 84 genes – mostly encoding metabolic proteins – that have been acquired by diplomonads or their relatively close ancestors via lateral gene transfer (LGT). Although most acquisitions were from prokaryotes, more than a dozen represent likely transfers of genes between eukaryotic lineages. Many genes that provide novel insights into the genetic basis of the biology and pathogenicity of this parasitic protist were identified, including 149 that putatively encode variant-surface cysteine-rich proteins, which are candidate virulence factors. A number of genomic properties that distinguish S. salmonicida from its human parasitic relative G. lamblia were identified, such as nineteen putative lineage-specific gene acquisitions, distinct mutational biases and codon usage, and distinct polyadenylation signals. Conclusion Our results highlight the power of comparative genomic studies to yield insights into the biology of parasitic protists and the evolution of their genomes, and suggest that genetic exchange between distantly-related protist lineages may be occurring at an appreciable rate in eukaryote

  16. Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences

    Directory of Open Access Journals (Sweden)

    Holland Barbara R

    2006-07-01

    Full Text Available Abstract Background Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study. Results Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy. 
Conclusion Using the most treelike distance matrices, as

  17. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    DEFF Research Database (Denmark)

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk

    2013-01-01

    to the consensus sequence. Additionally, we got an average sequence depth for the genome of 4000 for the Ion Torrent PGM and 400 for the FLX platform making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D......3431G in the RNA-dependent RNA polymerase NS5B. This SNV was present at 100% frequency in the 12th passage and only at 55% in the 4th passage, which could explain the difference in growth kinetics between the passages....

  18. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

    DEFF Research Database (Denmark)

    Cole, S.T.; Krogh, Anders Stærmose

    1998-01-01

    Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding....... tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation....

  19. Whole-Genome Sequencing in Epidemiology of Campylobacter jejuni Infections.

    Science.gov (United States)

    Llarena, Ann-Katrin; Taboada, Eduardo; Rossi, Mirko

    2017-05-01

    This review describes the current state of knowledge regarding the application of whole-genome sequencing (WGS) in the epidemiology of Campylobacter jejuni, the leading cause of bacterial gastroenteritis worldwide. We describe how WGS has increased our understanding of the evolutionary and epidemiological dynamics of this pathogen and how WGS has the potential to improve surveillance and outbreak detection. We have identified hurdles to the full implementation of WGS in public health settings. Despite these challenges, we think that ample evidence is available to support the benefits of integrating WGS into the routine monitoring of C. jejuni infections and outbreak investigations. Copyright © 2017 American Society for Microbiology.

  20. Twenty-One Genome Sequences from Pseudomonas Species and 19 Genome Sequences from Diverse Bacteria Isolated from the Rhizosphere and Endosphere of Populus deltoides

    Energy Technology Data Exchange (ETDEWEB)

    Brown, Steven D [ORNL; Utturkar, Sagar M [ORNL; Klingeman, Dawn Marie [ORNL; Johnson, Courtney M [ORNL; Martin, Stanton [ORNL; Land, Miriam L [ORNL; Lu, Tse-Yuan [ORNL; Schadt, Christopher Warren [ORNL; Doktycz, Mitchel John [ORNL; Pelletier, Dale A [ORNL

    2012-01-01

    To aid in the investigation of the Populus deltoides microbiome we generated draft genome sequences for twenty-one Pseudomonas and nineteen other diverse bacteria isolated from Populus deltoides roots. Genome sequences for isolates similar to Acidovorax, Bradyrhizobium, Brevibacillus, Burkholderia, Caulobacter, Chryseobacterium, Flavobacterium, Herbaspirillum, Novosphingobium, Pantoea, Phyllobacterium, Polaromonas, Rhizobium, Sphingobium and Variovorax were generated.

  1. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis.

    Directory of Open Access Journals (Sweden)

    Li-Hui Zuo

    Full Text Available Elm (Ulmus) has a long history of use as a high-quality heavy hardwood famous for its resistance to drought, cold, and salt. It grows in temperate, warm temperate, and subtropical regions. This is the first report of Ulmaceae chloroplast genomes by de novo sequencing. The Ulmus chloroplast genomes exhibited a typical quadripartite structure with two single-copy regions (long single copy [LSC] and short single copy [SSC] sections) separated by a pair of inverted repeats (IRs). The lengths of the chloroplast genomes from five Ulmus ranged from 158,953 to 159,453 bp, with the largest observed in Ulmus davidiana and the smallest in Ulmus laciniata. The genomes contained 137-145 protein-coding genes, of which Ulmus davidiana var. japonica and U. davidiana had the most and U. pumila had the fewest. The five Ulmus species exhibited different evolutionary routes, as some genes had been lost. In total, 18 genes contained introns, 13 of which (trnL-TAA+, trnL-TAA-, rpoC1-, rpl2-, ndhA-, ycf1, rps12-, rps12+, trnA-TGC+, trnA-TGC-, trnV-TAC-, trnI-GAT+, and trnI-GAT) were shared among all five species. The intron of ycf1 was the longest (5,675bp) while that of trnF-AAA was the smallest (53bp). All Ulmus species except U. davidiana exhibited the same degree of amplification in the IR region. To determine the phylogenetic positions of the Ulmus species, we performed phylogenetic analyses using common protein-coding genes in chloroplast sequences of 42 other species published in NCBI. The cluster results showed the closest plants to Ulmaceae were Moraceae and Cannabaceae, followed by Rosaceae. Ulmaceae and Moraceae both belonged to Urticales, and the chloroplast genome clustering results were consistent with their traditional taxonomy. The results strongly supported the position of Ulmaceae as a member of the order Urticales. In addition, we found a potential error in the traditional taxonomies of U. davidiana and U. davidiana var. japonica, which should be

  2. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis.

    Science.gov (United States)

    Zuo, Li-Hui; Shang, Ai-Qin; Zhang, Shuang; Yu, Xiao-Yue; Ren, Ya-Chao; Yang, Min-Sheng; Wang, Jin-Mao

    2017-01-01

    Elm (Ulmus) has a long history of use as a high-quality heavy hardwood famous for its resistance to drought, cold, and salt. It grows in temperate, warm temperate, and subtropical regions. This is the first report of Ulmaceae chloroplast genomes by de novo sequencing. The Ulmus chloroplast genomes exhibited a typical quadripartite structure with two single-copy regions (long single copy [LSC] and short single copy [SSC] sections) separated by a pair of inverted repeats (IRs). The lengths of the chloroplast genomes from five Ulmus ranged from 158,953 to 159,453 bp, with the largest observed in Ulmus davidiana and the smallest in Ulmus laciniata. The genomes contained 137-145 protein-coding genes, of which Ulmus davidiana var. japonica and U. davidiana had the most and U. pumila had the fewest. The five Ulmus species exhibited different evolutionary routes, as some genes had been lost. In total, 18 genes contained introns, 13 of which (trnL-TAA+, trnL-TAA-, rpoC1-, rpl2-, ndhA-, ycf1, rps12-, rps12+, trnA-TGC+, trnA-TGC-, trnV-TAC-, trnI-GAT+, and trnI-GAT) were shared among all five species. The intron of ycf1 was the longest (5,675bp) while that of trnF-AAA was the smallest (53bp). All Ulmus species except U. davidiana exhibited the same degree of amplification in the IR region. To determine the phylogenetic positions of the Ulmus species, we performed phylogenetic analyses using common protein-coding genes in chloroplast sequences of 42 other species published in NCBI. The cluster results showed the closest plants to Ulmaceae were Moraceae and Cannabaceae, followed by Rosaceae. Ulmaceae and Moraceae both belonged to Urticales, and the chloroplast genome clustering results were consistent with their traditional taxonomy. The results strongly supported the position of Ulmaceae as a member of the order Urticales. In addition, we found a potential error in the traditional taxonomies of U. davidiana and U. davidiana var. japonica, which should be confirmed with a

  3. A Modified Protocol for Bisulfite Genomic Sequencing of Difficult Samples

    Directory of Open Access Journals (Sweden)

    Pappas Jane J

    2009-06-01

    Full Text Available Abstract The bisulfite genomic sequencing protocol is a widely used method for analyzing DNA methylation. It relies on the deamination of unmethylated cytosine residues to uracil; however, its high rates of DNA degradation and incomplete cytosine to uracil conversion often lead to failed experiments, uninformative results, and false positives. Here, we report the addition of a single-step multiple restriction enzyme digestion (MRED) designed to differentially digest polymerase chain reaction products amplified from unconverted DNA while leaving those of converted DNA intact. We show that for our model system, RARB2 P2 promoter, use of MRED increased informative sequencings ninefold, and MRED did not alter the clonal representation in one fully methylated cell line, H-596, treated or not with 5-azadeoxycytidine, a methylation inhibitor. We believe that this method may easily be adapted for analyzing other genes and provide guidelines for selecting the most appropriate MRED restriction enzymes.

  4. A Modified Protocol for Bisulfite Genomic Sequencing of Difficult Samples

    Directory of Open Access Journals (Sweden)

    Toulouse André

    2009-01-01

    Full Text Available Abstract The bisulfite genomic sequencing protocol is a widely used method for analyzing DNA methylation. It relies on the deamination of unmethylated cytosine residues to uracil; however, its high rates of DNA degradation and incomplete cytosine to uracil conversion often lead to failed experiments, uninformative results, and false positives. Here, we report the addition of a single-step multiple restriction enzyme digestion (MRED) designed to differentially digest polymerase chain reaction products amplified from unconverted DNA while leaving those of converted DNA intact. We show that for our model system, RARB2 P2 promoter, use of MRED increased informative sequencings ninefold, and MRED did not alter the clonal representation in one fully methylated cell line, H-596, treated or not with 5-azadeoxycytidine, a methylation inhibitor. We believe that this method may easily be adapted for analyzing other genes and provide guidelines for selecting the most appropriate MRED restriction enzymes.

  5. Identification of transcribed sequences in the human genome

    Energy Technology Data Exchange (ETDEWEB)

    Gardiner, K.

    1992-12-01

    The workshop was held at the National Institutes of Mental Health, Bethesda, Maryland, on October 4 and 5, 1991. Twenty-four investigators attended from England, Germany and the United States. The topics discussed included: Genome sequence analysis using computer assisted detection of open reading frames, splice sites and hexamer patterns, direct exon identification using trapping of internal and 3' exons, and a recombination based system, cDNA library construction and screening, including the use of normalization and subtraction procedures, Alu and splice donor site PCR from hybrid cell lines, and microdissection clones as probes, use of labeled cDNAs as probes to screen lambda and cosmid libraries, and sequencing of random cDNAs.

  6. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes

    OpenAIRE

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-01-01

    Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determi...

  7. Genomic sequence is highly predictive of local nucleosome depletion.

    Directory of Open Access Journals (Sweden)

    Guo-Cheng Yuan

    2008-01-01

    Full Text Available The regulation of DNA accessibility through nucleosome positioning is important for transcription control. Computational models have been developed to predict genome-wide nucleosome positions from DNA sequences, but these models consider only nucleosome sequences, which may have limited their power. We developed a statistical multi-resolution approach to identify a sequence signature, called the N-score, that distinguishes nucleosome binding DNA from non-nucleosome DNA. This new approach has significantly improved the prediction accuracy. The sequence information is highly predictive for local nucleosome enrichment or depletion, whereas predictions of the exact positions are only modestly more accurate than a null model, suggesting the importance of other regulatory factors in fine-tuning the nucleosome positions. The N-score in promoter regions is negatively correlated with gene expression levels. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.

  8. SmashCell: A software framework for the analysis of single-cell amplified genome sequences

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Arumugam, Manimozhiyan; Raes, Jeroen

    2010-01-01

    SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes however is far more complicated than the analysis of those...

  9. Genome Sequence of the Banana Pathogen Dickeya zeae Strain MS1, Which Causes Bacterial Soft Rot.

    Science.gov (United States)

    Zhang, Jing-Xin; Lin, Bi-Run; Shen, Hui-Fang; Pu, Xiao-Ming

    2013-06-13

    We report a draft genome sequence of Dickeya zeae strain MS1, which is the causative agent of banana soft rot in China, and we show several of its specific properties compared with those of other D. zeae strains. Genome sequencing provides a tool for understanding the genomic determination of the pathogenicity and phylogeny placement of this pathogen.

  10. Genome Sequence of the Banana Pathogen Dickeya zeae Strain MS1, Which Causes Bacterial Soft Rot

    OpenAIRE

    Zhang, Jing-Xin; Lin, Bi-Run; Shen, Hui-Fang; Pu, Xiao-Ming

    2013-01-01

    We report a draft genome sequence of Dickeya zeae strain MS1, which is the causative agent of banana soft rot in China, and we show several of its specific properties compared with those of other D. zeae strains. Genome sequencing provides a tool for understanding the genomic determination of the pathogenicity and phylogeny placement of this pathogen.

  11. The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

    Science.gov (United States)

    A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

    2000-01-01

    Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...

  12. Draft genome sequence of Cellulomonas carbonis T26(T) and comparative analysis of six Cellulomonas genomes.

    Science.gov (United States)

    Zhuang, Weiping; Zhang, Shengzhe; Xia, Xian; Wang, Gejiao

    2015-01-01

    Most Cellulomonas strains are cellulolytic and this feature may be applied in straw degradation and bioremediation. In this study, Cellulomonas carbonis T26(T), Cellulomonas bogoriensis DSM 16987(T) and Cellulomonas cellasea 20108(T) were sequenced. Here we describe the draft genomic information of C. carbonis T26(T) and compare it to the related Cellulomonas genomes. Strain T26(T) has a 3,990,666 bp genome size with a G + C content of 73.4 %, containing 3418 protein-coding genes and 59 RNA genes. The results showed good correlation between the genotypes and the physiological phenotypes. This information is useful for better application of the Cellulomonas strains.
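    As a hedged illustration of the G + C content statistic quoted in the abstract above, the following minimal Python sketch computes the G+C percentage of a DNA string. The function name `gc_content` and the toy sequence are assumptions made for illustration; they are not taken from the cited study, and real genome files would first need to be parsed (for example from FASTA).

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only A, C, G and T are counted in the denominator, so ambiguous
    symbols such as N do not distort the result.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Toy sequence (hypothetical): 4 of 6 called bases are G or C.
toy_percent = gc_content("GGCCAT")
```

    On a full genome the same ratio is usually taken over all contigs together, which is presumably how a single figure such as 73.4 % is reported.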

  13. Privacy preserving processing of genomic data: A survey.

    Science.gov (United States)

    Akgün, Mete; Bayrak, A Osman; Ozer, Bugra; Sağıroğlu, M Şamil

    2015-08-01

    Recently, the rapid advance in genome sequencing technology has led to the production of huge amounts of sensitive genomic data. However, a serious privacy challenge arises with the increasing number of genetic tests, as genomic data is the ultimate source of identity for humans. Privacy threats and possible solutions regarding undesired access to genomic data have lately been discussed; however, it is challenging to apply the proposed solutions to real-life problems due to the complex nature of security definitions. In this review, we have categorized pre-existing problems and corresponding solutions in a more understandable and convenient way. Additionally, we have included the open privacy problems that come with each genomic data processing procedure. We believe our classification of genome-associated privacy problems will pave the way for linking real-life problems with previously proposed methods. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Complete genome sequence of “Enterobacter lignolyticus” SCF1

    Energy Technology Data Exchange (ETDEWEB)

    DeAngelis, Kristen M.; D'Haeseleer, Patrik; Chivian, Dylan; Fortney, Julian L.; Khudyakov, Jane I.; Simmons, Blake A.; Woo, Hannah; Arkin, Adam P.; Davenport, Karen W.; Goodwin, Lynne A.; Chen, Amy; Ivanova, Natalia; Kyrpides, Nikos C.; Mavromatis, Konstantinos; Woyke, Tanja; Hazen, Terry C.

    2011-09-23

    In an effort to discover anaerobic bacteria capable of lignin degradation, we isolated 'Enterobacter lignolyticus' SCF1 on minimal media with alkali lignin as the sole source of carbon. This organism was isolated anaerobically from tropical forest soils collected from the Short Cloud Forest site in the El Yunque National Forest in Puerto Rico, USA, part of the Luquillo Long-Term Ecological Research Station. At this site, the soils experience strong fluctuations in redox potential and are net methane producers. Because of its ability to grow on lignin anaerobically, we sequenced the genome. The genome of 'E. lignolyticus' SCF1 is 4.81 Mbp with no detected plasmids, and includes a relatively small arsenal of lignocellulolytic carbohydrate active enzymes. Lignin degradation was observed in culture, and the genome revealed two putative laccases, a putative peroxidase, and a complete 4-hydroxyphenylacetate degradation pathway encoded in a single gene cluster.

  15. DOE project on genome mapping and sequencing. Progress report, 1992

    Energy Technology Data Exchange (ETDEWEB)

    Evans, G.A.

    1992-12-31

    These efforts on the human genome project were initiated in September, 1990, to contribute towards completion of the human genome project physical mapping effort. In the original application, the authors proposed a novel strategy for constructing a physical map of human chromosome 11, based upon techniques derived in this group and by others. The original goals were to (1) produce a set of cosmid reference clones mapped to specific sites by high resolution fluorescence in situ hybridization, (2) produce a set of associated STS sequences and PCR primers for each site, (3) isolate YAC clones corresponding to each STS and, (4) construct YAC contigs such that > 90% of the chromosome would be covered by contigs of 2 mb or greater. Since that time, and with the advent of new technology and reagents, the strategy has been modified slightly but still retains the same goals as originally proposed. The authors have added a project to produce chromosome 11-specific cDNAs and determine the map location and DNA sequence of a selected portion of them.

  16. A Genome Sequencing Program for Novel Undiagnosed Diseases

    Science.gov (United States)

    Bloss, Cinnamon S.; Scott-Van Zeeland, Ashley A.; Topol, Sarah E.; Darst, Burcu F.; Boeldt, Debra L.; Erikson, Galina A.; Bethel, Kelly J.; Bjork, Robert L.; Friedman, Jennifer R.; Hwynn, Nelson; Patay, Bradley A.; Pockros, Paul J.; Scott, Erick R.; Simon, Ronald A.; Williams, Gary W.; Schork, Nicholas J.; Topol, Eric J.; Torkamani, Ali

    2015-01-01

    Purpose The Scripps Idiopathic Diseases of huMan (IDIOM) study aims to discover novel gene-disease relationships and provide molecular genetic diagnosis and treatment guidance for individuals with novel diseases using genome sequencing integrated with clinical assessment and multidisciplinary case review. Methods Here we describe the IDIOM study operational protocol and initial results. Results 121 cases underwent first tier review by the principal investigators to determine if the primary inclusion criteria were satisfied, 59 (48.8%) underwent second tier review by our clinician-scientist review panel, and 17 (14.0%) patients and their family members were enrolled. 60% of cases resulted in a plausible molecular diagnosis. 18% of cases resulted in a confirmed molecular diagnosis. 2 of 3 confirmed cases led to the identification of novel gene-disease relationships. In the third confirmed case, a previously described but unrecognized disease was revealed. In all three confirmed cases, a new clinical management strategy was initiated based on the genetic findings. Conclusions Genome sequencing provides tangible clinical benefit for individuals with idiopathic genetic disease, not only in the context of molecular genetic diagnosis of known rare conditions, but also in cases where prior clinical information regarding a new genetic disorder is lacking. PMID:25790160
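    The review-tier percentages in the abstract above can be checked with a few lines of arithmetic. The sketch below is only a worked example: the helper name `percent` is hypothetical, and the assumption that both percentages use the 121 first-tier cases as the denominator is inferred from the quoted figures rather than stated by the authors.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


first_tier = 121   # cases that underwent first tier review (from the abstract)
second_tier = 59   # cases that underwent second tier review
enrolled = 17      # patients and family members enrolled

second_tier_pct = percent(second_tier, first_tier)  # compare with 48.8% quoted
enrolled_pct = percent(enrolled, first_tier)        # compare with 14.0% quoted
```

    Both results agree with the figures given in the abstract, which supports reading 121 as the common denominator.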

  17. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    Full Text Available The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. 
Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  18. Functional annotation from the genome sequence of the giant panda.

    Science.gov (United States)

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  19. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process acts as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.

  20. SISPA-Seq for rapid whole genome surveys of bacterial isolates.

    Science.gov (United States)

    Wright, Meredith S; Stockwell, Timothy B; Beck, Erin; Busam, Dana A; Bajaksouzian, Saralee; Jacobs, Michael R; Bonomo, Robert A; Adams, Mark D

    2015-06-01

    Whole genome sequencing (WGS) of large isolate collections has many applications, yet sequencing costs are still significant. We sought to develop a rapid and cost efficient WGS method to address fundamental questions in clinical microbiology. We evaluated the performance of SISPA (Sequence-Independent, Single-Primer Amplification) combined with next-generation sequencing (SISPA-Seq) of 75 clinical isolates of Acinetobacter baumannii to establish whether SISPA-Seq resulted in sufficient coverage and quality to determine (1) strain phylogenetic placement and (2) carriage of known antibiotic resistance (AbR) genes. Strains for which whole genome sequences were available were included for validation. Two libraries for each strain were constructed from separate SISPA reactions with different barcoded primers, using genomic DNA prepared from either high quality or rapid heat-lysis preparations. SISPA-Seq resulted in a median of 65× genome coverage when reads from both primer sets were combined. Coverage and quality were sufficient for detection of AbR genes by comparison of reads to the ARG-ANNOT database and were often sufficient to distinguish between different allelic variants of the same gene. kSNP and RAxML were used to construct a robust phylogeny based on single-nucleotide variants (SNVs) that showed that the SISPA-Seq data was sufficient for sensitive and accurate phylogenetic placement. Advantages of the SISPA-Seq method include inexpensive and rapid DNA preparation and a typical total cost less than one-half that of standard genome sequencing. In summary, SISPA-Seq can be used to survey whole genomes of a large strain collection and identify strains that should be targeted for additional sequencing. Copyright © 2015 Elsevier B.V. All rights reserved.
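    To make the "median genome coverage" statistic in the abstract above concrete, here is a minimal Python sketch of how fold coverage per isolate and its median across isolates are commonly computed. Everything in it is an assumption for illustration: the function name `fold_coverage`, the 4 Mb genome length, and the per-isolate base counts are invented toy values, not data from the study, and the authors may have derived coverage differently (for example from read mapping).

```python
from statistics import median


def fold_coverage(total_read_bases: int, genome_length: int) -> float:
    """Average fold coverage: sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_read_bases / genome_length


# Hypothetical genome length (bp) and hypothetical sequenced bases per isolate.
GENOME_LENGTH = 4_000_000
read_bases_per_isolate = {
    "isolate_A": 200_000_000,
    "isolate_B": 260_000_000,
    "isolate_C": 360_000_000,
}

coverages = [
    fold_coverage(bases, GENOME_LENGTH)
    for bases in read_bases_per_isolate.values()
]
median_coverage = median(coverages)
```

    The median is less sensitive than the mean to a few isolates with unusually deep or shallow sequencing, which is one plausible reason for reporting it across a large strain collection.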

  1. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  2. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.
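
    The abstract above attributes the higher read-alignment count to the larger N50 of the horse assembly. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size). The function name and the example lengths are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in kb), for illustration only.
example_lengths = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(n50(example_lengths))
```

    A more fragmented assembly has a smaller N50, so reads spanning contig boundaries are more likely to fail to align, which is consistent with the explanation the authors give.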

  3. Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation

    Science.gov (United States)

    Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena

    2015-01-01

    Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. PMID:26205871

  4. Returning individual research results for genome sequences of pancreatic cancer

    Science.gov (United States)

    2014-01-01

    Background Disclosure of individual results to participants in genomic research is a complex and contentious issue. There are many existing commentaries and opinion pieces on the topic, but little empirical data concerning actual cases describing how individual results have been returned. Thus, the real life risks and benefits of disclosing individual research results to participants are rarely if ever presented as part of this debate. Methods The Australian Pancreatic Cancer Genome Initiative (APGI) is an Australian contribution to the International Cancer Genome Consortium (ICGC), that involves prospective sequencing of tumor and normal genomes of study participants with pancreatic cancer in Australia. We present three examples that illustrate different facets of how research results may arise, and how they may be returned to individuals within an ethically defensible and clinically practical framework. This framework includes the necessary elements identified by others including consent, determination of the significance of results and which to return, delineation of the responsibility for communication and the clinical pathway for managing the consequences of returning results. Results Of 285 recruited patients, we returned results to a total of 25 with no adverse events to date. These included four that were classified as medically actionable, nine as clinically significant and eight that were returned at the request of the treating clinician. Case studies presented depict instances where research results impacted on cancer susceptibility, current treatment and diagnosis, and illustrate key practical challenges of developing an effective framework. Conclusions We suggest that return of individual results is both feasible and ethically defensible but only within the context of a robust framework that involves a close relationship between researchers and clinicians. PMID:24963353

  5. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences.

    Science.gov (United States)

    Kovach, Allen; Wegrzyn, Jill L; Parra, Genis; Holt, Carson; Bruening, George E; Loopstra, Carol A; Hartigan, James; Yandell, Mark; Langley, Charles H; Korf, Ian; Neale, David B

    2010-07-07

    In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.

  6. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.

  7. Suspected cases of intracontinental Burkholderia pseudomallei sequence type homoplasy resolved using whole-genome sequencing.

    Science.gov (United States)

    Aziz, Ammar; Sarovich, Derek S; Harris, Tegan M; Kaestli, Mirjam; McRobb, Evan; Mayo, Mark; Currie, Bart J; Price, Erin P

    2017-11-01

    Burkholderia pseudomallei is a Gram-negative environmental bacterium that causes melioidosis, a disease of high mortality in humans and animals. Multilocus sequence typing (MLST) is a popular and portable genotyping method that has been used extensively to characterise the genetic diversity of B. pseudomallei populations. MLST has been central to our understanding of the underlying phylogeographical signal present in the B. pseudomallei genome, revealing distinct populations on both the intra- and the inter-continental level. However, due to its high recombination rate, it is possible for B. pseudomallei isolates to share the same multilocus sequence type (ST) despite being genetically and geographically distinct, with two cases of 'ST homoplasy' recently reported between Cambodian and Australian B. pseudomallei isolates. This phenomenon can dramatically confound conclusions about melioidosis transmission patterns and source attribution, a critical issue for bacteria such as B. pseudomallei that are of concern due to their potential for use as bioweapons. In this study, we used whole-genome sequencing to identify the first reported instances of intracontinental ST homoplasy, which involved ST-722 and ST-804 B. pseudomallei isolates separated by large geographical distances. In contrast, a third suspected homoplasy case was shown to be a true long-range (460 km) dispersal event between a remote Australian island and the Australian mainland. Our results show that, whilst a highly useful and portable method, MLST can occasionally lead to erroneous conclusions about isolate origin and disease attribution. In cases where a shared ST is identified between geographically distant locales, whole-genome sequencing should be used to resolve strain origin.

  8. Prediction of causative genomic relationships using sequence data of five French and Danish dairy cattle breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Boichard, Didier; Lund, Mogens Sandø

    The increasing number of sequenced individuals makes the use of sequences for genomic prediction feasible. With the current SNP chips, the accuracy of genomic selection is limited in populations with low linkage disequilibrium, as is the case in across-breed prediction in dairy cattle. Sequence d...

  9. Genome Sequences of Three Apple chlorotic leaf spot virus Isolates from Hawthorns in China

    OpenAIRE

    Guo, Wei; Zheng, Wenyan; Wang, Mei; Li, Xiaohong; Ma, Yue; Dai, Hongyan

    2016-01-01

    The genome sequences of Apple chlorotic leaf spot virus (ACLSV) isolates from three accessions of hawthorns (Crataegus pinnatifida) grown at Shenyang Agricultural University were determined using Illumina RNA-seq. To confirm the assembly data from the de novo sequencing, two ACLSV genomic sequences (SY01 and SY02) were sequenced using the Sanger method. The SY01 and SY02 sequences obtained with the Sanger method showed 99.5% and 99.7% nucleotide identity with the transcriptome data, respectiv...

  10. Unique sequence features of the Human Adenovirus 31 complete genomic sequence are conserved in clinical isolates

    Directory of Open Access Journals (Sweden)

    Darr Sebastian

    2009-11-01

    Background Human adenoviruses (HAdV) are causing a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoetic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 prototype strain to uncover unique sequence motifs associated with its high virulence. Moreover, we sequenced coding regions known to be essential for tropism and virulence (early transcription units E1A, E3, E4, the fiber knob and the penton base) of HAdV-A31 clinical isolates from patients with disseminated disease. Results The genome size of HAdV-A31 is 33763 base pairs (bp) in length with a GC content of 46.36%. Nucleotide alignment to the closely related HAdV-A12 revealed an overall homology of 84.2%. The genome organization into early, intermediate and late regions is similar to HAdV-A12. Sequence analysis of the prototype strain showed unique sequence features such as an immunoglobulin-like domain in the species A specific gene product E3 CR1 beta and a potentially integrin binding RGD motif in the C-terminal region of the protein IX. These features were conserved in all analyzed clinical isolates. Overall, amino acid sequences of clinical isolates were highly conserved compared to the prototype (99.2 to 100%), but a synonymous/non synonymous ratio (S/N) of 2.36 in E3 CR1 beta suggested positive selection. Conclusion Unique sequence features of HAdV-A31 may enhance its ability to escape the host's immune surveillance and may facilitate a promiscuous tropism for various tissues. Moderate evolution of clinical isolates did not indicate the emergence of new HAdV-A31 subtypes in the recent years.

  11. Unique sequence features of the Human adenovirus 31 complete genomic sequence are conserved in clinical isolates.

    Science.gov (United States)

    Hofmayer, Soeren; Madisch, Ijad; Darr, Sebastian; Rehren, Fabienne; Heim, Albert

    2009-11-25

    Human adenoviruses (HAdV) are causing a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoetic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 prototype strain to uncover unique sequence motifs associated with its high virulence. Moreover, we sequenced coding regions known to be essential for tropism and virulence (early transcription units E1A, E3, E4, the fiber knob and the penton base) of HAdV-A31 clinical isolates from patients with disseminated disease. The genome size of HAdV-A31 is 33763 base pairs (bp) in length with a GC content of 46.36%. Nucleotide alignment to the closely related HAdV-A12 revealed an overall homology of 84.2%. The genome organization into early, intermediate and late regions is similar to HAdV-A12. Sequence analysis of the prototype strain showed unique sequence features such as an immunoglobulin-like domain in the species A specific gene product E3 CR1 beta and a potentially integrin binding RGD motif in the C-terminal region of the protein IX. These features were conserved in all analyzed clinical isolates. Overall, amino acid sequences of clinical isolates were highly conserved compared to the prototype (99.2 to 100%), but a synonymous/non synonymous ratio (S/N) of 2.36 in E3 CR1 beta suggested positive selection. Unique sequence features of HAdV-A31 may enhance its ability to escape the host's immune surveillance and may facilitate a promiscuous tropism for various tissues. Moderate evolution of clinical isolates did not indicate the emergence of new HAdV-A31 subtypes in the recent years.
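
    The abstract above reports a GC content of 46.36% for the 33763-bp HAdV-A31 genome. A minimal sketch of how such a figure is typically computed from a nucleotide string follows. The function name, the choice to ignore ambiguous bases such as N, and the short example sequence are assumptions made for illustration; the abstract does not describe the authors' procedure.

```python
def gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; other symbols
    such as N are ignored. This handling is an assumption of the sketch.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical short sequence, not taken from the HAdV-A31 genome.
print(round(gc_content("ATGCGCTTAN"), 2))
```

    Applied to the complete genome sequence, the same calculation would be expected to give a value close to the reported one, depending on how ambiguous bases are treated.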

  12. Complete Genome Sequences of Chrysanthemum Stunt Viroid from a Single Chrysanthemum Cultivar

    OpenAIRE

    Choi, Hoseong; Jo, Yeonhwa; Yoon, Ju-Yeon; Choi, Seung-Kook; Cho, Won Kyong

    2015-01-01

    The chrysanthemum stunt viroid (CSVd), a member of the genus Pospiviroid with a single circular RNA genome, infects many chrysanthemum species. Here, we report 25 complete genome sequences of CSVd in a single chrysanthemum cultivar, revealing 20 variants.

  13. Complete genome sequence of the probiotic Lactobacillus casei strain BL23.

    NARCIS (Netherlands)

    Maze, A.; Boel, G.; Zuniga, M.; Bourand, A.; Loux, V.; Yebra, M.J.; Monedero, V.; Correia, K.; Jacques, N.; Beaufils, S.; Poncet, S.; Joyet, P.; Milohanic, E.; Casaregola, S.; Auffray, Y.; Perez-Martinez, G.; Gibrat, J.F.; Zagorec, M.; Francke, C.; Hartke, A.; Deutscher, J.

    2010-01-01

    The entire genome of Lactobacillus casei BL23, a strain with probiotic properties, has been sequenced. The genomes of BL23 and the industrially used probiotic strain Shirota YIT 9029 (Yakult) seem to be very similar.

  14. Angular Momentum across the Hubble sequence from the CALIFA survey

    NARCIS (Netherlands)

    Falcón-Barroso, Jesús; Lyubenova, Mariya; van de Ven, Glenn

    We investigate the stellar angular momentum of galaxies across the Hubble sequence from the CALIFA survey. The distribution of CALIFA elliptical and lenticular galaxies in the λRe - ɛe diagram is consistent with that shown by the Atlas3D survey. Our data, however, show that the location of spiral

  15. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    Energy Technology Data Exchange (ETDEWEB)

    Arneodo, Alain, E-mail: alain.arneodo@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Vaillant, Cedric, E-mail: cedric.vaillant@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Audit, Benjamin, E-mail: benjamin.audit@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Argoul, Francoise, E-mail: francoise.argoul@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); D'Aubenton-Carafa, Yves, E-mail: daubenton@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France); Thermes, Claude, E-mail: claude.thermes@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France)

    2011-02-15

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  16. A Proposal to Sequence the Genome of a Garter Snake (Thamnophis sirtalis)

    OpenAIRE

    Castoe, Todd A.; Bronikowski, Anne M.; Pfrender, Michael E.; Shapiro, Michael D.; Pollock, David D.; Warren, Wesley C.; Edwards, Scott V.; Brodie, Edmund D. III

    2011-01-01

    Here we develop an argument in support of sequencing a garter snake (Thamnophis sirtalis) genome, and outline a plan to accomplish this. This snake is a common, widespread, nonvenomous North American species that has served as a model for diverse studies in evolutionary biology, physiology, genomics, behavior and coevolution. The anole lizard is currently the only genome sequence available for a non-avian reptile. Thus, the garter snake at this time would be the first available snake genome s...

  17. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries.

    Science.gov (United States)

    Saski, Christopher A; Bhattacharjee, Ranjana; Scheffler, Brian E; Asiedu, Robert

    2015-01-01

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries but research efforts have been limited to understand the genetics and generate genomic information for the crop. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome. Our efforts in using different approaches...
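
    The polymorphism rates quoted in the abstract above are simple proportions of the marker counts. The sketch below recomputes them from the counts given in the text; the choice of denominators (1,152 EST-SSRs and 573 genomic SSRs) is an assumption, since the abstract does not state which totals the authors divided by.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the abstract; denominators are an assumption of this sketch.
est_ssr_rate = percent(388, 1152)      # polymorphic EST-SSRs / EST-SSRs developed
genomic_ssr_rate = percent(478, 573)   # polymorphic genomic SSRs / SSRs selected

print(f"EST-SSR polymorphism rate: {est_ssr_rate:.1f}%")
print(f"Genomic SSR polymorphism rate: {genomic_ssr_rate:.1f}%")
```

    Under these assumptions the first value rounds to the reported 34%, and the second comes out close to, but not exactly at, the reported 83.5%; the small difference may reflect rounding or a different denominator, which the abstract does not specify.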

  18. Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome.

    Science.gov (United States)

    Kim, Moon Young; Lee, Sunghoon; Van, Kyujung; Kim, Tae-Hyung; Jeong, Soon-Chun; Choi, Ik-Young; Kim, Dae-Soo; Lee, Yong-Seok; Park, Daeui; Ma, Jianxin; Kim, Woo-Yeon; Kim, Byoung-Chul; Park, Sungjin; Lee, Kyung-A; Kim, Dong Hyun; Kim, Kil Hyun; Shin, Jin Hee; Jang, Young Eun; Kim, Kyung Do; Liu, Wei Xian; Chaisan, Tanapon; Kang, Yang Jae; Lee, Yeong-Ho; Kim, Kook-Hyung; Moon, Jung-Kyung; Schmutz, Jeremy; Jackson, Scott A; Bhak, Jong; Lee, Suk-Ha

    2010-12-21

    The genome of soybean (Glycine max), a commercially important crop, has recently been sequenced and is one of six crop species to have been sequenced. Here we report the genome sequence of G. soja, the undomesticated ancestor of G. max (in particular, G. soja var. IT182932). The 48.8-Gb Illumina Genome Analyzer (Illumina-GA) short DNA reads were aligned to the G. max reference genome and a consensus was determined for G. soja. This consensus sequence spanned 915.4 Mb, representing a coverage of 97.65% of the G. max published genome sequence and an average mapping depth of 43-fold. The nucleotide sequence of the G. soja genome, which contains 2.5 Mb of substituted bases and 406 kb of small insertions/deletions relative to G. max, is ∼0.31% different from that of G. max. In addition to the mapped 915.4-Mb consensus sequence, 32.4 Mb of large deletions and 8.3 Mb of novel sequence contigs in the G. soja genome were also detected. Nucleotide variants of G. soja versus G. max confirmed by Roche Genome Sequencer FLX sequencing showed a 99.99% concordance in single-nucleotide polymorphism and a 98.82% agreement in insertion/deletion calls on Illumina-GA reads. Data presented in this study suggest that the G. soja/G. max complex may be at least 0.27 million y old, appearing before the relatively recent event of domestication (6,000∼9,000 y ago). This suggests that soybean domestication is complicated and that more in-depth study of population genetics is needed. In any case, genome comparison of domesticated and undomesticated forms of soybean can facilitate its improvement.
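
    The divergence figure in the abstract above can be related to the other numbers it reports. The sketch below assumes that the percentage is the sum of substituted bases and small insertions/deletions divided by the mapped consensus length; the abstract does not state the exact formula, so this is only one plausible reading, and the variable names are illustrative.

```python
# Values taken from the abstract, expressed in megabases (Mb).
substituted_mb = 2.5        # substituted bases relative to G. max
small_indel_mb = 0.406      # 406 kb of small insertions/deletions
consensus_mb = 915.4        # mapped consensus sequence length

# Assumed formula: differing bases as a share of the mapped consensus.
divergence_percent = 100.0 * (substituted_mb + small_indel_mb) / consensus_mb

print(f"Estimated divergence: {divergence_percent:.3f}%")
```

    Under this assumption the result is approximately 0.32%, roughly in line with the ∼0.31% reported; the remaining gap may come from rounding of the inputs or from a different denominator.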

  19. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Whole-Genome Sequence of the Microcin E492-Producing Strain Klebsiella pneumoniae RYC492.

    Science.gov (United States)

    Marcoleta, Andrés; Gutiérrez-Cortez, Sergio; Maturana, Daniel; Monasterio, Octavio; Lagos, Rosalba

    2013-05-09

    Here, we report the draft genome sequence of the Gram-negative strain Klebsiella pneumoniae RYC492, which produces the amyloid-forming and antibacterial peptide microcin E492. The sequenced genome consists of a 5,095,761-bp assembled open chromosome where the gene cluster for microcin production is located in a putative 31-kb genomic island flanked by sequence repeats and containing a putative integrase-coding gene.

  1. A comparative, BAC end sequence enabled map of the genome of the American mink (Neovison vison)

    DEFF Research Database (Denmark)

    Benkel, Bernhard F.; Smith, Amanda; Christensen, Knud

    2012-01-01

    In this report we present the results of the analysis of approximately 2.7 Mb of genomic information for the American mink (Neovison vison) derived through BAC end sequencing. Our study, which encompasses approximately 1/1000th of the mink genome, suggests that simple sequence repeats (SSRs) are less common in the mink than in the human genome, whereas the average GC content of the mink genome is slightly higher than that of its human counterpart. The 2.7 Mb mink genomic dataset also contained 2,416 repeat elements (retroids and DNA transposons) occupying almost 31% of the sequence space...

  2. Genome-wide survey for biologically functional pseudogenes.

    Directory of Open Access Journals (Sweden)

    Orjan Svensson

    2006-05-01

    According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human-mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human-mouse species split, and also a larger group of primate-specific ones found from human-chimpanzee searches. Two processed sequences are notable, their conservation since the human-mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios.

  3. Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

    Science.gov (United States)

    Georgi, Benjamin; Craig, David; Kember, Rachel L.; Liu, Wencheng; Lindquist, Ingrid; Nasser, Sara; Brown, Christopher; Egeland, Janice A.; Paul, Steven M.; Bućan, Maja

    2014-01-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders. PMID:24625924

  4. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  5. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  6. Genome sequence of pacific abalone (Haliotis discus hannai): the first draft genome in family Haliotidae.

    Science.gov (United States)

    Nam, Bo-Hye; Kwak, Woori; Kim, Young-Ok; Kim, Dong-Gyun; Kong, Hee Jeong; Kim, Woo-Jin; Kang, Jeong-Ha; Park, Jung Youn; An, Cheul Min; Moon, Ji-Young; Park, Choul Ji; Yu, Jae Woong; Yoon, Joon; Seo, Minseok; Kim, Kwondo; Kim, Duk Kyung; Lee, SaetByeol; Sung, Samsun; Lee, Chul; Shin, Younhee; Jung, Myunghee; Kang, Byeong-Chul; Shin, Ga-Hee; Ka, Sojeong; Caetano-Anolles, Kelsey; Cho, Seoae; Kim, Heebal

    2017-05-01

    Abalones are large marine snails in the family Haliotidae and the genus Haliotis belonging to the class Gastropoda of the phylum Mollusca. The family Haliotidae contains only one genus, Haliotis, and this single genus is known to contain several species of abalone. With 18 additional subspecies, the most comprehensive treatment of Haliotidae considers 56 species valid [1]. Abalone is an economically important fishery and aquaculture animal that is considered a highly prized seafood delicacy. The total global supply of abalone has increased 5-fold since the 1970s and farm production increased explosively from 50 mt to 103 464 mt in the past 40 years. Additionally, researchers have recently focused on abalone given their reported tumor suppression effect. However, despite the valuable features of this marine animal, no genomic information is available for the Haliotidae family and related research is still limited. To construct the H. discus hannai genome, a total of 580 Gb of sequence was generated using Illumina and PacBio platforms, giving 322-fold coverage based on the 1.8-Gb genome size of H. discus hannai estimated by flow cytometry. The final genome assembly consisted of 1.86 Gb with 35 450 scaffolds (>2 kb). GC content level was 40.51%, and the N50 length of assembled scaffolds was 211 kb. We identified 29 449 genes using Evidence Modeler based on the gene information from ab initio prediction, protein homology with known genes, and transcriptome evidence of RNA-seq. Here we present the first Haliotidae genome, H. discus hannai, with sequencing data, assembly, and gene annotation information. This will be helpful for resolving the lack of genomic information in the Haliotidae family as well as providing more opportunities for understanding gastropod evolution. © The Authors 2017. Published by Oxford University Press.
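
    The coverage and N50 figures quoted in this abstract follow from simple arithmetic, which the minimal Python sketch below illustrates. The function names and the toy scaffold lengths are hypothetical and are not taken from the paper; only the 580 Gb and 1.8 Gb inputs come from the abstract.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by estimated genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that scaffolds of length >= L contain at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Values from the abstract: 580 Gb of reads, 1.8-Gb estimated genome size.
coverage = fold_coverage(580e9, 1.8e9)
print(round(coverage))  # 322

# Hypothetical toy assembly (not the abalone scaffolds), total length 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70
```

    In the toy assembly the two longest scaffolds (80 + 70) already hold half of the 300 total bases, so the N50 is 70.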

  7. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm-to-consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitation of outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS likely will replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one consolidated, efficient workflow.

  8. Genome sequence of Kocuria varians G6 isolated from a slaughterhouse in Denmark

    DEFF Research Database (Denmark)

    Raghupathi, Prem Krishnan; Herschend, Jakob; Røder, Henriette Lyng

    2016-01-01

    We report here the first draft genome sequence of Kocuria varians G6, which was isolated from a meat chopper at a small slaughterhouse in Denmark. The 2.90-Mb genome sequence consists of 95 contigs and contains 2,518 predicted protein-coding genes.

  9. Unique Sequence Features of the Human Adenovirus 31 Complete Genomic Sequence are Conserved in Clinical Isolates

    OpenAIRE

    Hofmayer, Soeren; Darr, Sebastian; Rehren, Fabienne; Heim, Albert; Madisch, Ijad

    2009-01-01

    Abstract Background Human adenoviruses (HAdV) are causing a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoetic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 proto...

  10. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates there was concordance between serotype derived from the genome and phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or indels identified in O-antigen synthesis/modification genes rendering them dysfunctional. SNP typing identified 36 clusters of two isolates or more. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real-time thereby informing surveillance strategies and providing the opportunities for implementing timely public health interventions.
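
    One common way to turn SNP typing into cluster detection is single-linkage grouping on pairwise SNP distance, sketched below in Python. This is an illustration only: the abstract does not state the clustering rule or threshold used, so the threshold, the isolate names, and the toy SNP profiles here are all assumptions.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(profiles: dict[str, str], threshold: int) -> list[set[str]]:
    """Group isolates linked by chains of pairs within `threshold` SNPs.

    Only clusters of two or more isolates are returned.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical SNP profiles (concatenated variant positions), not real data.
toy_profiles = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACGTACTA",
    "iso4": "TTTTTTTT",
}
clusters = single_linkage_clusters(toy_profiles, threshold=1)
print(clusters)

# Concordance reported in the abstract: 306 of 330 isolates.
print(f"{306 / 330:.0%}")  # 93%
```

    With a threshold of 1, iso1 and iso3 differ by 2 SNPs but still join the same cluster through iso2, which is the defining behaviour of single linkage; iso4 stays unclustered.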

  11. No genome-wide protein sequence convergence for echolocation.

    Science.gov (United States)

    Zou, Zhengting; Zhang, Jianzhi

    2015-05-01

    Toothed whales and two groups of bats independently acquired echolocation, the ability to locate and identify objects by reflected sound. Echolocation requires physiologically complex and coordinated vocal, auditory, and neural functions, but the molecular basis of the capacity for echolocation is not well understood. A recent study suggested that convergent amino acid substitutions widespread in the proteins of echolocators underlay the convergent origins of mammalian echolocation. Here, we show that genomic signatures of molecular convergence between echolocating lineages are generally no stronger than those between echolocating and comparable nonecholocating lineages. The same is true for the group of 29 hearing-related proteins claimed to be enriched with molecular convergence. Reexamining the previous selection test reveals several flaws and invalidates the asserted evidence for adaptive convergence. Together, these findings indicate that the reported genomic signatures of convergence largely reflect the background level of sequence convergence unrelated to the origins of echolocation. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Co-barcoded sequence reads from long DNA fragments: A cost-effective solution for Perfect Genome sequencing

    Directory of Open Access Journals (Sweden)

    Brock A Peters

    2015-01-01

    Next generation sequencing (NGS) technologies, primarily based on massively parallel sequencing (MPS), have touched and radically changed almost all aspects of research worldwide. These technologies have allowed for the rapid analysis, to date, of the genomes of more than 2,000 different species. In humans, NGS has arguably had the largest impact. Over 100,000 genomes of individual humans (based on various estimates) have been sequenced, allowing for deep insights into what makes individuals and families unique and what causes disease in each of us. Despite all of this progress, the current state of the art in sequence technology is far from generating a perfect genome sequence and much remains to be understood in the biology of human and other organisms’ genomes. In the article that follows we outline why the perfect genome in humans is important, what is lacking from current human whole genome sequences, and a potential strategy for achieving the perfect genome in a cost-effective manner.

  13. Topoisomerase I plays a critical role in suppressing genome instability at a highly transcribed G-quadruplex-forming sequence.

    Directory of Open Access Journals (Sweden)

    Puja Yadav

    2014-12-01

    G-quadruplex or G4 DNA is a non-B secondary DNA structure that comprises a stacked array of guanine-quartets. Cellular processes such as transcription and replication can be hindered by unresolved DNA secondary structures potentially endangering genome maintenance. As G4-forming sequences are highly frequent throughout eukaryotic genomes, it is important to define what factors contribute to a G4 motif becoming a hotspot of genome instability. Using a genetic assay in Saccharomyces cerevisiae, we previously demonstrated that a potential G4-forming sequence derived from a guanine-run containing immunoglobulin switch Mu (Sμ) region becomes highly unstable when actively transcribed. Here we describe assays designed to survey spontaneous genome rearrangements initiated at the Sμ sequence in the context of large genomic areas. We demonstrate that, in the absence of Top1, a G4 DNA-forming sequence becomes a strong hotspot of gross chromosomal rearrangements and loss of heterozygosity associated with mitotic recombination within the ∼20 kb or ∼100 kb regions of yeast chromosome V or III, respectively. Transcription confers a critical strand bias since genome rearrangements at the G4-forming Sμ are elevated only when the guanine-runs are located on the non-transcribed strand. The direction of replication and transcription, when in a head-on orientation, further contribute to the elevated genome instability at a potential G4 DNA-forming sequence. The implications of our identification of Top1 as a critical factor in suppression of instability associated with potential G4 DNA-forming sequences are discussed.

  14. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. An association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accuracies and effects of some key factors wer...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  15. Genome and transcriptome sequencing of lung cancers reveal diverse mutational and splicing events.

    Science.gov (United States)

    Liu, Jinfeng; Lee, William; Jiang, Zhaoshi; Chen, Zhongqiang; Jhunjhunwala, Suchit; Haverty, Peter M; Gnad, Florian; Guan, Yinghui; Gilbert, Houston N; Stinson, Jeremy; Klijn, Christiaan; Guillory, Joseph; Bhatt, Deepali; Vartanian, Steffan; Walter, Kimberly; Chan, Jocelyn; Holcomb, Thomas; Dijkgraaf, Peter; Johnson, Stephanie; Koeman, Julie; Minna, John D; Gazdar, Adi F; Stern, Howard M; Hoeflich, Klaus P; Wu, Thomas D; Settleman, Jeff; de Sauvage, Frederic J; Gentleman, Robert C; Neve, Richard M; Stokoe, David; Modrusan, Zora; Seshagiri, Somasekar; Shames, David S; Zhang, Zemin

    2012-12-01

    Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.

  16. Genome Sequence of Bacillus glycinifermentans TH008, Isolated from Ohio Soil

    OpenAIRE

    Zeigler, Daniel R.

    2016-01-01

    The genome sequence of an Ohio soil isolate, TH008, was determined. The sequence reveals a close relationship between TH008 and domesticated Bacillus glycinifermentans strains found in a traditional Korean fermented soybean food.

  17. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma

    DEFF Research Database (Denmark)

    Sung, Wing-Kin; Zheng, Hancheng; Li, Shuyu

    2012-01-01

    To survey hepatitis B virus (HBV) integration in liver cancer genomes, we conducted massively parallel sequencing of 81 HBV-positive and 7 HBV-negative hepatocellular carcinomas (HCCs) and adjacent normal tissues. We found that HBV integration is observed more frequently in the tumors (86.4%) than...... in adjacent liver tissues (30.7%). Copy-number variations (CNVs) were significantly increased at HBV breakpoint locations where chromosomal instability was likely induced. Approximately 40% of HBV breakpoints within the HBV genome were located within a 1,800-bp region where the viral enhancer, X gene and core...... gene are located. We also identified recurrent HBV integration events (in ≥ 4 HCCs) that were validated by RNA sequencing (RNA-seq) and Sanger sequencing at the known and putative cancer-related TERT, MLL4 and CCNE1 genes, which showed upregulated gene expression in tumor versus normal tissue. We also...

  18. Privacy-preserving techniques of genomic data-a survey.

    Science.gov (United States)

    Aziz, Md Momin Al; Sadat, Md Nazmus; Alhadidi, Dima; Wang, Shuang; Jiang, Xiaoqian; Brown, Cheryl L; Mohammed, Noman

    2017-11-07

    Genomic data hold salient information about the characteristics of a living organism. Throughout the past decade, pinnacle developments have given us more accurate and inexpensive methods to retrieve genome sequences of humans. However, with the advancement of genomic research, there is a growing privacy concern regarding the collection, storage and analysis of such sensitive human data. Recent results show that given some background information, it is possible for an adversary to reidentify an individual from a specific genomic data set. This can reveal the current association or future susceptibility of some diseases for that individual (and sometimes the kinship between individuals) resulting in a privacy violation. Regardless of these risks, our genomic data hold much importance in analyzing the well-being of us and the future generation. Thus, in this article, we discuss the different privacy and security-related problems revolving around human genomic data. In addition, we will explore some of the cardinal cryptographic concepts, which can bring efficacy in secure and private genomic data computation. This article will relate the gaps between these two research areas-Cryptography and Genomics. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. High-density rhesus macaque oligonucleotide microarray design using early-stage rhesus genome sequence information and human genome annotations

    Directory of Open Access Journals (Sweden)

    Magness Charles L

    2007-01-01

    Abstract Background Until recently, few genomic reagents specific for non-human primate research have been available. To address this need, we have constructed a macaque-specific high-density oligonucleotide microarray by using highly fragmented low-pass sequence contigs from the rhesus genome project together with the detailed sequence and exon structure of the human genome. Using this method, we designed oligonucleotide probes to over 17,000 distinct rhesus/human gene orthologs and increased by four-fold the number of available genes relative to our first-generation expressed sequence tag (EST)-derived array. Results We constructed a database containing 248,000 exon sequences from 23,000 human RefSeq genes and compared each human exon with its best matching sequence in the January 2005 version of the rhesus genome project list of 486,000 DNA contigs. Best matching rhesus exon sequences for each of the 23,000 human genes were then concatenated in the proper order and orientation to produce a rhesus "virtual transcriptome." Microarray probes were designed, one per gene, to the region closest to the 3' untranslated region (UTR) of each rhesus virtual transcript. Each probe was compared to a composite rhesus/human transcript database to test for cross-hybridization potential, yielding a final probe set representing 18,296 rhesus/human gene orthologs, including transcript variants, and over 17,000 distinct genes. We hybridized mRNA from rhesus brain and spleen to both the EST- and genome-derived microarrays. Besides four-fold greater gene coverage, the genome-derived array also showed greater mean signal intensities for genes present on both arrays. Genome-derived probes showed 99.4% identity when compared to 4,767 rhesus GenBank sequence tag site (STS) sequences, indicating that early stage low-pass versions of complex genomes are of sufficient quality to yield valuable functional genomic information when combined with finished genome information from
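
    The "virtual transcript" construction described in this record (best-matching exon sequences concatenated in the proper order and orientation) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' pipeline: the (exon number, strand, sequence) tuple layout, the function names, and the toy sequences are all hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_virtual_transcript(hits: list[tuple[int, str, str]]) -> str:
    """Concatenate best-hit exon sequences in exon order and transcript orientation.

    Each hit is (exon_number, strand, sequence); hits on the '-' strand are
    reverse-complemented before concatenation.
    """
    parts = []
    for _, strand, seq in sorted(hits, key=lambda hit: hit[0]):
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return "".join(parts)


# Hypothetical best hits for one gene, given out of order on purpose.
toy_hits = [(2, "-", "TTGC"), (1, "+", "ATGG"), (3, "+", "TAA")]
print(build_virtual_transcript(toy_hits))  # ATGGGCAATAA
```

    Exon 2 was found on the minus strand of its contig, so it is flipped to GCAA before being placed between exons 1 and 3.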

  20. Genome Analysis of the Domestic Dog (Korean Jindo) by Massively Parallel Sequencing

    Science.gov (United States)

    Kim, Ryong Nam; Kim, Dae-Soo; Choi, Sang-Haeng; Yoon, Byoung-Ha; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Jong-Joo; Ha, Ji-Hong; Toyoda, Atsushi; Fujiyama, Asao; Kim, Aeri; Kim, Min-Young; Park, Kun-Hyang; Lee, Kang Seon; Park, Hong-Seog

    2012-01-01

    Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in cranial facial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. This Jindo genome data upgrade our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics. PMID:22474061

  1. Genome analysis of the domestic dog (Korean Jindo) by massively parallel sequencing.

    Science.gov (United States)

    Kim, Ryong Nam; Kim, Dae-Soo; Choi, Sang-Haeng; Yoon, Byoung-Ha; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Jong-Joo; Ha, Ji-Hong; Toyoda, Atsushi; Fujiyama, Asao; Kim, Aeri; Kim, Min-Young; Park, Kun-Hyang; Lee, Kang Seon; Park, Hong-Seog

    2012-06-01

    Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in cranial facial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. This Jindo genome data upgrade our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics.

  2. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results.

    Science.gov (United States)

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B

    2015-07-01

    Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.

  3. Complete genome sequence of a giant Vibrio phage ValKK3 infecting Vibrio alginolyticus

    OpenAIRE

    Lal, Tamrin M.; Sano, Motohiko; Hatai, Kishio; Ransangan, Julian

    2016-01-01

    This paper describes the complete sequence of a giant lytic marine myophage, Vibrio phage ValKK3, that is specific to Vibrio alginolyticus ATCC® 17749™. Vibrio phage ValKK3 was subjected to whole genome sequencing on the MiSeq sequencing platform and annotated using Blast2Go. The complete sequence of the ValKK3 genome was deposited in DDBJ/EMBL/GenBank under accession number KP671755.

  4. Complete genome sequence of a giant Vibrio phage ValKK3 infecting Vibrio alginolyticus

    Directory of Open Access Journals (Sweden)

    Tamrin M. Lal

    2016-06-01

    This paper describes the complete sequence of a giant lytic marine myophage, Vibrio phage ValKK3, that is specific to Vibrio alginolyticus ATCC® 17749™. Vibrio phage ValKK3 was subjected to whole genome sequencing on the MiSeq sequencing platform and annotated using Blast2Go. The complete sequence of the ValKK3 genome was deposited in DDBJ/EMBL/GenBank under accession number KP671755.

  5. USE OF NEXT-GENERATION SEQUENCING FOR GENOMIC ANALYSIS IN COMPLEX DISEASES

    OpenAIRE

    Sana, Maria Elena

    2013-01-01

    Since the early 1990s, Sanger method has been the gold standard methodology for sequencing analysis of DNA. Next-generation sequencing (NGS) approaches revolutionized the field of genomics over the last 5 years. These new sequencing technologies make feasible the direct and cost-effective sequencing of genomes at unprecedented scale and speed. Furthermore, the applications of these technologies are wide-spread and have been developed to explore the complex biological systems, among which RNA ...

  6. Global Genomic Diversity of Human Papillomavirus 11 Based on 433 Isolates and 78 Complete Genome Sequences

    Science.gov (United States)

    Jelen, Mateja M.; Chen, Zigui; Kocjan, Boštjan J.; Hošnjak, Lea; Burt, Felicity J.; Chan, Paul K. S.; Chouhy, Diego; Combrinck, Catharina E.; Estrade, Christine; Fiander, Alison; Garland, Suzanne M.; Giri, Adriana A.; González, Joaquín Víctor; Gröning, Arndt; Hibbitts, Sam; Luk, Tommy N. M.; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y.; Seme, Katja; Severini, Alberto; Sinchi, Jessica L.; Smahelova, Jana; Tabrizi, Sepehr N.; Tachezy, Ruth; Tohme Faybush, Sarah; Uloza, Virgilijus; Uloziene, Ingrida; Wong, Yong Wee; Židovec Lepej, Snježana; Burk, Robert D.

    2016-01-01

    ABSTRACT Human papillomavirus 11 (HPV11) is an etiological agent of anogenital warts and laryngeal papillomas and is included in the 4-valent and 9-valent prophylactic HPV vaccines. We established the largest collection of globally circulating HPV11 isolates to date and examined the genomic diversity of 433 isolates and 78 complete genomes (CGs) from six continents. The genomic variation within the 2,800-bp E5a-E5b-L1-upstream regulatory region was initially studied in 181/207 (87.4%) HPV11 isolates collected for this study. Of these, the CGs of 30 HPV11 variants containing unique single nucleotide polymorphisms (SNPs), indels (insertions or deletions), or amino acid changes were fully sequenced. A maximum likelihood tree based on the global alignment of 78 HPV11 CGs (30 CGs from our study and 48 CGs from GenBank) revealed two HPV11 lineages (lineages A and B) and four sublineages (sublineages A1, A2, A3, and A4). HPV11 (sub)lineage-specific SNPs within the CG were identified, as well as the 208-bp representative region for CG-based phylogenetic clustering within the partial E2 open reading frame and noncoding region 2. Globally, sublineage A2 was the most prevalent, followed by sublineages A1, A3, and A4 and lineage B. IMPORTANCE This collaborative international study defined the global heterogeneity of HPV11 and established the largest collection of globally circulating HPV11 genomic variants to date. Thirty novel complete HPV11 genomes were determined and submitted to the available sequence repositories. Global phylogenetic analysis revealed two HPV11 variant lineages and four sublineages. The HPV11 (sub)lineage-specific SNPs and the representative region identified within the partial genomic region E2/noncoding region 2 (NCR2) will enable the simpler identification and comparison of HPV11 variants worldwide. This study provides an important knowledge base for HPV11 for future studies in HPV epidemiology, evolution, pathogenicity, prevention, and molecular assay

  7. First large genomic inversion in familial cerebral cavernous malformation identified by whole genome sequencing.

    Science.gov (United States)

    Spiegler, Stefanie; Rath, Matthias; Hoffjan, Sabine; Dammann, Philipp; Sure, Ulrich; Pagenstecher, Axel; Strom, Tim; Felbor, Ute

    2018-01-01

    Familial cerebral cavernous malformations (CCMs) predispose to seizures and hemorrhagic stroke. Molecular genetic analyses of CCM1, CCM2, and CCM3 result in a mutation detection rate of up to 98%. However, only whole genome sequencing (WGS) in combination with the Manta algorithm for analyses of structural variants revealed a heterozygous 24 kb inversion including exon 1 of CCM2 in a 12-year-old boy with familial CCMs. Its breakpoints were fine-mapped, and quantitative analysis on RNA confirmed reduced CCM2 expression. Our data expand the spectrum of CCM mutations and indicate that the existence of a fourth CCM disease gene is rather unlikely.

  8. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619

  9. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  10. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  11. Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions.

    Directory of Open Access Journals (Sweden)

    Wei Li

    Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information.

  12. The complete chloroplast genome sequences of five Epimedium species: lights into phylogenetic and taxonomic analyses

    Directory of Open Access Journals (Sweden)

    Yanjun Zhang

    2016-03-01

    Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We here sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly, which was also the first comprehensive cp genome analysis on Epimedium combining the cp genome sequence of E. koreanum previously reported. The five Epimedium cp genomes exhibited typical quadripartite and circular structure that was rather conserved in genomic structure and the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes, which was not found in the other basal eudicotyledons. The rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSR) and repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were, on the whole, in accordance with the updated system of the genus, but indicated that the evolutionary relationships and the divisions of the genus need further investigation with more evidence. The availability of these cp genomes provides valuable genetic information for accurately identifying species, for the taxonomy, phylogenetic resolution and evolution of Epimedium, and will assist in the exploration and utilization of Epimedium plants.

  13. Genetic counselors' views and experiences with the clinical integration of genome sequencing.

    Science.gov (United States)

    Machini, Kalotina; Douglas, Jessica; Braxton, Alicia; Tsipis, Judith; Kramer, Kate

    2014-08-01

    In recent years, new sequencing technologies known as next generation sequencing (NGS) have provided scientists the ability to rapidly sequence all known coding as well as non-coding sequences in the human genome. As the two emerging approaches, whole exome (WES) and whole genome (WGS) sequencing, have started to be integrated in the clinical arena, we sought to survey health care professionals who are likely to be involved in the implementation process now and/or in the future (e.g., genetic counselors, geneticists and nurse practitioners). Two hundred twenty-one genetic counselors, one third of whom currently offer WES/WGS, participated in an anonymous online survey. The aims of the survey were first, to identify barriers to the implementation of WES/WGS, as perceived by survey participants; second, to provide the first systematic report of current practices regarding the integration of WES/WGS in clinic and/or research across the US and Canada and to illuminate the roles and challenges of genetic counselors participating in this process; and third, to evaluate the impact of WES/WGS on patient care. Our results showed that genetic counseling practices with respect to WES/WGS are consistent with the criteria set forth in the ACMG 2012 policy statement, which highlights indications for testing, reporting, and pre/post test considerations. Our respondents described challenges related to offering WES/WGS, which included billing issues, the duration and content of the consent process, result interpretation and disclosure of incidental findings and variants of unknown significance. In addition, respondents indicated that specialty area (i.e., prenatal and cancer), lack of clinical utility of WES/WGS and concerns about interpretation of test results were factors that prevented them from offering this technology to patients. Finally, study participants identified the aspects of their professional training which have been most beneficial in aiding with the integration of

  14. Isolation and enrichment of Cryptosporidium DNA and verification of DNA purity for whole-genome sequencing.

    Science.gov (United States)

    Guo, Yaqiong; Li, Na; Lysén, Colleen; Frace, Michael; Tang, Kevin; Sammons, Scott; Roellig, Dawn M; Feng, Yaoyu; Xiao, Lihua

    2015-02-01

    Whole-genome sequencing of Cryptosporidium spp. is hampered by difficulties in obtaining sufficient, highly pure genomic DNA from clinical specimens. In this study, we developed procedures for the isolation and enrichment of Cryptosporidium genomic DNA from fecal specimens and verification of DNA purity for whole-genome sequencing. The isolation and enrichment of genomic DNA were achieved by a combination of three oocyst purification steps and whole-genome amplification (WGA) of DNA from purified oocysts. Quantitative PCR (qPCR) analysis of WGA products was used as an initial quality assessment of amplified genomic DNA. The purity of WGA products was assessed by Sanger sequencing of cloned products. Next-generation sequencing tools were used in final evaluations of genome coverage and of the extent of contamination. Altogether, 24 fecal specimens of Cryptosporidium parvum, C. hominis, C. andersoni, C. ubiquitum, C. tyzzeri, and Cryptosporidium chipmunk genotype I were processed with the procedures. As expected, WGA products with low (sequences in Sanger sequencing. The cloning-sequencing analysis, however, showed significant contamination in 5 WGA products (proportion of positive colonies derived from Cryptosporidium genomic DNA, ≤25%). Following this strategy, 20 WGA products from six Cryptosporidium species or genotypes with low (mostly sequencing, generating sequence data covering 94.5% to 99.7% of Cryptosporidium genomes, with mostly minor contamination from bacterial, fungal, and host DNA. These results suggest that the described strategy can be used effectively for the isolation and enrichment of Cryptosporidium DNA from fecal specimens for whole-genome sequencing. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  15. BAC-end sequences analysis provides first insights into coffee (Coffea canephora P.) genome composition and evolution.

    Science.gov (United States)

    Dereeper, Alexis; Guyot, Romain; Tranchant-Dubreuil, Christine; Anthony, François; Argout, Xavier; de Bellis, Fabien; Combes, Marie-Christine; Gavory, Frederick; de Kochko, Alexandre; Kudrna, Dave; Leroy, Thierry; Poulain, Julie; Rondeau, Myriam; Song, Xiang; Wing, Rod; Lashermes, Philippe

    2013-10-01

    Coffee is one of the world's most important agricultural commodities. Coffee belongs to the Rubiaceae family in the euasterid I clade of dicotyledonous plants, to which the Solanaceae family also belongs. Two bacterial artificial chromosome (BAC) libraries of a homozygous doubled haploid plant of Coffea canephora were constructed using two enzymes, HindIII and BstYI. A total of 134,827 high quality BAC-end sequences (BESs) were generated from the 73,728 clones of the two libraries, and 131,412 BESs were conserved for further analysis after elimination of chloroplast and mitochondrial sequences. This corresponded to almost 13 % of the estimated size of the C. canephora genome. 6.7 % of BESs contained simple sequence repeats, the most abundant (47.8 %) being mononucleotide motifs. These sequences allow the development of numerous useful marker sites. Potential transposable elements (TEs) represented 11.9 % of the full length BESs. A difference was observed between the BstYI and HindIII libraries (14.9 vs. 8.8 %). Analysis of BESs against known coding sequences of TEs indicated that 11.9 % of the genome corresponded to known repeat sequences, like for other flowering plants. The number of genes in the coffee genome was estimated at 41,973 which is probably overestimated. Comparative genome mapping revealed that microsynteny was higher between coffee and grapevine than between coffee and tomato or Arabidopsis. BESs constitute valuable resources for the first genome wide survey of coffee and provide new insights into the composition and evolution of the coffee genome.

  16. Characterization of Three Mycobacterium spp. with Potential Use in Bioremediation by Genome Sequencing and Comparative Genomics.

    Science.gov (United States)

    Das, Sarbashis; Pettersson, B M Fredrik; Behra, Phani Rama Krishna; Ramesh, Malavika; Dasgupta, Santanu; Bhattacharya, Alok; Kirsebom, Leif A

    2015-06-16

    We provide the genome sequences of the type strains of the polychlorophenol-degrading Mycobacterium chlorophenolicum (DSM43826), the degrader of chlorinated aliphatics Mycobacterium chubuense (DSM44219) and Mycobacterium obuense (DSM44075) that has been tested for use in cancer immunotherapy. The genome sizes of M. chlorophenolicum, M. chubuense, and M. obuense are 6.93, 5.95, and 5.58 Mb with GC-contents of 68.4%, 69.2%, and 67.9%, respectively. Comparative genomic analysis revealed that 3,254 genes are common and we predicted approximately 250 genes acquired through horizontal gene transfer from different sources including proteobacteria. The data also showed that the biodegrading Mycobacterium spp. NBB4, also referred to as M. chubuense NBB4, is distantly related to the M. chubuense type strain and should be considered a separate species; we suggest that it be named Mycobacterium ethylenense NBB4. Among different categories we identified genes with potential roles in biodegradation of aromatic compounds and copper homeostasis. These are the first nonpathogenic Mycobacterium spp. found harboring genes involved in copper homeostasis. These findings would therefore provide insight into the role of this group of Mycobacterium spp. in bioremediation as well as the evolution of copper homeostasis within the Mycobacterium genus. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Genome sequence of Thermofilum pendens reveals an exceptional loss of biosynthetic pathways without genome reduction

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Anderson, Iain; Rodriguez, Jason; Susanti, Dwi; Porat, Iris; Reich, Claudia; Ulrich, Luke E.; Elkins, James G.; Mavromatis, Kostas; Lykidis, Athanasios; Kim, Edwin; Thompson, Linda S.; Nolan, Matt; Land, Miriam; Copeland, Alex; Lapidus, Alla; Lucas, Susan; Detter, Chris; Zhulin, Igor B.; Olsen, Gary J.; Whitman, William; Mukhopadhyay, Biswarup; Bristow, James; Kyrpides, Nikos

    2008-01-01

    We report the complete genome of Thermofilum pendens, a deep-branching, hyperthermophilic member of the order Thermoproteales within the archaeal kingdom Crenarchaeota. T. pendens is a sulfur-dependent, anaerobic heterotroph isolated from a solfatara in Iceland. It is an extracellular commensal, requiring an extract of Thermoproteus tenax for growth, and the genome sequence reveals that biosynthetic pathways for purines, most amino acids, and most cofactors are absent. In fact T. pendens has fewer biosynthetic enzymes than obligate intracellular parasites, although it does not display other features common among obligate parasites and thus does not appear to be in the process of becoming a parasite. It appears that T. pendens has adapted to life in an environment rich in nutrients. T. pendens was known to utilize peptides as an energy source, but the genome reveals substantial ability to grow on carbohydrates. T. pendens is the first crenarchaeote and only the second archaeon found to have a transporter of the phosphotransferase system. In addition to fermentation, T. pendens may gain energy from sulfur reduction with hydrogen and formate as electron donors. It may also be capable of sulfur-independent growth on formate with formate hydrogenlyase. Additional novel features are the presence of a monomethylamine:corrinoid methyltransferase, the first time this enzyme has been found outside of Methanosarcinales, and a presenilin-related protein. Predicted highly expressed proteins do not include housekeeping genes, and instead include ABC transporters for carbohydrates and peptides, and CRISPR-associated proteins.

  18. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  19. Genome sequences of Phytophthora enable translational plant disease management and accelerate research

    Science.gov (United States)

    Niklaus J. Grünwald

    2012-01-01

    Whole and partial genome sequences are becoming available at an ever-increasing pace. For many plant pathogen systems, we are moving into the era of genome resequencing. The first Phytophthora genomes, P. ramorum and P. sojae, became available in 2004, followed shortly by P. infestans...

  20. Complete genome sequence of the hippuricase-positive Campylobacter avium type strain LMG 24591

    Science.gov (United States)

    Campylobacter avium is a hippurate-positive, thermotolerant campylobacter that has been isolated from poultry. Here we present the genome sequences of two C. avium strains isolated from broiler chickens: strains LMG 24591T (complete genome) and LMG 24592 (draft genome). The C. avium type strain geno...

  1. Genome Sequence of Cauliflower Mosaic Virus Identified in Earwigs (Doru luteipes) through a Metagenomic Approach.

    Science.gov (United States)

    Godinho, Márcio Tadeu; Paula, Débora Pires; Varsani, Arvind; Ribeiro, Simone Graça

    2017-03-16

    Here we report the first complete genome sequence of a cauliflower mosaic virus from Brazil, obtained from the gut content of the predator earwig (Doru luteipes). This virus has a genome of 8,030 nucleotides (nt) and shares 97% genome-wide identity with an isolate from Argentina. Copyright © 2017 Godinho et al.

  2. Draft genome sequence of Xylella fastidiosa pear leaf scorch strain in Taiwan

    Science.gov (United States)

    The draft genome sequence of Xylella fastidiosa pear leaf scorch strain (PLS229) isolated from pear cultivar Hengshan (Pyrus pyrifolia) in Taiwan is reported. The bacterium has a genome size of 2,733,013 bp with a G+C content of 53.1%. The PLS229 strain genome was annotated to have 3,259 open readin...

  3. Genome Sequence of the Rice Pathogen Dickeya zeae Strain ZJU1202

    OpenAIRE

    Li, Bin; Shi, Yu; Ibrahim, Muhammad; Liu, He; Shan, Changlin; Wang, Yanli; Kube, Michael; Xie, Guan-Lin; Sun, Guochang

    2012-01-01

    Dickeya zeae is a phytopathogenic bacterium causing soft rot diseases in a wide range of economically important crops. Here we present the draft genome sequence of strain ZJU1202, which is the causal agent of rice foot rot in China. The draft genome will contribute to epidemiological and comparative genomic studies and the quarantine of this devastating phytopathogen.

  4. Genome sequence of the rice pathogen Dickeya zeae strain ZJU1202.

    Science.gov (United States)

    Li, Bin; Shi, Yu; Ibrahim, Muhammad; Liu, He; Shan, Changlin; Wang, Yanli; Kube, Michael; Xie, Guan-Lin; Sun, Guochang

    2012-08-01

    Dickeya zeae is a phytopathogenic bacterium causing soft rot diseases in a wide range of economically important crops. Here we present the draft genome sequence of strain ZJU1202, which is the causal agent of rice foot rot in China. The draft genome will contribute to epidemiological and comparative genomic studies and the quarantine of this devastating phytopathogen.

  5. Draft genome sequence of the white-rot fungus Obba rivulosa 3A-2

    Science.gov (United States)

    Otto Miettinen; Robert Riley; Kerrie Barry; Daniel Cullen; Ronald P. de Vries; Matthieu Hainaut; Annele Hatakka; Bernard Henrissat; Kristiina Hilden; Rita Kuo; Kurt LaButti; Anna Lipzen; Miia R. Makela; Laura Sandor; Joseph W. Spatafora; Igor V. Grigoriev; David S. Hibbett

    2016-01-01

    We report here the first genome sequence of the white-rot fungus Obba rivulosa (Polyporales, Basidiomycota), a polypore known for its lignin-decomposing ability. The genome is based on the homokaryon 3A-2 originating in Finland. The genome is typical in size and carbohydrate active enzyme (CAZy) content for wood-decomposing basidiomycetes.

  6. The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.

    Science.gov (United States)

    Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

    2014-04-01

    Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.

  7. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  8. Draft Genome Sequences of Four Hospital-Associated Pseudomonas putida Isolates.

    Science.gov (United States)

    Mustapha, Mustapha M; Marsh, Jane W; Ezeonwuka, Chinelo D; Pasculle, Anthony W; Pacey, Marissa P; Querry, Ashley M; Muto, Carlene A; Harrison, Lee H

    2016-09-29

    We present here the draft genome sequences of four Pseudomonas putida isolates belonging to a single clone suspected for nosocomial transmission between patients and a bronchoscope in a tertiary hospital. The four genome sequences belong to a single lineage but contain differences in their mobile genetic elements. Copyright © 2016 Mustapha et al.

  9. Draft Genome Sequence of Lactobacillus plantarum wikim18, Isolated from Korean Kimchi

    Science.gov (United States)

    Jang, Ja Young; Lim, Hyeong In; Park, Hae Woong; Choi, Hak-Jong; Kim, Tae-Woon; Kang, Miran

    2014-01-01

    This report describes the draft genome sequence of Lactobacillus plantarum strain wikim18, isolated from the traditional Korean food kimchi. The reads generated by Ion Torrent PGM were assembled into 327 contigs. RAST annotation of the genome revealed 12 tRNAs and 3,316 protein-coding gene sequences. PMID:24855305

  10. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  11. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line

    DEFF Research Database (Denmark)

    Xu, Xun; Pan, Shengkai; Liu, Xin

    2011-01-01

    Chinese hamster ovary (CHO)-derived cell lines are the preferred host cells for the production of therapeutic proteins. Here we present a draft genomic sequence of the CHO-K1 ancestral cell line. The assembly comprises 2.45 Gb of genomic sequence, with 24,383 predicted genes. We associate most...

  12. Complete Genomic Sequence of Maize Rough Dwarf Virus, a Fijivirus Transmitted by the Small Brown Planthopper

    OpenAIRE

    Lv, Mingfang; Xie, Li; Yang, Jian; Chen, Jianping; Zhang, Heng-Mu

    2016-01-01

    The nucleotide sequences of the 10 genomic segments of an Italian isolate of maize rough dwarf virus (MRDV) were determined. This first complete genomic sequence of MRDV will help understand the phylogenetic relationships among group 2 fijiviruses and especially the closely related rice black-streaked dwarf virus, which is also found to naturally infect maize.

  14. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam

    2013-01-01

    Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition–independent approach to recover high-quality microbial genomes from deeply sequenced metageno...

  15. Complete Genome Sequences of Edwardsiella tarda-Lytic Bacteriophages KF-1 and IW-1.

    Science.gov (United States)

    Yasuike, Motoshige; Sugaya, Emi; Nakamura, Yoji; Shigenobu, Yuya; Kawato, Yasuhiko; Kai, Wataru; Fujiwara, Atushi; Sano, Motohiko; Kobayashi, Takanori; Nakai, Toshihiro

    2013-01-01

    We report the complete genome sequences of two Edwardsiella tarda-lytic bacteriophages isolated from flounder kidney (KF-1) and seawater (IW-1). These newly sequenced phage genomes provide a novel resource for future studies on phage-host interaction mechanisms and various applications of the phages for control of edwardsiellosis in aquaculture.

  16. First complete genome sequence of canine bocavirus 2 in mainland China

    Directory of Open Access Journals (Sweden)

    S.-L. Zhai

    2017-07-01

    Full Text Available We obtained the first full-length genome sequence of canine bocavirus 2 (CBoV2) from the faeces of a healthy dog in Guangzhou city, Guangdong province, mainland China. The genome of GZHD15 consisted of 5059 nucleotides. Sequence analysis suggested that GZHD15 was close to a previously circulated Hong Kong isolate.

  17. Genome sequence of Kocuria palustris strain W4

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng

    2016-01-01

    We report the 3.09 Mb draft genome sequence of Kocuria palustris W4, isolated from a slaughterhouse in Denmark.

  18. Genome Sequence of Madurella mycetomatis mm55, Isolated from a Human Mycetoma Case in Sudan

    NARCIS (Netherlands)

    Smit, S.; Derks, M.F.L.; Bervoets, Sander; Fahal, Ahmed; Leeuwen, van Willem; Belkum, van Alex; Sande, van de Wendy W.J.

    2016-01-01

    We present the first genome sequence for a strain of the main mycetoma causative agent, Madurella mycetomatis. This 36.7-Mb genome sequence will offer new insights into the pathogenesis of mycetoma, and it will contribute to the development of better therapies for this neglected tropical disease.

  19. Genome Sequence of Helicobacter bizzozeronii Strain CIII-1, an Isolate from Human Gastric Mucosa

    Science.gov (United States)

    Schott, Thomas; Rossi, Mirko; Hänninen, Marja-Liisa

    2011-01-01

    The canine-adapted Helicobacter bizzozeronii is the only nonpylori Helicobacter species isolated from human gastric biopsy tissue. Here we present the genome sequence of strain CIII-1, isolated from a 45-year-old female patient with severe gastric symptoms. This is the first genome sequence of nonpylori gastric Helicobacter isolated from human gastritis. PMID:21705603

  20. Whole-genome sequence of Bacillus solimangrovi GH 2-4T, isolated from mangrove soil

    OpenAIRE

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Bacillus solimangrovi GH 2-4T was isolated from mangrove soil and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession MJEH00000000.

  1. Genome Sequence of Amycolatopsis sp. Strain ATCC 39116, a Plant Biomass-Degrading Actinomycete

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Jennifer R. [Brown University; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Teshima, Hazuki [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Shunsheng [Los Alamos National Laboratory (LANL); Han, James [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Sello, Jason K. [Brown University

    2012-01-01

    We announce the availability of a high-quality draft of the genome sequence of Amycolatopsis sp. strain 39116, one of few bacterial species that are known to consume the lignin component of plant biomass. This genome sequence will further ongoing efforts to use microorganisms for the conversion of plant biomass into fuels and high-value chemicals.

  2. Complete Genome Sequence of Photobacterium sp. Strain J15, Isolated from Seawater of Southwestern Johor, Malaysia.

    Science.gov (United States)

    Roslan, Noordiyanah Nadhirah; Sabri, Suriana; Oslan, Siti Nurbaya; Baharum, Syarul Nataqain; Leow, Thean Chor

    2016-07-28

    Here, we report the genome sequences of Photobacterium sp. strain J15, isolated from seawater in Johor, Malaysia, with the ability to produce lipase and asparaginase. PacBio genome sequence analysis of Photobacterium sp. strain J15 revealed its potential for producing enzymes with different catalytic functions. Copyright © 2016 Roslan et al.

  3. Genome Sequence of the Acidophilic Bacterium Acidocella sp. Strain MX-AZ02

    DEFF Research Database (Denmark)

    Servín-Garcidueñas, Luis E.; Garrett, Roger A.; Amils, Ricardo

    2013-01-01

    Here, we report the draft genome sequence of Acidocella sp. strain MX-AZ02, an acidophilic and heterotrophic alphaproteobacterium isolated from a geothermal lake in western Mexico.

  4. Genome Sequences of Three Strains of Lactobacillus paracasei of Different Origins and with Different Cholate Sensitivities

    Science.gov (United States)

    Shiwa, Yuh; Atarashi, Hotaka; Tanaka, Naoto; Okada, Sanae; Yoshikawa, Hirofumi; Endo, Akihito; Miyaji, Tatsuro

    2015-01-01

    We report here the draft genome sequences of three strains of Lactobacillus paracasei (NRIC 0644, NRIC 1781, and NRIC 1917) isolated from different sources. The three genomes range from 2.95 to 3.15 Mb with a G+C content of 46% and contain approximately 2,700 protein coding sequences. PMID:25931593

  5. Genomic survey of the ectoparasitic mite Varroa destructor, a major pest of the honey bee Apis mellifera

    Directory of Open Access Journals (Sweden)

    Elsik Chris

    2010-10-01

    Full Text Available Abstract Background The ectoparasitic mite Varroa destructor has emerged as the primary pest of domestic honey bees (Apis mellifera). Here we present an initial survey of the V. destructor genome carried out to advance our understanding of Varroa biology and to identify new avenues for mite control. This sequence survey provides immediate resources for molecular and population-genetic analyses of Varroa-Apis interactions and defines the challenges ahead for a comprehensive Varroa genome project. Results The genome size was estimated by flow cytometry to be 565 Mbp, larger than most sequenced insects but modest relative to some other Acari. Genomic DNA pooled from ~1,000 mites was sequenced to 4.3× coverage with 454 pyrosequencing. The 2.4 Gbp of sequencing reads were assembled into 184,094 contigs with an N50 of 2,262 bp, totaling 294 Mbp of sequence after filtering. Genic sequences with homology to other eukaryotic genomes were identified on 13,031 of these contigs, totaling 31.3 Mbp. Alignment of protein sequence blocks conserved among V. destructor and four other arthropod genomes indicated a higher level of sequence divergence within this mite lineage relative to the tick Ixodes scapularis. A number of microbes potentially associated with V. destructor were identified in the sequence survey, including ~300 Kbp of sequence deriving from one or more bacterial species of the Actinomycetales. The presence of this bacterium was confirmed in individual mites by PCR assay, but varied significantly by age and sex of mites. Fragments of a novel virus related to the Baculoviridae were also identified in the survey. The rate of single nucleotide polymorphisms (SNPs) in the pooled mites was estimated to be 6.2 × 10⁻⁵ per bp, a low rate consistent with the historical demography and life history of the species. Conclusions This survey has provided general tools for the research community and novel directions for investigating the biology and control of
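
    The coverage and N50 figures quoted in this abstract follow from two simple calculations, sketched below in Python. This is only an illustration: the function names are hypothetical, the contig lengths are toy values rather than data from the survey, and the N50 definition used (the length of the contig at which the running total of length-sorted contigs first reaches half the assembly) is the common convention, assumed here rather than taken from the record.

```python
def fold_coverage(total_read_bases, genome_size):
    """Average sequencing depth: read bases divided by genome size."""
    return total_read_bases / genome_size


def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of contigs,
    sorted longest first, reaches at least half of the total length."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Figures from the abstract: 2.4 Gbp of reads, 565 Mbp genome estimate.
varroa_coverage = fold_coverage(2.4e9, 565e6)

# Toy contig lengths (assumed, for illustration only).
toy_n50 = n50([100, 200, 300, 400, 500])
```

    With the rounded inputs given in the abstract the ratio comes out a little above 4.2, in line with the reported 4.3× once rounding of the 2.4 Gbp figure is allowed for.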

  6. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    ... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. Measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10, also based on Tabasco) led us to a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...
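
    The idea of reading duplications off read depth can be sketched as follows. This is a minimal illustration, not the authors' method: the record does not describe their windowing or thresholds, so the function name, the median baseline, and the 1.5× ratio cut-off are all assumptions, and the depths are toy values.

```python
from statistics import median


def flag_duplicated_windows(window_depths, ratio_threshold=1.5):
    """Return indices of windows whose read depth is at least
    ratio_threshold times the median depth (assumed baseline)."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [
        index
        for index, depth in enumerate(window_depths)
        if depth / baseline >= ratio_threshold
    ]


# Toy per-window depths: two windows carry roughly double the reads,
# as would be expected where two copies collapse onto one reference locus.
depths = [10, 11, 9, 10, 21, 20, 10, 9]
flagged = flag_duplicated_windows(depths)
```

    A median baseline is used in this sketch because a handful of high-depth windows shifts it far less than it would shift a mean.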

  7. Complete Genome Sequence of Thermus aquaticus Y51MC23.

    Directory of Open Access Journals (Sweden)

    Phillip J Brumm

    Full Text Available Thermus aquaticus Y51MC23 was isolated from a boiling spring in the Lower Geyser Basin of Yellowstone National Park. Remarkably, this T. aquaticus strain is able to grow anaerobically and produces multiple morphological forms. Y51MC23 is a Gram-negative, rod-shaped organism that grows well between 50°C and 80°C with maximum growth rate at 65°C to 70°C. Growth studies suggest that Y51MC23 primarily scavenges protein from the environment, supported by the high number of secreted and intracellular proteases and peptidases as well as transporter systems for amino acids and peptides. The genome was assembled de novo using a 350 bp fragment library (paired end sequencing) and an 8 kb long span mate pair library. A closed and finished genome was obtained consisting of a single chromosome of 2.15 Mb and four plasmids of 11, 14, 70, and 79 kb. Unlike other Thermus species, functions usually found on megaplasmids were identified on the chromosome. The Y51MC23 genome contains two full and two partial prophages as well as numerous CRISPR loci. The high identity and synteny between Y51MC23 prophage 2 and that of Thermus sp. 2.9 is interesting, given the 8,800 km separation of the two hot springs from which they were isolated. The anaerobic lifestyle of Y51MC23 is complex, with multiple morphologies present in cultures. The use of fluorescence microscopy reveals new details about these unusual morphological features, including the presence of multiple types of large and small spheres, often forming a confluent layer of spheres. Many of the spheres appear to be formed not from cell envelope or outer membrane components as previously believed, but from a remodeled peptidoglycan cell wall. These complex morphological forms may serve multiple functions in the survival of the organism, including food and nucleic acid storage as well as colony attachment and organization.

  8. A complete mitochondrial genome sequence from a mesolithic wild aurochs (Bos primigenius).

    LENUS (Irish Health Repository)

    Edwards, Ceiridwen J

    2010-01-01

    BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius) has been well-documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs) from an archaeologically-verified and exceptionally-well preserved aurochs bone sample. METHODOLOGY: DNA extracts were generated from an aurochs humerus bone sample recovered from a cave site located in Derbyshire, England and radiocarbon-dated to 6,738 ± 68 calibrated years before present. These extracts were prepared for both Sanger and next generation DNA sequencing technologies (Illumina Genome Analyzer). In total, 289.9 megabases (22.48%) of the post-filtered DNA sequences generated using the Illumina Genome Analyzer from this sample mapped with confidence to the bovine genome. A consensus B. primigenius mitochondrial genome sequence was constructed and was analysed alongside all available complete bovine mitochondrial genome sequences. CONCLUSIONS: For all nucleotide positions where both Sanger and Illumina Genome Analyzer sequencing methods gave high-confidence calls, no discrepancies were observed. Sequence analysis reveals evidence of heteroplasmy in this sample and places this mitochondrial genome sequence securely within a previously identified

  9. Genome sequence of bubaline alphaherpesvirus 1 (BuHV1) isolated in Australia in 1972.

    Science.gov (United States)

    Scheffer, Camila Mengue; Varela, Ana Paula Muterle; Cibulski, Samuel Paulo; Schmidt, Candice; Campos, Fabrício Souza; Paim, Willian Pinto; Dos Santos, Raíssa Nunes; Teixeira, Thais Fumaco; Loiko, Márcia Regina; Tochetto, Caroline; Dos Santos, Helton Fernandes; de Lima, Diane Alves; Cerva, Cristine; Mayer, Fabiana Quoos; Petzhold, Sylio Alfredo; Franco, Ana Cláudia; George, Toby St; Spilki, Fernando Rosado; Roehe, Paulo Michel

    2017-05-01

    Bubaline alphaherpesvirus 1 (BuHV1) is a member of the family Herpesviridae, subfamily Alphaherpesvirinae, genus Varicellovirus. To date, no full genome sequence of BuHV has been published. Here, we report the complete genome sequence of bubaline alphaherpesvirus 1 (BuHV1) strain b6 (BuHV1-b6), isolated from a water buffalo (Bubalus bubalis) in 1972 in Australia. The virus was multiplied in MDBK cells, and the DNA was extracted and subjected to high-throughput sequencing. The reads were aligned and combined into a single genome sequence, with bovine alphaherpesvirus 5 (BoHV5) strain SV507/99 (accession number NC005261) as a reference. The BuHV1-b6 genome is a linear double-stranded DNA molecule, 137,452 bp long, with a GC content of 76.8%. The genome consists of two unique sequences: a long, or UL, sequence (103,818 bp) and a short, or US, sequence (9,586 bp), with the latter being flanked by inverted IR and TR elements of 12,024 bp each. The arrangement is typical of herpesvirus genomes of the D-type. The overall sequence has a 92.2% similarity at the nucleotide level to the reference BoHV5 strain. Our report provides a significant landmark in the history of herpesviruses, represented by the genome sequence of this 44-year-old virus isolate.
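
    The segment lengths quoted in this abstract can be checked against the stated genome length with a few lines of Python. The variable names are ours, and the layout assumed is the one the abstract describes: a unique long region, and a unique short region flanked by two inverted repeats of equal length.

```python
# Segment lengths in bp, as reported in the abstract.
UL = 103_818   # unique long region
US = 9_586     # unique short region
IR = 12_024    # internal inverted repeat flanking US
TR = 12_024    # terminal inverted repeat flanking US

# Total genome length under the assumed UL + IR + US + TR layout.
genome_length = UL + IR + US + TR

# Share of the genome taken up by the two inverted repeats.
repeat_fraction = (IR + TR) / genome_length
```

    The four segments sum to 137,452 bp, matching the genome length given in the abstract, and the two inverted repeats together make up about 17.5% of it.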

  10. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the 'Chicken Variation Database' (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information...

  11. Whole-Genome Sequence of the Nitrogen-Fixing Symbiotic Rhizobium Mesorhizobium loti Strain TONO.

    Science.gov (United States)

    Shimoda, Yoshikazu; Hirakawa, Hideki; Sato, Shusei; Saeki, Kazuhiko; Hayashi, Makoto

    2016-10-06

    Mesorhizobium loti is the nitrogen-fixing microsymbiont for legumes of the genus Lotus. Here, we report the whole-genome sequence of a Mesorhizobium loti strain, TONO, which is used as a symbiont for the model legume Lotus japonicus. The whole-genome sequence of the strain TONO will be a solid platform for comparative genomics analyses and for the identification of genes responsible for the symbiotic properties of Mesorhizobium species. Copyright © 2016 Shimoda et al.

  12. Whole-Genome Sequence of the Nitrogen-Fixing Symbiotic Rhizobium Mesorhizobium loti Strain TONO

    Science.gov (United States)

    Hirakawa, Hideki; Sato, Shusei; Saeki, Kazuhiko; Hayashi, Makoto

    2016-01-01

    Mesorhizobium loti is the nitrogen-fixing microsymbiont for legumes of the genus Lotus. Here, we report the whole-genome sequence of a Mesorhizobium loti strain, TONO, which is used as a symbiont for the model legume Lotus japonicus. The whole-genome sequence of the strain TONO will be a solid platform for comparative genomics analyses and for the identification of genes responsible for the symbiotic properties of Mesorhizobium species. PMID:27795235

  13. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    DEFF Research Database (Denmark)

    Nosek, J.; Novotna, M.; Hlavatovicova, Z.

    2004-01-01

    The complete sequence of the mitochondrial DNA of the opportunistic yeast pathogen Candida parapsilosis was determined. The mitochondrial genome is represented by linear DNA molecules terminating with tandem repeats of a 738-bp unit. The number of repeats varies, thus generating a population ... mitochondrial genome of its close relative C. albicans. The complete sequence has implications for both mitochondrial DNA replication and the evolution of linear DNA genomes.

  14. Genome Sequence of Human Papillomavirus Type 20, Strain HPV-20/Lancaster/2015.

    Science.gov (United States)

    Atkinson, Kate V; Bishop, Lisa A; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R; Hegarty, Matthew J; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M; Pickup, Roger W; Wilkinson, Mark; Gatherer, Derek

    2017-08-03

    The genome sequence of human papillomavirus type 20 (HPV-20; family Papillomaviridae, genus Betapapillomavirus, species Betapapillomavirus 1, type 20) was assembled by deep sequencing from nasopharyngeal swabs. The assembled genome is 0.37% divergent over its full length from the single complete genome of HPV-20 in GenBank (U31778). We named the strain HPV-20/Lancaster/2015. Copyright © 2017 Atkinson et al.

  15. Whole-genome sequence of the first sequence type 27 Brucella ceti strain isolated from European waters

    DEFF Research Database (Denmark)

    Duvnjak, Sanja; Spicic, Silvio; Kusar, Darja

    2017-01-01

    Brucella spp. that cause marine brucellosis are becoming more important, as the disease appears to be more widespread than originally thought. Here, we report a whole and annotated genome sequence of Brucella ceti CRO350, a sequence type 27 strain isolated from a bottlenose dolphin carcass found...

  16. Genome sequencing highlights the dynamic early history of dogs.

    Directory of Open Access Journals (Sweden)

    Adam H Freedman

    2014-01-01

    Full Text Available To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog

  17. Genome sequencing highlights the dynamic early history of dogs.

    Science.gov (United States)

    Freedman, Adam H; Gronau, Ilan; Schweizer, Rena M; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R; Parker, Heidi G; Lee, Clarence; Tadigotla, Vasisht; Wilton, Alan; Siepel, Adam; Bustamante, Carlos D; Harkins, Timothy T; Nelson, Stanley F; Ostrander, Elaine A; Marques-Bonet, Tomas; Wayne, Robert K; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is

  18. Genome Sequencing Highlights the Dynamic Early History of Dogs

    Science.gov (United States)

    Freedman, Adam H.; Gronau, Ilan; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Ostrander, Elaine A.; Marques-Bonet, Tomas; Wayne, Robert K.; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is

  19. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method

    Directory of Open Access Journals (Sweden)

    Bingfu Guo

    2016-07-01

    Full Text Available Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. About 21 Gb sequence data (~21× coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundary of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion site of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS is a cost-effective and rapid method of identifying sites of T-DNA insertions and flanking sequences in soybean.

  20. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    Science.gov (United States)

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
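
    The junction-read idea described in this abstract can be illustrated with a toy sketch. It is not the authors' pipeline: real analyses align reads with a dedicated aligner and inspect split or soft-clipped alignments. Here exact substring matching stands in for alignment, and the function name, the `min_anchor` parameter, and the toy sequences are all hypothetical.

```python
def find_junction(read, vector, genome, min_anchor=8):
    """Return (orientation, split) if the read spans a vector/genome boundary.

    Assumption: exact substring matching replaces real read alignment.
    `min_anchor` is the minimum number of bases required on each side.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        if left in genome and right in vector:
            return ("genome->vector", split)
        if left in vector and right in genome:
            return ("vector->genome", split)
    return None  # read lies entirely in one sequence, or matches neither


# Toy data (hypothetical): a read made of 12 genome bases followed by 12 vector bases.
genome = "ATGCGTACGTTAGCCGATAGGCTTAACG"
vector = "TTTTCCCCGGGGAAAATTTTCCCC"
read = genome[5:17] + vector[0:12]

hit = find_junction(read, vector, genome)
if hit is not None:
    orientation, split = hit
    # 0-based genome coordinate immediately after the last genomic base of the read
    breakpoint_pos = genome.find(read[:split]) + split
    print(orientation, split, breakpoint_pos)
```

    Reads that match only the genome or only the vector return None, so only boundary-spanning reads are reported as candidate insertion evidence.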

  1. Complete genome sequence of Pectobacterium carotovorum subsp. carotovorum bacteriophage My1.

    Science.gov (United States)

    Lee, Dong Hwan; Lee, Ju-Hoon; Shin, Hakdong; Ji, Samnyu; Roh, Eunjung; Jung, Kyusuk; Ryu, Sangryeol; Choi, Jaehyuk; Heu, Sunggi

    2012-10-01

    Pectobacterium carotovorum subsp. carotovorum, a member of the Enterobacteriaceae family, is an important plant-pathogenic bacterium causing significant economic losses worldwide. P. carotovorum subsp. carotovorum bacteriophage My1 was isolated from a soil sample. Its genome was completely sequenced and analyzed for the development of an effective biological control agent. Sequence and morphological analyses revealed that phage My1 is a T5-like bacteriophage and belongs to the family Siphoviridae. To date, there is no report of a Pectobacterium-targeting siphovirus genome sequence. Here, we announce the complete genome sequence of phage My1 and report the results of our analysis.

  2. Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome

    Directory of Open Access Journals (Sweden)

    Lubieniecki Krzysztof P

    2008-08-01

    Background: With a whole genome duplication event and a wealth of biological data, salmonids are excellent model organisms for studying evolutionary processes, fates of duplicated genes and genetic and physiological processes associated with complex behavioral phenotypes. It is therefore surprising that no salmonid genome has been sequenced. Atlantic salmon (Salmo salar) is a good representative salmonid for sequencing given its importance in aquaculture and the genomic resources available. However, the size and complexity of the genome combined with the lack of a sequenced reference genome from a closely related fish makes assembly challenging. Given the cost and time limitations of Sanger sequencing as well as recent improvements to next generation sequencing technologies, we examined the feasibility of using the Genome Sequencer (GS) FLX pyrosequencing system to obtain the sequence of a salmonid genome. Eight pooled BACs belonging to a minimum tiling path covering ~1 Mb of the Atlantic salmon genome were sequenced by GS FLX shotgun and Long Paired End sequencing and compared with a ninth BAC sequenced by Sanger sequencing of a shotgun library. Results: An initial assembly using only GS FLX shotgun sequences (average read length 248.5 bp) with ~30× coverage allowed gene identification, but was incomplete even when 126 Sanger-generated BAC-end sequences (~0.09× coverage) were incorporated. The addition of paired end sequencing reads (additional ~26× coverage) produced a final assembly comprising 175 contigs assembled into four scaffolds with 171 gaps. Sanger sequencing of the ninth BAC (~10.5× coverage) produced nine contigs and two scaffolds. The number of scaffolds produced by the GS FLX assembly was comparable to Sanger-generated sequencing; however, the number of gaps was much higher in the GS FLX assembly. Conclusion: These results represent the first use of GS FLX paired end reads for de novo sequence assembly. Our data demonstrated
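
    The coverage figures quoted in this abstract follow the usual relation: coverage = (number of reads × mean read length) / target length. A minimal sketch of that arithmetic, using the abstract's ~1 Mb target, 248.5 bp mean read length and ~30× coverage; the function names are illustrative and the resulting read count is an estimate, not a number reported by the authors.

```python
import math


def coverage(n_reads, mean_read_len, target_len):
    """Average sequencing depth over a target of `target_len` bases."""
    return n_reads * mean_read_len / target_len


def reads_needed(target_cov, mean_read_len, target_len):
    """Smallest whole number of reads that reaches `target_cov` on average."""
    return math.ceil(target_cov * target_len / mean_read_len)


# Values taken from the abstract; the read count itself is only an estimate.
n = reads_needed(30, 248.5, 1_000_000)
print(n, round(coverage(n, 248.5, 1_000_000), 2))
```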

  3. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    Science.gov (United States)

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways in which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  4. Effects of informed consent for individual genome sequencing on relevant knowledge.

    Science.gov (United States)

    Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

    2012-11-01

    Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7) and the sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.

  5. Near-complete genome sequencing of swine vesicular disease virus using the Roche GS FLX sequencing platform

    DEFF Research Database (Denmark)

    Nielsen, Sandra Cathrine Abel; Bruhn, Christian Anders Wathne; Samaniego Castruita, Jose Alfredo

    2014-01-01

    of the capacity of a Roche GS FLX sequencing platform. Sequences were initially verified through one of two criteria; either a match between a de novo assembly and a reference mapping, or a match between all of five different reference mappings performed against a fixed set of starting reference genomes...... with significant genetic distances within the same species of viruses. All reference mappings used an iterative method to avoid bias. Further verification was achieved through phylogenetic analysis against published SVDV genomes and additional Enterovirus B sequences. This approach allows high confidence...

  6. The complete mitochondrial genome sequence of Tylototriton taliangensis (Amphibia: Caudata).

    Science.gov (United States)

    Jiang, Ye; Li, Ziyuan; Liu, Jiabin; Li, Yan; Ni, Qingyong; Yao, Yongfang; Xu, Huailiang; Li, Ying; Zhang, Mingwang

    2016-07-01

    Tylototriton taliangensis is listed as a Near Threatened amphibian on the IUCN Red List. In this study, we sequenced the complete mitochondrial (mt) genome of this species (GenBank: KP979646) and found that it contains 16,265 base pairs, which encode 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNA), 2 ribosomal RNA genes (rRNA) and 1 control region (CR). We also found that almost all PCGs and tRNA genes are located on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which are distributed on the L-strand. The PCGs used "ATG" and "GTG" as start codons and four types of stop codons. Almost all tRNA genes folded into typical cloverleaf secondary structures. The L-strand replication origin (OL) and a non-coding region were also found. The new mitogenomic phylogenetic tree confirms the reciprocal monophyly of the genera Tylototriton, Echinotriton and Pleurodeles with high bootstrap values. The present study will provide information for future studies on the conservation genetics and phylogeny of this species and its relatives.

  7. Comparative analyses of multi-species sequences from targeted genomic regions.

    Science.gov (United States)

    Thomas, J W; Touchman, J W; Blakesley, R W; Bouffard, G G; Beckstrom-Sternberg, S M; Margulies, E H; Blanchette, M; Siepel, A C; Thomas, P J; McDowell, J C; Maskeri, B; Hansen, N F; Schwartz, M S; Weber, R J; Kent, W J; Karolchik, D; Bruen, T C; Bevan, R; Cutler, D J; Schwartz, S; Elnitski, L; Idol, J R; Prasad, A B; Lee-Lin, S-Q; Maduro, V V B; Summers, T J; Portnoy, M E; Dietrich, N L; Akhter, N; Ayele, K; Benjamin, B; Cariaga, K; Brinkley, C P; Brooks, S Y; Granite, S; Guan, X; Gupta, J; Haghighi, P; Ho, S-L; Huang, M C; Karlins, E; Laric, P L; Legaspi, R; Lim, M J; Maduro, Q L; Masiello, C A; Mastrian, S D; McCloskey, J C; Pearson, R; Stantripop, S; Tiongson, E E; Tran, J T; Tsurgeon, C; Vogt, J L; Walker, M A; Wetherby, K D; Wiggins, L S; Young, A C; Zhang, L-H; Osoegawa, K; Zhu, B; Zhao, B; Shu, C L; De Jong, P J; Lawrence, C E; Smit, A F; Chakravarti, A; Haussler, D; Green, P; Miller, W; Green, E D

    2003-08-14

    The systematic comparison of genomic sequences from different organisms represents a central focus of contemporary genome analysis. Comparative analyses of vertebrate sequences can identify coding and conserved non-coding regions, including regulatory elements, and provide insight into the forces that have rendered modern-day genomes. As a complement to whole-genome sequencing efforts, we are sequencing and comparing targeted genomic regions in multiple, evolutionarily diverse vertebrates. Here we report the generation and analysis of over 12 megabases (Mb) of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, including the gene mutated in cystic fibrosis. These sequences show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region. In particular, we identify substantial numbers of conserved non-coding segments beyond those previously identified experimentally, most of which are not detectable by pair-wise sequence comparisons alone. Analysis of transposable element insertions highlights the variation in genome dynamics among these species and confirms the placement of rodents as a sister group to the primates.

  8. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    DEFF Research Database (Denmark)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.

    2005-01-01

    We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each... years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...

  9. Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation.

    Science.gov (United States)

    Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena; Arrizon, Javier; Sanchez-Flores, Alejandro

    2015-07-23

    Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. Copyright © 2015 Gomez-Angulo et al.

  10. Fallacy of the Unique Genome: Sequence Diversity within Single Helicobacter pylori Strains

    OpenAIRE

    Draper, JL; Hansen, LM; Bernick, DL; Abedrabbo, S; Underwood, JG; Kong, N; Huang, BC; Weis, AM; Weimer, BC; van Vliet, AHM; Pourmand, N; Solnick, JV; Karplus, K; Ottemannc, KM

    2017-01-01

    © 2017 Doberenz et al. Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within ...

  11. Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing.

    Science.gov (United States)

    Tanas, Alexander S; Borisova, Marina E; Kuznetsova, Ekaterina B; Rudenko, Viktoria V; Karandasheva, Kristina O; Nemtsova, Marina V; Izhevskaya, Vera L; Simonova, Olga A; Larin, Sergey S; Zaletaev, Dmitry V; Strelnikov, Vladimir V

    2017-06-01

    To develop a reduced representation bisulfite sequencing (RRBS) approach for rapid and affordable genome-wide DNA methylation analysis. We selected the restriction endonuclease XmaI to produce RRBS library fragments. After digestion and partial fill-in, DNA fragments were ligated to barcoded adapters, bisulfite converted, size-selected, and sequenced on the Ion Torrent Personal Genome Machine. XmaI-RRBS results were compared with previously published RRBS data. We have developed an XmaI-RRBS method for rapid and affordable genome-wide DNA methylation analysis, with library preparation taking only 4 days and sequencing possible within 4 h. We have also addressed several challenges in order to further improve the RRBS technology. XmaI-RRBS may be performed on degraded DNA samples and is compatible with bench-top next-generation sequencing machines.
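
    The digestion and size-selection steps named in this abstract can be mimicked in silico. The sketch below assumes the standard XmaI recognition site CCCGGG with cleavage after the first C (C^CCGGG) on the top strand; the size window passed to `size_select` is a hypothetical placeholder, since the abstract gives no fragment-size range, and the function names are illustrative.

```python
import re

XMAI_SITE = "CCCGGG"  # XmaI recognition sequence
CUT_OFFSET = 1        # top-strand cut after the first base: C^CCGGG


def digest(seq, site=XMAI_SITE, offset=CUT_OFFSET):
    """Cut `seq` at every occurrence of `site`; return fragments in order."""
    cuts = [m.start() + offset for m in re.finditer(site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, lo, hi):
    """Keep fragments whose length lies in [lo, hi] (window is an assumption)."""
    return [f for f in fragments if lo <= len(f) <= hi]


# Toy sequence with two XmaI sites (hypothetical data).
toy = "AAAACCCGGGTTTTTTCCCGGGAA"
frags = digest(toy)
print(frags)
print(size_select(frags, 10, 20))
```

    Because the fragments are cut, not trimmed, joining them in order reproduces the input sequence, which is a quick sanity check on the cut positions.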

  12. The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Brettin, Thomas S [ORNL; Quest, Daniel J [ORNL; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Cottingham, Robert W [ORNL; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2012-01-01

    Background: The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. Methodology/Principal Findings: In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error considered, prior to analysis. Conclusion: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  13. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    Directory of Open Access Journals (Sweden)

    Konstantinos Mavromatis

    BACKGROUND: The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. METHODOLOGY/PRINCIPAL FINDINGS: In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error considered, prior to analysis. CONCLUSION: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  14. Nanopore DNA Sequencing and Genome Assembly on the International Space Station.

    Science.gov (United States)

    Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S

    2017-12-21

    We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
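
    The "consensus pairwise identity" values reported in this abstract compare an assembled consensus against a reference. The abstract does not define the metric, so the sketch below assumes one common convention: the percentage of aligned columns in which both sequences carry the same base, with gap columns counted as mismatches. It expects two already-aligned strings of equal length; the function name is illustrative.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where both rows are gaps are ignored;
    a gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not cols:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


# Toy alignment (hypothetical): one gap and one substitution in ten columns.
print(pairwise_identity("ACGT-ACGTA", "ACGTTACGAA"))
```

    Other conventions (for example, excluding all gap columns from the denominator) give different numbers, so the definition should be stated alongside any reported value.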

  15. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  16. Sequence composition similarities with the 7SL RNA are highly predictive of functional genomic features

    OpenAIRE

    Paquet, Yanick; Anderson, Alan

    2010-01-01

    Transposable elements derived from the 7SL RNA gene, such as Alu elements in primates, have had remarkable success in several mammalian lineages. The results presented here show a broad spectrum of functions for genomic segments that display sequence composition similarities with the 7SL RNA gene. Using thoroughly documented loci, we report that DNaseI-hypersensitive sites can be singled out in large genomic sequences by an assessment of sequence composition similarities with the 7SL RNA gene...

  17. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

    OpenAIRE

    Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon

    2007-01-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequenc...

  18. Genome-wide microsatellite identification in the fungus Anisogramma anomala using Illumina sequencing and genome assembly.

    Directory of Open Access Journals (Sweden)

    Guohong Cai

    High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats). Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the "Seq-to-SSR" approach). However, constraints of this approach include: 1) many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2) difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel "Seq-Assembly-SSR" approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps), with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%). After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6%) were successfully amplified and 214 (90.7%) produced single PCR products. Twenty-three (9.7%) were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins
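
    The microsatellite discovery step described in this abstract can be illustrated with a small regular-expression scan. This is not the authors' pipeline: the motif length range (2 to 6 bp) and the minimum of four tandem repeats are assumed thresholds chosen for illustration, and `find_ssrs` is a hypothetical helper name.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumptions: motifs of 2-6 bp, at least `min_repeats` tandem copies,
    homopolymer runs skipped. The lazy quantifier makes the regex try the
    shortest motif first, so "ACAC" repeats are reported with motif "AC".
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:  # e.g. "AA": a mononucleotide run
            continue
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# Toy contig (hypothetical) carrying an (AC)6 and an (AGC)4 repeat.
contig = "GGTCA" + "AC" * 6 + "TTGCA" + "AGC" * 4 + "GT"
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

    Working on assembled contigs instead of raw reads, as the abstract proposes, means each hit comes with whatever flanking sequence the assembly provides, which is what primer design needs.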

  19. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    Science.gov (United States)

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP
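
    The BayesC prior described in this abstract is a two-component mixture: a marker has a normally distributed effect with some prior probability and no effect otherwise. The sketch below shows how a posterior probability of a non-zero effect follows from Bayes' theorem under simplifying assumptions that are not taken from the paper: the marker estimate `b_hat` is treated as an unbiased estimate with error variance `pev`, and `pi` and `sigma2` are illustrative names for the prior inclusion probability and the effect variance. It is a textbook spike-and-slab calculation, not the authors' SVD-based derivation.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_nonzero(b_hat, pev, pi, sigma2):
    """Posterior probability that a marker effect is non-zero.

    Assumed model (for illustration only):
      non-zero effect: b_hat ~ N(0, sigma2 + pev)
      zero effect:     b_hat ~ N(0, pev)
    """
    like_nonzero = pi * normal_pdf(b_hat, sigma2 + pev)
    like_zero = (1.0 - pi) * normal_pdf(b_hat, pev)
    return like_nonzero / (like_nonzero + like_zero)


# An estimate near zero lowers the probability below the prior;
# a large estimate raises it above the prior.
print(posterior_nonzero(0.0, 1.0, 0.5, 3.0))
print(posterior_nonzero(5.0, 1.0, 0.5, 3.0))
```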

  20. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta

    DEFF Research Database (Denmark)

    Kanost, Michael R; Arrese, Estela L; Cao, Xiaolong

    2016-01-01

    Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its......, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects.......Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its...... extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, in