WorldWideScience

Sample records for genomic sequence characterization

  1. Characterizing immunoglobulin repertoire from whole blood by a personal genome sequencer.

    Directory of Open Access Journals (Sweden)

    Fan Gao

    Full Text Available In human immune system, V(DJ recombination produces an enormously large repertoire of immunoglobulins (Ig so that they can tackle different antigens from bacteria, viruses and tumor cells. Several studies have demonstrated the utility of next-generation sequencers such as Roche 454 and Illumina Genome Analyzer to characterize the repertoire of immunoglobulins. However, these techniques typically require separation of B cell population from whole blood and require a few weeks for running the sequencers, so it may not be practical to implement them in clinical settings. Recently, the Ion Torrent personal genome sequencer has emerged as a tabletop personal genome sequencer that can be operated in a time-efficient and cost-effective manner. In this study, we explored the technical feasibility to use multiplex PCR for amplifying V(DJ recombination for IgH, directly from whole blood, then sequence the amplicons by the Ion Torrent sequencer. The whole process including data generation and analysis can be completed in one day. We tested the method in a pilot study on patients with benign, atypical and malignant meningiomas. Despite the noisy data, we were able to compare the samples by their usage frequencies of the V segment, as well as their somatic hypermutation rates. In summary, our study suggested that it is technically feasible to perform clinical monitoring of V(DJ recombination within a day by personal genome sequencers.

  2. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Full Text Available Abstract Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24. The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity elsewhere in the genome, but only 23% have identical copies (99% identity. The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  3. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite it is economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with a N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.

  4. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite it is economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with a N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  5. Comparative genome analysis and characterization of the Salmonella Typhimurium strain CCRJ_26 isolated from swine carcasses using whole-genome sequencing approach.

    Science.gov (United States)

    Panzenhagen, P H N; Cabral, C C; Suffys, P N; Franco, R M; Rodrigues, D P; Conte-Junior, C A

    2018-04-01

    Salmonella pathogenicity relies on virulence factors many of which are clustered within the Salmonella pathogenicity islands. Salmonella also harbours mobile genetic elements such as virulence plasmids, prophage-like elements and antimicrobial resistance genes which can contribute to increase its pathogenicity. Here, we have genetically characterized a selected S. Typhimurium strain (CCRJ_26) from our previous study with Multiple Drugs Resistant profile and high-frequency PFGE clonal profile which apparently persists in the pork production centre of Rio de Janeiro State, Brazil. By whole-genome sequencing, we described the strain's genome virulent content and characterized the repertoire of bacterial plasmids, antibiotic resistance genes and prophage-like elements. Here, we have shown evidence that strain CCRJ_26 genome possible represent a virulence-associated phenotype which may be potentially virulent in human infection. Whole-genome sequencing technologies are still costly and remain underexplored for applied microbiology in Brazil. Hence, this genomic description of S. Typhimurium strain CCRJ_26 will provide help in future molecular epidemiological studies. The analysis described here reveals a quick and useful pipeline for bacterial virulence characterization using whole-genome sequencing approach. © 2018 The Society for Applied Microbiology.

  6. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  7. Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

    Science.gov (United States)

    Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

    2011-01-01

    Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

  8. Genome-wide microsatellite characterization and marker development in the sequenced Brassica crop species.

    Science.gov (United States)

    Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2014-02-01

    Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species.

  9. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  10. Deep whole-genome sequencing of 90 Han Chinese genomes.

    Science.gov (United States)

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000

  11. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  12. Mapping and characterizing N6-methyladenine in eukaryotic genomes using single molecule real-time sequencing.

    Science.gov (United States)

    Zhu, Shijia; Beaulaurier, John; Deikus, Gintaras; Wu, Tao; Strahl, Maya; Hao, Ziyang; Luo, Guanzheng; Gregory, James A; Chess, Andrew; He, Chuan; Xiao, Andrew; Sebra, Robert; Schadt, Eric E; Fang, Gang

    2018-05-15

    N6-methyladenine (m6dA) has been discovered as a novel form of DNA methylation prevalent in eukaryotes, however, methods for high resolution mapping of m6dA events are still lacking. Single-molecule real-time (SMRT) sequencing has enabled the detection of m6dA events at single-nucleotide resolution in prokaryotic genomes, but its application to detecting m6dA in eukaryotic genomes has not been rigorously examined. Herein, we identified unique characteristics of eukaryotic m6dA methylomes that fundamentally differ from those of prokaryotes. Based on these differences, we describe the first approach for mapping m6dA events using SMRT sequencing specifically designed for the study of eukaryotic genomes, and provide appropriate strategies for designing experiments and carrying out sequencing in future studies. We apply the novel approach to study two eukaryotic genomes. For green algae, we construct the first complete genome-wide map of m6dA at single nucleotide and single molecule resolution. For human lymphoblastoid cells (hLCLs), joint analyses of SMRT sequencing and independent sequencing data suggest that putative m6dA events are enriched in the promoters of young, full length LINE-1 elements (L1s). These analyses demonstrate a general method for rigorous mapping and characterization of m6dA events in eukaryotic genomes. Published by Cold Spring Harbor Laboratory Press.

  13. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Full Text Available Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.

  14. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Science.gov (United States)

    Ping, Zheng; Siegal, Gene P.; Almeida, Jonas S.; Schnitt, Stuart J.; Shen, Dejun

    2014-01-01

    Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer. PMID:24672738

  15. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Directory of Open Access Journals (Sweden)

    Zheng Ping

    2014-01-01

    Full Text Available Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer.

  16. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights

    Directory of Open Access Journals (Sweden)

    Claire eBertelli

    2015-02-01

    Full Text Available With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by embedded bioinformaticians, i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the Sequence a genome class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2,233 putative proteins. Estrella also possesses a 9,136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

  17. Gene Discovery through Genomic Sequencing of Brucella abortus

    OpenAIRE

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposit...

  18. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first

  19. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis.

    Science.gov (United States)

    Zhu, Huayu; Song, Pengyao; Koo, Dal-Hoe; Guo, Luqin; Li, Yanman; Sun, Shouru; Weng, Yiqun; Yang, Luming

    2016-08-05

    Microsatellite markers are one of the most informative and versatile DNA-based markers used in plant genetic research, but their development has traditionally been difficult and costly. The whole genome sequencing with next-generation sequencing (NGS) technologies provides large amounts of sequence data to develop numerous microsatellite markers at whole genome scale. SSR markers have great advantage in cross-species comparisons and allow investigation of karyotype and genome evolution through highly efficient computation approaches such as in silico PCR. Here we described genome wide development and characterization of SSR markers in the watermelon (Citrullus lanatus) genome, which were then use in comparative analysis with two other important crop species in the Cucurbitaceae family: cucumber (Cucumis sativus L.) and melon (Cucumis melo L.). We further applied these markers in evaluating the genetic diversity and population structure in watermelon germplasm collections. A total of 39,523 microsatellite loci were identified from the watermelon draft genome with an overall density of 111 SSRs/Mbp, and 32,869 SSR primers were designed with suitable flanking sequences. The dinucleotide SSRs were the most common type representing 34.09 % of the total SSR loci and the AT-rich motifs were the most abundant in all nucleotide repeat types. In silico PCR analysis identified 832 and 925 SSR markers with each having a single amplicon in the cucumber and melon draft genome, respectively. Comparative analysis with these cross-species SSR markers revealed complicated mosaic patterns of syntenic blocks among the genomes of three species. In addition, genetic diversity analysis of 134 watermelon accessions with 32 highly informative SSR loci placed these lines into two groups with all accessions of C.lanatus var. citorides and three accessions of C. colocynthis clustered in one group and all accessions of C. lanatus var. lanatus and the remaining accessions of C. colocynthis

  20. Validation of rice genome sequence by optical mapping

    Directory of Open Access Journals (Sweden)

    Pape Louise

    2007-08-01

    Full Text Available Abstract Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project and TIGR (The Institute for Genomic Research genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here points the way to new types of comparative genome analysis that would focus on discernment of

  1. 10KP: A phylodiverse genome sequencing plan.

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu

    2018-03-01

    Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.

  2. 10KP: A phylodiverse genome sequencing plan

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun

    2018-01-01

    Abstract Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here. PMID:29618049

  3. Building a model: developing genomic resources for common milkweed (Asclepias syriaca with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Full Text Available Abstract Background Milkweeds (Asclepias L. have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L. could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp and 5S rDNA (120 bp sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp, with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae unigenes (median coverage of 0.29× and 66% of single copy orthologs (COSII in asterids (median coverage of 0.14×. From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites and phylogenetics (low-copy nuclear genes studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species

  4. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  5. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Science.gov (United States)

    Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

    2009-01-01

    Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536

  6. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Directory of Open Access Journals (Sweden)

    Carmen Yea

    2009-06-01

    Full Text Available Although the human parainfluenza virus 4 (HPIV4 has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada. The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97% with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized.

  7. Complete genome sequences of six measles virus strains

    NARCIS (Netherlands)

    Phan, M.V.T. (My V.T.); C.M.E. Schapendonk (Claudia); B.B. Oude Munnink (Bas B.); M.P.G. Koopmans D.V.M. (Marion); R.L. de Swart (Rik); Cotten, M. (Matthew)

    2018-01-01

    textabstractGenetic characterization of wild-type measles virus (MV) strains is a critical component of measles surveillance and molecular epidemiology. We have obtained complete genome sequences of six MV strains belonging to different genotypes, using random-primed next generation sequencing.

  8. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  9. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    Energy Technology Data Exchange (ETDEWEB)

    Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Teshima, Hazuki [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Characterization of genomic sequence showing strong association with polyembryony among diverse Citrus species and cultivars, and its synteny with Vitis and Populus.

    Science.gov (United States)

    Nakano, Michiharu; Shimada, Takehiko; Endo, Tomoko; Fujii, Hiroshi; Nesumi, Hirohisa; Kita, Masayuki; Ebina, Masumi; Shimizu, Tokurou; Omura, Mitsuo

    2012-02-01

    Polyembryony, in which multiple somatic nucellar cell-derived embryos develop in addition to the zygotic embryo in a seed, is common in the genus Citrus. Previous genetic studies indicated polyembryony is mainly determined by a single locus, but the underlying molecular mechanism is still unclear. As a step towards identification and characterization of the gene or genes responsible for nucellar embryogenesis in Citrus, haplotype-specific physical maps around the polyembryony locus were constructed. By sequencing three BAC clones aligned on the polyembryony haplotype, a single contiguous draft sequence consisting of 380 kb containing 70 predicted open reading frames (ORFs) was reconstructed. Single nucleotide polymorphism genotypes detected in the sequenced genomic region showed strong association with embryo type in Citrus, indicating a common polyembryony locus is shared among widely diverse Citrus cultivars and species. The arrangement of the predicted ORFs in the characterized genomic region showed high collinearity to the genomic sequence of chromosome 4 of Vitis vinifera and linkage group VI of Populus trichocarpa, suggesting that the syntenic relationship among these species is conserved even though V. vinifera and P. trichocarpa are non-apomictic species. This is the first study to characterize in detail the genomic structure of an apomixis locus determining adventitious embryony. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  11. Complete genome sequence of Ikoma lyssavirus.

    Science.gov (United States)

    Marston, Denise A; Ellis, Richard J; Horton, Daniel L; Kuzmin, Ivan V; Wise, Emma L; McElhinney, Lorraine M; Banyard, Ashley C; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E; Fooks, Anthony R

    2012-09-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection.

  12. Context based computational analysis and characterization of ARS consensus sequences (ACS of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS requires an essential consensus sequence (ACS for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC denoted as ORC-ACS and non-replicating ACS sequences (nrACS, that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  13. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  14. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  15. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  16. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  17. Genome-Wide Characterization of Simple Sequence Repeat (SSR) Loci in Chinese Jujube and Jujube SSR Primer Transferability

    Science.gov (United States)

    Xiao, Jing; Zhao, Jin; Liu, Mengjun; Liu, Ping; Dai, Li; Zhao, Zhihui

    2015-01-01

    Chinese jujube (Ziziphus jujuba), an economically important species in the Rhamnaceae family, is a popular fruit tree in Asia. Here, we surveyed and characterized simple sequence repeats (SSRs) in the jujube genome. A total of 436,676 SSR loci were identified, with an average distance of 0.93 Kb between the loci. A large proportion of the SSRs included mononucleotide, dinucleotide and trinucleotide repeat motifs, which accounted for 64.87%, 24.40%, and 8.74% of all repeats, respectively. Among the mononucleotide repeats, A/T was the most common, whereas AT/TA was the most common dinucleotide repeat. A total of 30,565 primer pairs were successfully designed and screened using a series of criteria. Moreover, 725 of 1,000 randomly selected primer pairs were effective among 6 cultivars, and 511 of these primer pairs were polymorphic. Sequencing the amplicons of two SSRs across three jujube cultivars revealed variations in the repeats. The transferability of jujube SSR primers proved that 35/64 SSRs could be transferred across family boundary. Using jujube SSR primers, clustering analysis results from 15 species were highly consistent with the Angiosperm Phylogeny Group (APGIII) System. The genome-wide characterization of SSRs in Chinese jujube is very valuable for whole-genome characterization and marker-assisted selection in jujube breeding. In addition, the transferability of jujube SSR primers could provide a solid foundation for their further utilization. PMID:26000739

  18. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes. CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1 the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2 the high

  19. Efficiency to Discovery Transgenic Loci in GM Rice Using Next Generation Sequencing Whole Genome Re-sequencing

    Directory of Open Access Journals (Sweden)

    Doori Park

    2015-09-01

    Full Text Available Molecular characterization technology in genetically modified organisms, in addition to how transgenic biotechnologies are developed now require full transparency to assess the risk to living modified and non-modified organisms. Next generation sequencing (NGS methodology is suggested as an effective means in genome characterization and detection of transgenic insertion locations. In the present study, we applied NGS to insert transgenic loci, specifically the epidermal growth factor (EGF in genetically modified rice cells. A total of 29.3 Gb (~72× coverage was sequenced with a 2 × 150 bp paired end method by Illumina HiSeq2500, which was consecutively mapped to the rice genome and T-vector sequence. The compatible pairs of reads were successfully mapped to 10 loci on the rice chromosome and vector sequences were validated to the insertion location by polymerase chain reaction (PCR amplification. The EGF transgenic site was confirmed only on chromosome 4 by PCR. Results of this study demonstrated the success of NGS data to characterize the rice genome. Bioinformatics analyses must be developed in association with NGS data to identify highly accurate transgenic sites.

  20. Whole-genome sequencing of veterinary pathogens

    DEFF Research Database (Denmark)

    Ronco, Troels

    -electrophoresis and single-locus sequencing has been widely used to characterize such types of veterinary pathogens. However, DNA sequencing techniques have become fast and cost effective in recent years and whole-genome sequencing data provide a much higher discriminative power and reproducibility than any...... genetic background. This indicates that dairy cows can be natural carriers of S. aureus subtypes that in certain cases lead to CM. A group of isolates that mostly belonged to ST151 carried three pathogenicity islands that were primarily found in this group. The prevalence of resistance genes was generally...

  1. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  2. Complete Genome Sequence of Bifidobacterium bifidum S17▿

    Science.gov (United States)

    Zhurina, Daria; Zomer, Aldert; Gleinser, Marita; Brancaccio, Vincenco Francesco; Auchter, Marc; Waidmann, Mark S.; Westermann, Christina; van Sinderen, Douwe; Riedel, Christian U.

    2011-01-01

    Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome sequence will provide new insights into the biology of this potential probiotic organism and allow for the characterization of the molecular mechanisms underlying its beneficial properties. PMID:21037011

  3. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method

    Directory of Open Access Journals (Sweden)

    Bingfu Guo

    2016-07-01

    Full Text Available Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMO. In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS method. About 21 Gb sequence data (~21× coverage for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundary of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion site of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS is a cost-effective and rapid method of identifying sites of T-DNA insertions and flanking sequences in soybean.

  4. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    Science.gov (United States)

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  5. Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  6. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  7. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  8. Implementation of Whole Genome Sequencing (WGS for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC in the United States

    Directory of Open Access Journals (Sweden)

    Rebecca L Lindsey

    2016-05-01

    Full Text Available Shiga toxin-producing Escherichia coli (STEC is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE (www.genomicepidemiology.org and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x

  9. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Science.gov (United States)

    Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.

    2016-01-01

    Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different

  10. Repetitive DNA in the pea (Pisum sativum L. genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Navrátilová Alice

    2007-11-01

    Full Text Available Abstract Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum. Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data

  11. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  12. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

    Science.gov (United States)

    Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

    2015-08-29

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Abstract Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  14. Dispersed repetitive sequences in eukaryotic genomes and their possible biological significance

    International Nuclear Information System (INIS)

    Georgiev, G.P.; Kramerov, D.A.; Ryskov, A.P.; Skryabin, K.G.; Lukanidin, E.M.

    1983-01-01

    In this paper is described the properties of a novel mouse mdg-like element, the A2 sequence, which is the most abundant repetitive sequence. We also characterized an ubiquitous B2 sequence that represents, after B1, the dominant family among the short interspersed repeats of the mouse genome. The existence of some putative transposition intermediates was shown for repeats of both A and B types of the mouse genome. These are closed circular DNA of the A type and small polyadenylated B + RNAs. The fundamental question that arises is whether these sequences are simply selfish DNA capable of transpositions or do they fulfill some useful biological functions within the genome. 66 references, 11 figures, 1 table

  15. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  16. Whole-genome characterization in pedigreed non-human primates using Genotyping-By-Sequencing and imputation.

    OpenAIRE

    Cervera-Juanes, Rita; Vinson, Amanda; Ferguson, Betsy; Carbone, Lucia; Spindel, Eliot; Mccouch, Susan; Spindel, Jennifer; Nevonen, Kimberly; Letaw, John; Raboin, Michael; Bimber, Ben

    2016-01-01

    Background: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still undeveloped. Whole-genome sequence (WGS) data in pedigreed macaque colonies could provide substantial experimental power, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for whole-genome sequenci...

  17. Long-read sequencing and de novo assembly of a Chinese genome

    Science.gov (United States)

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...

  18. Complete Genome Sequence of the Probiotic Lactic Acid Bacterium Lactobacillus Rhamnosus

    Directory of Open Access Journals (Sweden)

    Samat Kozhakhmetov

    2014-01-01

    Full Text Available Introduction: Lactobacilli are a bacteria commonly found in the gastrointestinal tract. Some species of this genus have probiotic properties. The most common of these is Lactobacillus rhamnosus, a microoganism, generally regarded as safe (GRAS. It is also a homofermentative L-(+-lactic acid producer. The genus Lactobacillus is characterized by an extraordinary degree of the phenotypic and genotypic diversity. However, the studies of the genus were conducted mostly with the unequally distributed, non-random choice of species for sequencing; thus, there is only one representative genome from the Lactobacillus rhamnosus clade available to date. The aim of this study was to characterize the genome sequencing of selected strains of Lactobacilli. Methods: 109 samples were isolated from national domestic dairy products in the laboratory of Center for life sciences. After screaning isolates for probiotic properties, a highly active Lactobacillus spp strain was chosen. Genomic DNA was extracted according to the manufacturing protocol (Wizard® Genomic DNA Purification Kit. The Lactobacillus rhamnosus strain was identified as the highly active Lactobacillus strain accoridng to its morphological, cultural, physiological, and biochemical properties, and a genotypic analysis. Results: The genome of Lactobacillus rhamnosus was sequenced using the Roche 454 GS FLX (454 GS FLX platforms. The initial draft assembly was prepared from 14 large contigs (20 all contigs by the Newbler gsAssembler 2.3 (454 Life Sciences, Branford, CT. Conclusion: A full genome-sequencing of selected strains of lactic acid bacteria was made during the study.

  19. Characterization of Human Cytomegalovirus Genome Diversity in Immunocompromised Hosts by Whole-Genome Sequencing Directly From Clinical Specimens.

    Science.gov (United States)

    Hage, Elias; Wilkie, Gavin S; Linnenweber-Held, Silvia; Dhingra, Akshay; Suárez, Nicolás M; Schmidt, Julius J; Kay-Fedorov, Penelope C; Mischak-Weissinger, Eva; Heim, Albert; Schwarz, Anke; Schulz, Thomas F; Davison, Andrew J; Ganzenmueller, Tina

    2017-06-01

    Advances in next-generation sequencing (NGS) technologies allow comprehensive studies of genetic diversity over the entire genome of human cytomegalovirus (HCMV), a significant pathogen for immunocompromised individuals. Next-generation sequencing was performed on target enriched sequence libraries prepared directly from a variety of clinical specimens (blood, urine, breast milk, respiratory samples, biopsies, and vitreous humor) obtained longitudinally or from different anatomical compartments from 20 HCMV-infected patients (renal transplant recipients, stem cell transplant recipients, and congenitally infected children). De novo-assembled HCMV genome sequences were obtained for 57 of 68 sequenced samples. Analysis of longitudinal or compartmental HCMV diversity revealed various patterns: no major differences were detected among longitudinal, intraindividual blood samples from 9 of 15 patients and in most of the patients with compartmental samples, whereas a switch of the major HCMV population was observed in 6 individuals with sequential blood samples and upon compartmental analysis of 1 patient with HCMV retinitis. Variant analysis revealed additional aspects of minor virus population dynamics and antiviral-resistance mutations. In immunosuppressed patients, HCMV can remain relatively stable or undergo drastic genomic changes that are suggestive of the emergence of minor resident strains or de novo infection. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.

  20. A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum.

    Science.gov (United States)

    Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F; Li, Shuaicheng; Hu, Kailin

    2016-01-07

    The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall of simple sequence repeats (SSR) distribution in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker raging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum.

  1. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  2. Diversity in non-repetitive human sequences not found in the reference genome.

    Science.gov (United States)

    Kehr, Birte; Helgadottir, Anna; Melsted, Pall; Jonsson, Hakon; Helgason, Hannes; Jonasdottir, Adalbjörg; Jonasdottir, Aslaug; Sigurdsson, Asgeir; Gylfason, Arnaldur; Halldorsson, Gisli H; Kristmundsdottir, Snaedis; Thorgeirsson, Gudmundur; Olafsson, Isleifur; Holm, Hilma; Thorsteinsdottir, Unnur; Sulem, Patrick; Helgason, Agnar; Gudbjartsson, Daniel F; Halldorsson, Bjarni V; Stefansson, Kari

    2017-04-01

    Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r 2 > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10 -8 , odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.

  3. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  4. An evaluation of Comparative Genome Sequencing (CGS by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  5. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy-very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  6. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Full Text Available Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  7. Complete genome sequence of Actinosynnema mirum type strain (101T)

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam; Chen, Feng; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia; Brettin, Thomas; Detter, John C.; Han, Cliff; Chain, Patrick; Tindall, Brian; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actino-synnemataceae, a rapidly growing family within the actinobacterial suborder Pseudo-nocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  9. Draft genome sequence of the intestinal parasite Blastocystis subtype 4-isolate WR1

    Directory of Open Access Journals (Sweden)

    Ivan Wawrzyniak

    2015-06-01

    Full Text Available The intestinal protistan parasite Blastocystis is characterized by an extensive genetic variability with 17 subtypes (ST1–ST17 described to date. Only the whole genome of a human ST7 isolate was previously sequenced. Here we report the draft genome sequence of Blastocystis ST4-WR1 isolated from a laboratory rodent at Singapore.

  10. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    Science.gov (United States)

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  11. Full-length genome sequences of porcine epidemic diarrhoea virus strain CV777; Use of NGS to analyse genomic and sub-genomic RNAs

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice

    2018-01-01

    Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine...... the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences were distinct from each other and from the reference sequence...... the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies....

  12. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is......, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...

  13. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  14. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  15. Complete chloroplast genome sequence of a major economic species, Ziziphus jujuba (Rhamnaceae).

    Science.gov (United States)

    Ma, Qiuyue; Li, Shuxian; Bi, Changwei; Hao, Zhaodong; Sun, Congrui; Ye, Ning

    2017-02-01

    Ziziphus jujuba is an important woody plant with high economic and medicinal value. Here, we analyzed and characterized the complete chloroplast (cp) genome of Z. jujuba, the first member of the Rhamnaceae family for which the chloroplast genome sequence has been reported. We also built a web browser for navigating the cp genome of Z. jujuba ( http://bio.njfu.edu.cn/gb2/gbrowse/Ziziphus_jujuba_cp/ ). Sequence analysis showed that this cp genome is 161,466 bp long and has a typical quadripartite structure of large (LSC, 89,120 bp) and small (SSC, 19,348 bp) single-copy regions separated by a pair of inverted repeats (IRs, 26,499 bp). The sequence contained 112 unique genes, including 78 protein-coding genes, 30 transfer RNAs, and four ribosomal RNAs. The genome structure, gene order, GC content, and codon usage are similar to other typical angiosperm cp genomes. A total of 38 tandem repeats, two forward repeats, and three palindromic repeats were detected in the Z. jujuba cp genome. Simple sequence repeat (SSR) analysis revealed that most SSRs were AT-rich. The homopolymer regions in the cp genome of Z. jujuba were verified and manually corrected by Sanger sequencing. One-third of mononucleotide repeats were found to be erroneously sequenced by the 454 pyrosequencing, which resulted in sequences of 1-4 bases shorter than that by the Sanger sequencing. Analyzing the cp genome of Z. jujuba revealed that the IR contraction and expansion events resulted in ycf1 and rps19 pseudogenes. A phylogenetic analysis based on 64 protein-coding genes showed that Z. jujuba was closely related to members of the Elaeagnaceae family, which will be helpful for phylogenetic studies of other Rosales species. The complete cp genome sequence of Z. jujuba will facilitate population, phylogenetic, and cp genetic engineering studies of this economic plant.

  16. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

    Science.gov (United States)

    Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with traditional method with respect to key criteria required for regulatory submissions.

  17. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

    Directory of Open Access Journals (Sweden)

    Satish K Guttikonda

    Full Text Available Demand for the commercial use of genetically modified (GM crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS technologies has provided highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with traditional method with respect to key criteria required for regulatory submissions.

  18. Genomic characterization of large heterochromatic gaps in the human genome assembly.

    Directory of Open Access Journals (Sweden)

    Nicolas Altemose

    2014-05-01

    Full Text Available The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3. The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

  19. Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation.

    Science.gov (United States)

    Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena; Arrizon, Javier; Sanchez-Flores, Alejandro

    2015-07-23

    Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. Copyright © 2015 Gomez-Angulo et al.

  20. A simple method for the parallel deep sequencing of full influenza A genomes

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen

    2011-01-01

    Given the major threat of influenza A to human and animal health, and its ability to evolve rapidly through mutation and reassortment, tools that enable its timely characterization are necessary to help monitor its evolution and spread. For this purpose, deep sequencing can be a very valuable tool....... This study reports a comprehensive method that enables deep sequencing of the complete genomes of influenza A subtypes using the Illumina Genome Analyzer IIx (GAIIx). By using this method, the complete genomes of nine viruses were sequenced in parallel, representing the 2009 pandemic H1N1 virus, H5N1 virus...

  1. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian eWu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362Mb, 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of chloroplast genome.

  2. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Full Text Available Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions.A simple method was developed to sequence quickly the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to be sequenced by high-throughput technique.The method was assessed on a panel of EV-Cs belonging to a wide-range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures.By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  3. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    Science.gov (United States)

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.

  4. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, with many contigs and scaffolds obtained. There are many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the end of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes to perform PCR amplification, adjustment of PCR reaction conditions, and combined with clone construction to sequence, all the gaps were finished. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  5. A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

    Science.gov (United States)

    Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With remarkable increase of genomic sequence data of a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," and the specific regions specifically enriched in transcription-factor-binding sequences. Because the classification and visualization power is very high, BLSOM is an efficient powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).

  6. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  7. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.

  8. Genomic insight into the common carp (Cyprinus carpio genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio, a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  9. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  10. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies.

    Directory of Open Access Journals (Sweden)

    Anjana Srivatsan

    2008-08-01

    Full Text Available Whole-genome sequencing is a powerful technique for obtaining the reference sequence information of multiple organisms. Its use can be dramatically expanded to rapidly identify genomic variations, which can be linked with phenotypes to obtain biological insights. We explored these potential applications using the emerging next-generation sequencing platform Solexa Genome Analyzer, and the well-characterized model bacterium Bacillus subtilis. Combining sequencing with experimental verification, we first improved the accuracy of the published sequence of the B. subtilis reference strain 168, then obtained sequences of multiple related laboratory strains and different isolates of each strain. This provides a framework for comparing the divergence between different laboratory strains and between their individual isolates. We also demonstrated the power of Solexa sequencing by using its results to predict a defect in the citrate signal transduction pathway of a common laboratory strain, which we verified experimentally. Finally, we examined the molecular nature of spontaneously generated mutations that suppress the growth defect caused by deletion of the stringent response mediator relA. Using whole-genome sequencing, we rapidly mapped these suppressor mutations to two small homologs of relA. Interestingly, stable suppressor strains had mutations in both genes, with each mutation alone partially relieving the relA growth defect. This supports an intriguing three-locus interaction module that is not easily identifiable through traditional suppressor mapping. We conclude that whole-genome sequencing can drastically accelerate the identification of suppressor mutations and complex genetic interactions, and it can be applied as a standard tool to investigate the genetic traits of model organisms.

  11. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    KAUST Repository

    Chauhan, Sushma

    2018-04-22

    Begomoviruses belong to the family Geminiviridae are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas that was infected with begomovirus and showed acute leaf curling symptoms were identified. DNA-A segment from pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp–2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from J. curcas host. The strains isolated in this study belong to new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is from Jatropha mosaic virus infecting J. gassypifolia. The strains isolated in this study are referred as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by host J. curcas.

  12. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Directory of Open Access Journals (Sweden)

    Rodrigo Pessôa

    Full Text Available BACKGROUND: Here, we report on the partial and full-length genomic (FLG variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs, 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP and 7 adult T-cell leukemia/lymphoma (ATLL patients, using an Illumina paired-end protocol. METHODS: Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. RESULTS: A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14 and FLG (n = 76 data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5% individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA and that 4 individuals (4.5% were infected with the Japanese sub-subtypes (aB. A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. CONCLUSIONS: This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data

  13. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Science.gov (United States)

    Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

    2014-01-01

    Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the

  14. Full genome sequences and molecular characterization of tick-borne encephalitis virus strains isolated from human patients.

    Science.gov (United States)

    Formanová, Petra; Černý, Jiří; Bolfíková, Barbora Černá; Valdés, James J; Kozlova, Irina; Dzhioev, Yuri; Růžek, Daniel

    2015-02-01

    Tick-borne encephalitis virus (TBEV) causes tick-borne encephalitis (TBE), one of the most important human neuroinfections across Eurasia. Up to date, only three full genome sequences of human European TBEV isolates are available, mostly due to difficulties with isolation of the virus from human patients. Here we present full genome characterization of an additional five low-passage TBEV strains isolated from human patients with severe forms of TBE. These strains were isolated in 1953 within Central Bohemia in the former Czechoslovakia, and belong to the historically oldest human TBEV isolates in Europe. We demonstrate here that all analyzed isolates are distantly phylogenetically related, indicating that the emergence of TBE in Central Europe was not caused by one predominant strain, but rather a pool of distantly related TBEV strains. Nucleotide identity between individual sequenced TBEV strains ranged from 97.5% to 99.6% and all strains shared large deletions in the 3' non-coding region, which has been recently suggested to be an important determinant of virulence. The number of unique amino acid substitutions varied from 3 to 9 in individual isolates, but no characteristic amino acid substitution typical exclusively for all human TBEV isolates was identified when compared to the isolates from ticks. We did, however, correlate that the exploration of the TBEV envelope glycoprotein by specific antibodies were in close proximity to these unique amino acid substitutions. Taken together, we report here the largest number of patient-derived European TBEV full genome sequences to date and provide a platform for further studies on evolution of TBEV since the first emergence of human TBE in Europe. Copyright © 2014 Elsevier GmbH. All rights reserved.

  15. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  16. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along...... the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative...... genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling...

  17. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

    Science.gov (United States)

    Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

    2010-09-16

    Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic

  18. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    , since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented

  19. Characterizing Phage Genomes for Therapeutic Applications

    Directory of Open Access Journals (Sweden)

    Casandra W. Philipson

    2018-04-01

    Full Text Available Multi-drug resistance is increasing at alarming rates. The efficacy of phage therapy, treating bacterial infections with bacteriophages alone or in combination with traditional antibiotics, has been demonstrated in emergency cases in the United States and in other countries, however remains to be approved for wide-spread use in the US. One limiting factor is a lack of guidelines for assessing the genomic safety of phage candidates. We present the phage characterization workflow used by our team to generate data for submitting phages to the Federal Drug Administration (FDA for authorized use. Essential analysis checkpoints and warnings are detailed for obtaining high-quality genomes, excluding undesirable candidates, rigorously assessing a phage genome for safety and evaluating sequencing contamination. This workflow has been developed in accordance with community standards for high-throughput sequencing of viral genomes as well as principles for ideal phages used for therapy. The feasibility and utility of the pipeline is demonstrated on two new phage genomes that meet all safety criteria. We propose these guidelines as a minimum standard for phages being submitted to the FDA for review as investigational new drug candidates.

  20. Characterizing Phage Genomes for Therapeutic Applications.

    Science.gov (United States)

    Philipson, Casandra W; Voegtly, Logan J; Lueder, Matthew R; Long, Kyle A; Rice, Gregory K; Frey, Kenneth G; Biswas, Biswajit; Cer, Regina Z; Hamilton, Theron; Bishop-Lilly, Kimberly A

    2018-04-10

    Multi-drug resistance is increasing at alarming rates. The efficacy of phage therapy, treating bacterial infections with bacteriophages alone or in combination with traditional antibiotics, has been demonstrated in emergency cases in the United States and in other countries, however remains to be approved for wide-spread use in the US. One limiting factor is a lack of guidelines for assessing the genomic safety of phage candidates. We present the phage characterization workflow used by our team to generate data for submitting phages to the Federal Drug Administration (FDA) for authorized use. Essential analysis checkpoints and warnings are detailed for obtaining high-quality genomes, excluding undesirable candidates, rigorously assessing a phage genome for safety and evaluating sequencing contamination. This workflow has been developed in accordance with community standards for high-throughput sequencing of viral genomes as well as principles for ideal phages used for therapy. The feasibility and utility of the pipeline is demonstrated on two new phage genomes that meet all safety criteria. We propose these guidelines as a minimum standard for phages being submitted to the FDA for review as investigational new drug candidates.

  1. Complete genome sequence of Bifidobacterium breve CECT 7263, a strain isolated from human milk.

    Science.gov (United States)

    Jiménez, Esther; Villar-Tajadura, M Antonia; Marín, María; Fontecha, Javier; Requena, Teresa; Arroyo, Rebeca; Fernández, Leónides; Rodríguez, Juan M

    2012-07-01

    Bifidobacterium breve is an actinobacterium frequently isolated from colonic microbiota of breastfeeding babies. Here, we report the complete and annotated genome sequence of a B. breve strain isolated from human milk, B. breve CECT 7263. The genome sequence will provide new insights into the biology of this potential probiotic organism and will allow the characterization of genes related to beneficial properties.

  2. Rare and common regulatory variation in population-scale sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Stephen B Montgomery

    2011-07-01

    Full Text Available Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

  3. Genome sequence determination and metagenomic characterization of a Dehalococcoides mixed culture grown on cis-1,2-dichloroethene.

    Science.gov (United States)

    Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi

    2015-07-01

    A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of the lotus field. To obtain detailed information of the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There exist twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there exists significant difference in the distribution RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information of a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  4. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions

  5. Characterization of the first complete genome sequence of an Impatiens necrotic spot orthotospovirus isolate from the United States and worldwide phylogenetic analyses of INSV isolates.

    Science.gov (United States)

    Zhao, Kaixi; Margaria, Paolo; Rosa, Cristina

    2018-05-10

    Impatiens necrotic spot orthotospovirus (INSV) can impact economically important ornamental plants and vegetables worldwide. Characterization studies on INSV are limited. For most INSV isolates, there are no complete genome sequences available. This lack of genomic information has a negative impact on the understanding of the INSV genetic diversity and evolution. Here we report the first complete nucleotide sequence of a US INSV isolate. INSV-UP01 was isolated from an impatiens in Pennsylvania, US. RT-PCR was used to clone its full-length genome and Vector NTI to assemble overlapping sequences. Phylogenetic trees were constructed by using MEGA7 software to show the phylogenetic relationships with other available INSV sequences worldwide. This US isolate has genome and biological features classical of INSV species and clusters in the Western Hemisphere clade, but its origin appears to be recent. Furthermore, INSV-UP01 might have been involved in a recombination event with an Italian isolate belonging to the Asian clade. Our analyses support that INSV isolates infect a broad plant-host range they group by geographic origin and not by host, and are subjected to frequent recombination events. These results justify the need to generate and analyze complete genome sequences of orthotospoviruses in general and INSV in particular.

  6. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

    Science.gov (United States)

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulse-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and

  7. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    Science.gov (United States)

    2012-01-01

    Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  8. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    Science.gov (United States)

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiencies, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgenic insertion locations in the three GM rice gnomes, Illumina sequencing reads are mapped and classified to the rice genome and plasmid sequence. The both mapped reads are classified to characterize the junction site between plant and transgene sequence by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate the use of a combination of NGS technology and bioinformatics approaches that offers cost- and time-effective methods for assessing the safety of transgenic plants.

  9. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    Science.gov (United States)

    Chauhan, Sushma; Rahman, Hifzur; Mastan, Shaik G; Pamidimarri, D V N Sudheer; Reddy, Muppala P

    2018-07-20

    Begomoviruses belong to the family Geminiviridae are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas that was infected with begomovirus and showed acute leaf curling symptoms were identified. DNA-A segment from pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp-2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from J. curcas host. The strains isolated in this study belong to new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is from Jatropha mosaic virus infecting J. gassypifolia. The strains isolated in this study are referred as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by host J. curcas. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...... used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP...... identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J...

  11. Characterization of probiotic Escherichia coli isolates with a novel pan-genome microarray

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Hallin, Peter Fischer; Wassenaar, Trudy

    2007-01-01

    of the same species are rapidly becoming available, allowing for the definition and characterization of a whole species as a population of genomes - the 'pan-genome'. Results: Using 32 Escherichia coli and Shigella genome sequences we estimate the pan- and core genome of the species. We designed a high...

  12. Complete genome sequence of Bifidobacterium breve CECT 7263, a strain isolated from human milk

    OpenAIRE

    Jiménez, Esther; Villar-Tajadura, M. Antonia; Marín, María; Fontecha, F. Javier; Requena, Teresa; Arroyo, Rebeca; Fernández, Leónides; Rodríguez, Juan M.

    2012-01-01

    Bifidobacterium breve is an actinobacterium frequently isolated from colonic microbiota of breastfeeding babies. Here, we report the complete and annotated genome sequence of a B. breve strain isolated from human milk, B. breve CECT 7263. The genome sequence will provide new insights into the biology of this potential probiotic organism and will allow the characterization of genes related to beneficial properties. © 2012, American Society for Microbiology.

  13. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    Directory of Open Access Journals (Sweden)

    Maria Eguiluz

    2017-11-01

    Full Text Available Abstract Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential by its fruit production. Due to the high content of essential oils in their leaves and of anthocyanins in the fruits, there is also an increasing interest by the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC and 18,587 bp (SSC. The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes. Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization.

  14. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    Science.gov (United States)

    Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

    2017-01-01

    Abstract Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential by its fruit production. Due to the high content of essential oils in their leaves and of anthocyanins in the fruits, there is also an increasing interest by the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization. PMID:29111566

  15. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora.

    Science.gov (United States)

    Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

    2017-01-01

    Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential by its fruit production. Due to the high content of essential oils in their leaves and of anthocyanins in the fruits, there is also an increasing interest by the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization.

  16. The first complete chloroplast genome sequence of a lycophyte,Huperzia lucidula (Lycopodiaceae)

    Energy Technology Data Exchange (ETDEWEB)

    Wolf, Paul G.; Karol, Kenneth G.; Mandoli, Dina F.; Kuehl,Jennifer V.; Arumuganathan, K.; Ellis, Mark W.; Mishler, Brent D.; Kelch,Dean G.; Olmstead, Richard G.; Boore, Jeffrey L.

    2005-02-01

    We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8x depth coverage to obtain the complete chloroplast genome sequence. The genome is 154,373bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,671 bp. Gene order is more similar to those of mosses, liverworts, and hornworts than to gene order for other vascular plants. For example, the Huperziachloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophytechloroplast genome data also enable a better reconstruction of the basaltracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as movement of the gene ndhF from the small single copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants by using nucleotide data, amino acid sequences, and by comparing gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves, and contribute to a growing comparative database of genomic and morphological data across the green plants.

  17. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  18. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  19. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Full Text Available Abstract Background Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database http://ipu.ac.in/usbt/EuMicroSatdb.htm is a web based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description A user friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP. The backend database codes for data extraction and assembly have been written using Perl based scripts and C++. Precise need based microsatellites data retrieval is possible using different input parameters like microsatellite type (simple perfect or compound perfect, repeat unit length (mono- to hexa-nucleotide, repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer designing for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.

  20. Approaches for in silico finishing of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Frederico Schmitt Kremer

    Full Text Available Abstract The introduction of next-generation sequencing (NGS had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases tools that are available to facilitate genome finishing.

  1. Approaches for in silico finishing of microbial genome sequences.

    Science.gov (United States)

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  2. A plant pathology perspective of fungal genome sequencing.

    Science.gov (United States)

    Aylward, Janneke; Steenkamp, Emma T; Dreyer, Léanne L; Roets, Francois; Wingfield, Brenda D; Wingfield, Michael J

    2017-06-01

    The majority of plant pathogens are fungi and many of these adversely affect food security. This mini-review aims to provide an analysis of the plant pathogenic fungi for which genome sequences are publically available, to assess their general genome characteristics, and to consider how genomics has impacted plant pathology. A list of sequenced fungal species was assembled, the taxonomy of all species verified, and the potential reason for sequencing each of the species considered. The genomes of 1090 fungal species are currently (October 2016) in the public domain and this number is rapidly rising. Pathogenic species comprised the largest category (35.5 %) and, amongst these, plant pathogens are predominant. Of the 191 plant pathogenic fungal species with available genomes, 61.3 % cause diseases on food crops, more than half of which are staple crops. The genomes of plant pathogens are slightly larger than those of other fungal species sequenced to date and they contain fewer coding sequences in relation to their genome size. Both of these factors can be attributed to the expansion of repeat elements. Sequenced genomes of plant pathogens provide blueprints from which potential virulence factors were identified and from which genes associated with different pathogenic strategies could be predicted. Genome sequences have also made it possible to evaluate adaptability of pathogen genomes and genomic regions that experience selection pressures. Some genomic patterns, however, remain poorly understood and plant pathogen genomes alone are not sufficient to unravel complex pathogen-host interactions. Genomes, therefore, cannot replace experimental studies that can be complex and tedious. Ultimately, the most promising application lies in using fungal plant pathogen genomics to inform disease management and risk assessment strategies. This will ultimately minimize the risks of future disease outbreaks and assist in preparation for emerging pathogen outbreaks.

  3. Arthropod phylogenetics in light of three novel millipede (myriapoda: diplopoda mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    Directory of Open Access Journals (Sweden)

    Michael S Brewer

    Full Text Available BACKGROUND: Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. RESULTS: The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly. As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. CONCLUSIONS: The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic

  4. Arthropod phylogenetics in light of three novel millipede (myriapoda: diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    Science.gov (United States)

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect

  5. Southern-by-Sequencing: A Robust Screening Approach for Molecular Characterization of Genetically Modified Crops

    Directory of Open Access Journals (Sweden)

    Gina M. Zastrow-Hayes

    2015-03-01

    Full Text Available Molecular characterization of events is an integral part of the advancement process during genetically modified (GM crop product development. Assessment of these events is traditionally accomplished by polymerase chain reaction (PCR and Southern blot analyses. Southern blot analysis can be time-consuming and comparatively expensive and does not provide sequence-level detail. We have developed a sequence-based application, Southern-by-Sequencing (SbS, utilizing sequence capture coupled with next-generation sequencing (NGS technology to replace Southern blot analysis for event selection in a high-throughput molecular characterization environment. SbS is accomplished by hybridizing indexed and pooled whole-genome DNA libraries from GM plants to biotinylated probes designed to target the sequence of transformation plasmids used to generate events within the pool. This sequence capture process enriches the sequence data obtained for targeted regions of interest (transformation plasmid DNA. Taking advantage of the DNA adjacent to the targeted bases (referred to as next-to-target sequence that accompanies the targeted transformation plasmid sequence, the data analysis detects plasmid-to-genome and plasmid-to-plasmid junctions introduced during insertion into the plant genome. Analysis of these junction sequences provides sequence-level information as to the following: the number of insertion loci including detection of unlinked, independently segregating, small DNA fragments; copy number; rearrangements, truncations, or deletions of the intended insertion DNA; and the presence of transformation plasmid backbone sequences. This molecular evidence from SbS analysis is used to characterize and select GM plants meeting optimal molecular characterization criteria. SbS technology has proven to be a robust event screening tool for use in a high-throughput molecular characterization environment.

  6. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    Full Text Available The genome of tomato ( L. is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States as part of the larger “International Solanaceae Genome Project (SOL: Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN. Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato ( L. sequence data, and the tools available for the researchers to exploit this new resource are also presented. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  7. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619

  8. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  9. The Genome Sequence of the psychrophilic archaeon, Methanococcoides burtonii: the Role of Genome Evolution in Cold-adaptation

    Energy Technology Data Exchange (ETDEWEB)

    Allen, Michelle A.; Lauro, Federico M.; Williams, Timothy J.; Burg, Dominic; Siddiqui, Khawar S.; De Francisci, David; Chong, Kevin W.Y.; Pilak, Oliver; Chew, Hwee H.; De Maere, Matthew Z.; Ting, Lily; Katrib, Marilyn; Ng, Charmaine; Sowers, Kevin R.; Galperin, Michael Y.; Anderson, Iain J.; Ivanova, Natalia; Dalin, Eileen; Martinez, Michelle; Lapidus, Alla; Hauser, Loren; Land, Miriam; Thomas, Torsten; Cavicchioli, Ricardo

    2009-04-01

    Psychrophilic archaea are abundant and perform critical roles throughout the Earth's expansive cold biosphere. Here we report the first complete genome sequence for a psychrophilic methanogenic archaeon, Methanococcoides burtonii. The genome sequence was manually annotated including the use of a five tiered Evidence Rating system that ranked annotations from Evidence Rating (ER) 1 (gene product experimentally characterized from the parent organism) to ER5 (hypothetical gene product) to provide a rapid means of assessing the certainty of gene function predictions. The genome is characterized by a higher level of aberrant sequence composition (51%) than any other archaeon. In comparison to hyper/thermophilic archaea which are subject to selection of synonymous codon usage, M. burtonii has evolved cold adaptation through a genomic capacity to accommodate highly skewed amino acid content, while retaining codon usage in common with its mesophilic Methanosarcina cousins. Polysaccharide biosynthesis genes comprise at least 3.3% of protein coding genes in the genome, and Cell wall/membrane/envelope biogenesis COG genes are over-represented. Likewise, signal transduction (COG category T) genes are over-represented and M. burtonii has a high 'IQ' (a measure of adaptive potential) compared to many methanogens. Numerous genes in these two over-represented COG categories appear to have been acquired from {var_epsilon}- and {delta}-proteobacteria, as do specific genes involved in central metabolism such as a novel B form of aconitase. Transposases also distinguish M. burtonii from other archaea, and their genomic characteristics indicate they play an important role in evolving the M. burtonii genome. Our study reveals a capacity for this model psychrophile to evolve through genome plasticity (including nucleotide skew, horizontal gene transfer and transposase activity) that enables adaptation to the cold, and to the biological and physical changes that have

  10. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  11. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  12. A computational genomics pipeline for prokaryotic sequencing projects.

    Science.gov (United States)

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  13. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  14. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade.

    Science.gov (United States)

    Tachibana, Shin-Ichiro; Sullivan, Steven A; Kawai, Satoru; Nakamura, Shota; Kim, Hyunjae R; Goto, Naohisa; Arisue, Nobuko; Palacpac, Nirianne M Q; Honma, Hajime; Yagi, Masanori; Tougan, Takahiro; Katakai, Yuko; Kaneko, Osamu; Mita, Toshihiro; Kita, Kiyoshi; Yasutomi, Yasuhiro; Sutton, Patrick L; Shakhbatyan, Rimma; Horii, Toshihiro; Yasunaga, Teruo; Barnwell, John W; Escalante, Ananias A; Carlton, Jane M; Tanabe, Kazuyuki

    2012-09-01

    P. cynomolgi, a malaria-causing parasite of Asian Old World monkeys, is the sister taxon of P. vivax, the most prevalent malaria-causing species in humans outside of Africa. Because P. cynomolgi shares many phenotypic, biological and genetic characteristics with P. vivax, we generated draft genome sequences for three P. cynomolgi strains and performed genomic analysis comparing them with the P. vivax genome, as well as with the genome of a third previously sequenced simian parasite, Plasmodium knowlesi. Here, we show that genomes of the monkey malaria clade can be characterized by copy-number variants (CNVs) in multigene families involved in evasion of the human immune system and invasion of host erythrocytes. We identify genome-wide SNPs, microsatellites and CNVs in the P. cynomolgi genome, providing a map of genetic variation that can be used to map parasite traits and study parasite populations. The sequencing of the P. cynomolgi genome is a critical step in developing a model system for P. vivax research and in counteracting the neglect of P. vivax.

  15. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil.

    Directory of Open Access Journals (Sweden)

    Claire L Donald

    2016-10-01

    Full Text Available The outbreak of Zika virus (ZIKV in the Americas has transformed a previously obscure mosquito-transmitted arbovirus of the Flaviviridae family into a major public health concern. Little is currently known about the evolution and biology of ZIKV and the factors that contribute to the associated pathogenesis. Determining genomic sequences of clinical viral isolates and characterization of elements within these are an important prerequisite to advance our understanding of viral replicative processes and virus-host interactions.We obtained a ZIKV isolate from a patient who presented with classical ZIKV-associated symptoms, and used high throughput sequencing and other molecular biology approaches to determine its full genome sequence, including non-coding regions. Genome regions were characterized and compared to the sequences of other isolates where available. Furthermore, we identified a subgenomic flavivirus RNA (sfRNA in ZIKV-infected cells that has antagonist activity against RIG-I induced type I interferon induction, with a lesser effect on MDA-5 mediated action.The full-length genome sequence including non-coding regions of a South American ZIKV isolate from a patient with classical symptoms will support efforts to develop genetic tools for this virus. Detection of sfRNA that counteracts interferon responses is likely to be important for further understanding of pathogenesis and virus-host interactions.

  16. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  17. Sequencing and comparing whole mitochondrial genomes ofanimals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  18. Applications of Genomic Sequencing in Pediatric CNS Tumors.

    Science.gov (United States)

    Bavle, Abhishek A; Lin, Frank Y; Parsons, D Williams

    2016-05-01

    Recent advances in genome-scale sequencing methods have resulted in a significant increase in our understanding of the biology of human cancers. When applied to pediatric central nervous system (CNS) tumors, these remarkable technological breakthroughs have facilitated the molecular characterization of multiple tumor types, provided new insights into the genetic basis of these cancers, and prompted innovative strategies that are changing the management paradigm in pediatric neuro-oncology. Genomic tests have begun to affect medical decision making in a number of ways, from delineating histopathologically similar tumor types into distinct molecular subgroups that correlate with clinical characteristics, to guiding the addition of novel therapeutic agents for patients with high-risk or poor-prognosis tumors, or alternatively, reducing treatment intensity for those with a favorable prognosis. Genomic sequencing has also had a significant impact on translational research strategies in pediatric CNS tumors, resulting in wide-ranging applications that have the potential to direct the rational preclinical screening of novel therapeutic agents, shed light on tumor heterogeneity and evolution, and highlight differences (or similarities) between pediatric and adult CNS tumors. Finally, in addition to allowing the identification of somatic (tumor-specific) mutations, the analysis of patient-matched constitutional (germline) DNA has facilitated the detection of pathogenic germline alterations in cancer genes in patients with CNS tumors, with critical implications for genetic counseling and tumor surveillance strategies for children with familial predisposition syndromes. As our understanding of the molecular landscape of pediatric CNS tumors continues to advance, innovative applications of genomic sequencing hold significant promise for further improving the care of children with these cancers.

  19. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454-reads of an Escheria coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  20. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    Science.gov (United States)

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

  1. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  2. A map of human genome variation from population-scale sequencing.

    Science.gov (United States)

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  3. Genomic characterization of Burkholderia pseudomallei isolates selected for medical countermeasures testing: comparative genomics associated with differential virulence.

    Directory of Open Access Journals (Sweden)

    Jason W Sahl

    Full Text Available Burkholderia pseudomallei is the causative agent of melioidosis and a potential bioterrorism agent. In the development of medical countermeasures against B. pseudomallei infection, the US Food and Drug Administration (FDA animal Rule recommends using well-characterized strains in animal challenge studies. In this study, whole genome sequence data were generated for 6 B. pseudomallei isolates previously identified as candidates for animal challenge studies; an additional 5 isolates were sequenced that were associated with human inhalational melioidosis. A core genome single nucleotide polymorphism (SNP phylogeny inferred from a concatenated SNP alignment from the 11 isolates sequenced in this study and a diverse global collection of isolates demonstrated the diversity of the proposed Animal Rule isolates. To understand the genomic composition of each isolate, a large-scale blast score ratio (LS-BSR analysis was performed on the entire pan-genome; this demonstrated the variable composition of genes across the panel and also helped to identify genes unique to individual isolates. In addition, a set of ~550 genes associated with pathogenesis in B. pseudomallei were screened against the 11 sequenced genomes with LS-BSR. Differential gene distribution for 54 virulence-associated genes was observed between genomes and three of these genes were correlated with differential virulence observed in animal challenge studies using BALB/c mice. Differentially conserved genes and SNPs associated with disease severity were identified and could be the basis for future studies investigating the pathogenesis of B. pseudomallei. Overall, the genetic characterization of the 11 proposed Animal Rule isolates provides context for future studies involving B. pseudomallei pathogenesis, differential virulence, and efficacy to therapeutics.

  4. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    Science.gov (United States)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.

  5. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...... heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting...

  6. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

    Science.gov (United States)

    Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

    2015-01-01

    We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.

  7. Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

    Science.gov (United States)

    Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

    2014-05-01

    We characterized mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in mango cp genome, 91 found to be protein coding. Sequence analysis revealed cp genome of C. sinensis as closest neighbor of mango. We found 51 short repeats in mango cp genome supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.

  8. Complete genome sequence of Dyadobacter fermentans type strain (NS114T)

    Energy Technology Data Exchange (ETDEWEB)

    Lang, Elke; Lapidus, Alla; Chertkov, Olga; Brettin, Thomas; Detter, John C.; Han, Cliff; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Goker, Markus; Rohde, Manfred; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-05-20

    Dyadobacter fermentans (Chelius MK and Triplett EW, 2000) is the type species of the genus Dyadobacter. It is of phylogenetic interest because of its location in the Cytophagaceae, a very diverse family within the order 'Sphingobacteriales'. D. fermentans has a mainly respiratory metabolism, stains Gram-negative, is non-motile and oxidase and catalase positive. It is characterized by the production of cell filaments in ageing cultures, a flexirubin-like pigment and its ability to ferment glucose, which is almost unique in the aerobically living members of this taxonomically difficult family. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the 'sphingobacterial' genus Dyadobacter, and this 6,967,790 bp long single replicon genome with its 5804 protein-coding and 50 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  10. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  11. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    The development in sequencing methods has made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... (Perdix perdix) on a Genome Analyzer GAII (Illumina) using paired-end sequencing. The amount of generated sequences amounts to 8 to 9 Gb for each species. The analysis and assembly of the generated sequences is ongoing. Access to the whole genome sequence from these two species will enable enhanced...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a beginning of generating the whole genome sequence for these species. The continuation of establishing the genome...

  12. Genomic Characterization for Parasitic Weeds of the Genus Striga by Sample Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Matt C. Estep

    2012-03-01

    Full Text Available Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Lour. DNA samples (five species allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some less abundant than the other species examined, indicating that no single element or any unilateral growth or decrease trend in genome behavior was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [ (L. Kuntze], 1330 Mb [ (Willd. Vatke], 1425 Mb [ (Delile Benth.] and 2460 Mb ( Benth. suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants ( was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.

  13. Mitochondrial genome sequences and comparative genomics ofPhytophthora ramorum and P. sojae

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.; Boore,Jeffrey L.

    2007-01-01

    The complete sequences of the mitochondrial genomes of theoomycetes of Phytophthora ramorum and P. sojae were determined during thecourse of their complete nuclear genome sequencing (Tyler, et al. 2006).Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bpfor P. sojae. Each contains a total of 37 identifiable protein-encodinggenes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively)specifying19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12for P. sojae) which are potentially additional functional genes.Non-coding regions comprise approximately 11.5 percent and 18.4 percentof the genomes of P. ramorum and P. sojae, respectively. Relative to P.sojae, there is an inverted repeat of 1,150 bp in P. ramorum thatincludes an unassigned unique ORF, a tRNA gene, and adjacent non-codingsequences, but otherwise the gene order in both species is identical.Comparisons of these genomes with published sequences of the P. infestansmitochondrial genome reveals a number of similarities, but the gene orderin P. infestans differs in two adjacent locations due to inversions.Sequence alignments of the three genomes indicated sequence conservationranging from 75 to 85 percent and that specific regions were morevariable than others.

  14. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  15. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  16. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Full Text Available Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  17. Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

    Science.gov (United States)

    Georgi, Benjamin; Craig, David; Kember, Rachel L.; Liu, Wencheng; Lindquist, Ingrid; Nasser, Sara; Brown, Christopher; Egeland, Janice A.; Paul, Steven M.; Bućan, Maja

    2014-01-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders. PMID:24625924

  18. Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

    Science.gov (United States)

    Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

    2017-10-01

    Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  19. From Genome Sequence to Taxonomy - A Skeptic’s View

    DEFF Research Database (Denmark)

    Özen, Asli Ismihan; Vesth, Tammi Camilla; Ussery, David

    2012-01-01

    The relative ease of sequencing bacterial genomes has resulted in thousands of sequenced bacterial genomes available in the public databases. This same technology now allows for using the entire genome sequence as an identifier for an organism. There are many methods available which attempt to us...

  20. Get your high-quality low-cost genome sequence

    NARCIS (Netherlands)

    Faino, L.; Thomma, B.P.H.J.

    2014-01-01

    The study of whole-genome sequences has become essential for almost all branches of biological research. Next-generation sequencing (NGS) has revolutionized the scalability, speed, and resolution of sequencing and brought genomic science within reach of academic laboratories that study non-model

  1. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences.

    Science.gov (United States)

    Kanehisa, Minoru; Sato, Yoko; Morishima, Kanae

    2016-02-22

    BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/blastkoala/). In BlastKOALA, the KO assignment is performed by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which utilizes more rapid GHOSTX for database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with Cd-hit clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  2. Investigation of genome sequences within the family Pasteurellaceae

    DEFF Research Database (Denmark)

    Angen, Øystein; Ussery, David

    Introduction The bacterial genome sequences are now available for an increasing number of strains within the family Pasteurellaceae. At present, 24 Pasteurellaceae genomes are publicly available through internet databases, and another 40 genomes are being sequenced. This investigation will describe...... the core genome for both the family Pasteurellaceae and for the species Haemophilus influenzae. Methods Twenty genome sequences from the following species were included: Haemophilus influenzae (11 strains), Haemophilus ducreyi (1 strain), Histophilus somni (2 strains), Haemophilus parasuis (1 strain......), Actinobacillus pleuropneumoniae (2 strains), Actinobacillus succinogenes (1 strain), Mannheimia succiniciproducens (1 strain), and Pasteurella multocida (1 strain). The predicted proteins for each genome were BLASTed against each other, and a set of conserved core gene families was determined as described...

  3. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    Full Text Available The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis. Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis- P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.

  4. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  5. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    Full Text Available The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS technology. The third-generation sequencing (TGS technology, led by Pacific Biosciences (PacBio, is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT. MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production makes it suitable for real-time applications, the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  6. Human genetics and genomics a decade after the release of the draft sequence of the human genome

    Science.gov (United States)

    2011-01-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605

  7. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Full Text Available Sweet potatoes (Ipomea batatas L. are grown extensively, in tropical and temperate regions, and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV, Sweet potato virus C (SPVC, Sweet potato virus G (SPVG, Sweet potato virus 2 (SPV2, and Sweet potato latent virus (SPLV, have been detected in sweet potato fields at a high (~95% incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly-A tails. Their genomic organizations were typical of the Potyvirus genus, including one target open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88% nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates and these regions were identified as hotspots for recombination in the sweet potato potyviruses.

  8. Complete Genome Sequence of an Avian Paramyxovirus Representative of Putative New Serotype 13

    OpenAIRE

    Goraichuk, Iryna; Sharma, Poonam; Stegniy, Borys; Muzyka, Denys; Pantin-Jackwood, Mary J.; Gerilovych, Anton; Solodiankin, Olexii; Bolotin, Vitaliy; Miller, Patti J.; Dimitrov, Kiril M.; Afonso, Claudio L.

    2016-01-01

    Here, we report the complete genome sequence of a virus of a putative new serotype of avian paramyxovirus (APMV). The virus was isolated from a white-fronted goose in Ukraine in 2011 and designated white-fronted goose/Ukraine/Askania-Nova/48-15-02/2011. The genomic characterization of the isolate suggests that it represents the novel avian paramyxovirus group APMV 13.

  9. Complete Genome Sequence of an Avian Paramyxovirus Representative of Putative New Serotype 13

    Science.gov (United States)

    Goraichuk, Iryna; Sharma, Poonam; Stegniy, Borys; Muzyka, Denys; Pantin-Jackwood, Mary J.; Gerilovych, Anton; Solodiankin, Olexii; Bolotin, Vitaliy; Miller, Patti J.; Dimitrov, Kiril M.

    2016-01-01

    Here, we report the complete genome sequence of a virus of a putative new serotype of avian paramyxovirus (APMV). The virus was isolated from a white-fronted goose in Ukraine in 2011 and designated white-fronted goose/Ukraine/Askania-Nova/48-15-02/2011. The genomic characterization of the isolate suggests that it represents the novel avian paramyxovirus group APMV 13. PMID:27469958

  10. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    Full Text Available The goal of the Human Microbiome Project (HMP is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  11. Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.

    Science.gov (United States)

    Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron

    2012-02-01

    Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.

  12. Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts

    LENUS (Irish Health Repository)

    2011-08-30

    Abstract Background The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade. Results The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group. Conclusions The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host

  13. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  14. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Full Text Available Abstract Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  15. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity

  16. Complete mitochondrial genome sequences from five Eimeria species (Apicomplexa; Coccidia; Eimeriidae) infecting domestic turkeys.

    Science.gov (United States)

    Ogedengbe, Mosun E; El-Sherry, Shiem; Whale, Julia; Barta, John R

    2014-07-17

    Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and, Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp). The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common). No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen

  17. Genome sequencing of ovine isolates of Mycobacterium avium subspecies paratuberculosis offers insights into host association

    Directory of Open Access Journals (Sweden)

    Bannantine John P

    2012-03-01

    Full Text Available Abstract Background The genome of Mycobacterium avium subspecies paratuberculosis (MAP is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. Results Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs. Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~ 2.36 Mb. Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565 further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. Conclusions Comparative sequence analysis employed here provided a better understanding of the host association, evolution of members of the M. avium complex and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences.

  18. Reference genome sequence of the model plant Setaria.

    Science.gov (United States)

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chu-Yu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela C; Panaud, Olivier; Kellogg, Elizabeth A; Brutnell, Thomas P; Doust, Andrew N; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-05-13

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ∼400-Mb assembly covers ∼80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  19. Reference genome sequence of the model plant Setaria

    Energy Technology Data Exchange (ETDEWEB)

    Bennetzen, Jeffrey L [ORNL; Schmutz, Jeremy [Hudson Alpha Institute of Biotechnology; Wang, Hao [University of Georgia, Athens, GA; Percifield, Ryan [University of Georgia, Athens, GA; Hawkins, Jennifer [University of Georgia, Athens, GA; Pontaroli, Ana C. [University of Georgia, Athens, GA; Estep, Matt [University of Georgia, Athens, GA; Feng, Liang [University of Georgia, Athens, GA; Vaughn, Justin N [ORNL; Grimwood, Jane [Hudson Alpha Institute of Biotechnology; Jenkins, Jerry [Hudson Alpha Institute of Biotechnology; Barry, Kerrie [U.S. Department of Energy, Joint Genome Institute; Lindquist, Erika [U.S. Department of Energy, Joint Genome Institute; Hellsten, Uffe [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Wang, Xuewen [University of Georgia, Athens, GA; Wu, Xiaomei [University of Georgia, Athens, GA; Mitros, Therese [University of California, Berkeley; Triplett, Jimmy [University of Missouri, St. Louis; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Mauro-Herrera, Margarita [Oklahoma State University; Wang, Lin [Cornell University; Li, Pinghua [Cornell University; Sharma, Manoj [University of California, Davis; Sharma, Rita [University of California, Davis; Ronald, Pamela [University of California, Davis; Panaud, Olivier [Universite de Perpignan, Perpignan, France; Kellogg, Elizabeth A. [University of Missouri, St. Louis; Brutnell, Thomas P. [Cornell University; Doust, Andrew N. [Oklahoma State University; Tuskan, Gerald A [ORNL; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute; Devos, Katrien M [ORNL

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  20. Reference genome sequence of the model plant Setaria

    Energy Technology Data Exchange (ETDEWEB)

    Bennetzen, Jeffrey L [ORNL; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Tuskan, Gerald A [ORNL

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The {approx}400-Mb assembly covers {approx}80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  1. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  2. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734

  3. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making the initiation of large scale projects aiming to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high throughput sequencing . We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.

  4. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small insert genomic DNA library was constructed (digesting olive genomic DNA with SmaI and cloning the digestion products into pUC19 vector) and randomly picked 83 colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  5. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Full Text Available Abstract Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1 the metabolically versatile Haloarcula marismortui; (2 the non-pigmented Natrialba asiatica; (3 the psychrophile Halorubrum lacusprofundi and (4 the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI for their predicted proteins. Multiple insertion sequence (IS elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP and transcription factor IIB (TFB homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1 high GC content and (2 low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  6. Genome Sequence of the Freshwater Yangtze Finless Porpoise.

    Science.gov (United States)

    Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jingsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang; Li, Songhai

    2018-04-16

    The Yangtze finless porpoise ( Neophocaena asiaeorientalis ssp. asiaeorientalis ) is a subspecies of the narrow-ridged finless porpoise ( N. asiaeorientalis ). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603.

  7. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Full Text Available Abstract Background Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genome of parental and evolved strains of microorganisms to be rapidly determined. Results We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.

  8. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  9. The first insight into the salvia (lamiaceae) genome via bac library construction and high-throughput sequencing of target bac clones

    International Nuclear Information System (INIS)

    Hao, D.C.; Vautrin, S.; Berges, H.; Chen, S.L.

    2015-01-01

    Salvia is a representative genus of Lamiaceae, a eudicot family with significant species diversity and population adaptibility. One of the key goals of Salvia genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of medicinal plants to increase their health and productivity. Large-insert genomic libraries are a fundamental tool for achieving this purpose. We report herein the construction, characterization and screening of a gridded BAC library for Salvia officinalis (sage). The S. officinalis BAC library consists of 17,764 clones and the average insert size is 107 Kb, corresponding to 3 haploid genome equivalents. Seventeen positive clones (average insert size 115 Kb) containing five terpene synthase (TPS) genes were screened out by PCR and 12 of them were subject to Illumina HiSeq 2000 sequencing, which yielded 28,097,480 90-bp raw reads (2.53 Gb). Scaffolds containing sabinene synthase (Sab), a Sab homolog, TPS3 (kaurene synthase-like 2), copalyl diphosphate synthase 2 and one cytochrome P450 gene were retrieved via de novo assembly and annotation, which also have flanking noncoding sequences, including predicted promoters and repeat sequences. Among 2,638 repeat sequences, there are 330 amplifiable microsatellites. This BAC library provides a new resource for Lamiaceae genomic studies, including microsatellite marker development, physical mapping, comparative genomics and genome sequencing. Characterization of positive clones provided insights into the structure of the Salvia genome. These sequences will be used in the assembly of a future genome sequence for S. officinalis. (author)

  10. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    Directory of Open Access Journals (Sweden)

    Ritland Carol

    2009-08-01

    Full Text Available Abstract Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs and full-length (FLcDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR and a cytochrome P450 (CYP720B4 from a non-arrayed genomic BAC library of white spruce (Picea glauca. Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR and 94 kbp (CYP720B4 long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs, high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene

  11. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    Science.gov (United States)

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The

  12. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  13. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L. and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs, 1.9 million InDels, and 182,398 putative structural variations (SVs. Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  14. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Science.gov (United States)

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  15. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    Science.gov (United States)

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  16. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  17. Whole-Genome Sequencing of Lactobacillus salivarius Strains BCRC 14759 and BCRC 12574.

    Science.gov (United States)

    Chiu, Shih-Hau; Chen, Chien-Chi; Wang, Li-Ting; Huang, Lina

    2017-11-22

    Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. Copyright © 2017 Chiu et al.

  18. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    Science.gov (United States)

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  19. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Science.gov (United States)

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  20. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Full Text Available Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  1. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Full Text Available Abstract Background Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results We describe the construction and characterization of two deep-coverage BAC libraries EG_Ba and EG_Bb obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1 digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (Eg_Bb to 157 Kb (Eg_Ba, very low extra-nuclear genome contamination providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology providing the whole sequence of the E. grandis chloroplast genome, and complete genomic sequences of important lignin biosynthesis genes. Conclusions The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×, contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae

  2. Genome Sequence, Assembly and Characterization of Two Metschnikowia fructicola Strains Used as Biocontrol Agents of Postharvest Diseases

    Directory of Open Access Journals (Sweden)

    Edoardo Piombo

    2018-04-01

    Full Text Available The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the bases of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using the PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.

  3. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins.  Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...

  4. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Full Text Available Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES and whole genome sequencing (WGS viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling of sequence variants respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated to disease by GWAS. It's thus critical, for clinical applications of WGS and WES, to interpret whether non-variant sites are homozygous for the reference allele or if the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data allows to genotype more than 92% of sites corresponding to known SNPs compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary to an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF file format which allows to store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  5. Complete genome sequence of Bacillus velezensis G341, a strain with a broad inhibitory spectrum against plant pathogens.

    Science.gov (United States)

    Lee, Hyun-Hee; Park, Jungwook; Lim, Jae Yun; Kim, Hun; Choi, Gyung Ja; Kim, Jin-Cheol; Seo, Young-Su

    2015-10-10

    Bacillus velezensis G341 can suppress plant pathogens by producing antagonistic active compounds including bacillomycin D, fengycin, and (oxy) difficidin. The complete genome sequence of this bacterium was characterized by one circular chromosome of 4,009,746bp with 3953 open reading frames. The genome contained 36 pseudogenes, 30 rRNA operons, and 95 tRNAs. This complete genome sequence provides an additional resource for the development of antimicrobial compounds. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  7. Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

    Science.gov (United States)

    Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

    2006-11-01

    The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice have developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.

  8. Whole-Genome-Sequencing characterization of bloodstream infection-causing hypervirulent Klebsiella pneumoniae of capsular serotype K2 and ST374.

    Science.gov (United States)

    Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Ou, Hong-Yu; Qu, Hongping

    2018-01-01

    Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. The Strain RJF293 was determined to be multilocis sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 10 2 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance.

  9. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.

  10. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Full Text Available Abstract Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10(-9 change/site/year was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10(-9 change/site/year was approximately half of the overall rate (1.9–2.0 × 10(-9 change/site/year. Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  11. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive to rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced in an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil, and the French Guyana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Characterizing genomic alterations in cancer by complementary functional associations.

    Science.gov (United States)

    Kim, Jong Wook; Botvinnik, Olga B; Abudayyeh, Omar; Birger, Chet; Rosenbluh, Joseph; Shrestha, Yashaswi; Abazeed, Mohamed E; Hammerman, Peter S; DiCara, Daniel; Konieczkowski, David J; Johannessen, Cory M; Liberzon, Arthur; Alizad-Rahvar, Amir Reza; Alexe, Gabriela; Aguirre, Andrew; Ghandi, Mahmoud; Greulich, Heidi; Vazquez, Francisca; Weir, Barbara A; Van Allen, Eliezer M; Tsherniak, Aviad; Shao, Diane D; Zack, Travis I; Noble, Michael; Getz, Gad; Beroukhim, Rameen; Garraway, Levi A; Ardakani, Masoud; Romualdi, Chiara; Sales, Gabriele; Barbie, David A; Boehm, Jesse S; Hahn, William C; Mesirov, Jill P; Tamayo, Pablo

    2016-05-01

    Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment. We used REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations, demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes.

  13. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  14. The past, present and future of mitochondrial genomics: have we sequenced enough mtDNAs?

    Science.gov (United States)

    Smith, David Roy

    2016-01-01

    The year 2014 saw more than a thousand new mitochondrial genome sequences deposited in GenBank-an almost 15% increase from the previous year. Hundreds of peer-reviewed articles accompanied these genomes, making mitochondrial DNAs (mtDNAs) the most sequenced and reported type of eukaryotic chromosome. These mtDNA data have advanced a wide range of scientific fields, from forensics to anthropology to medicine to molecular evolution. But for many biological lineages, mtDNAs are so well sampled that newly published genomes are arguably no longer contributing significantly to the progression of science, and in some cases they are tying up valuable resources, particularly journal editors and referees. Is it time to acknowledge that as a research community we have published enough mitochondrial genome papers? Here, I address this question, exploring the history, milestones and impacts of mitochondrial genomics, the benefits and drawbacks of continuing to publish mtDNAs at a high rate and what the future may hold for such an important and popular genetic marker. I highlight groups for which mtDNAs are still poorly sampled, thus meriting further investigation, and recommend that more energy be spent characterizing aspects of mitochondrial genomes apart from the DNA sequence, such as their chromosomal and transcriptional architectures. Ultimately, one should be mindful before writing a mitochondrial genome paper. Consider perhaps sending the sequence directly to GenBank instead, and be sure to annotate it correctly before submission. © The Author 2015. Published by Oxford University Press.

  15. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    Science.gov (United States)

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have showed several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is largest as a resulting of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783

  16. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus, watermelon (Citrullus lanatus, and melon (Cucumis melo genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs in bitter gourd, including a comparison with the three aforementioned cucurbit species has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  17. Gene Discovery through Genomic Sequencing of Brucella abortus

    Science.gov (United States)

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  18. Agaricus bisporus genome sequence: a commentary.

    Science.gov (United States)

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus biporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Genomic Characterization of DArT Markers Based on High-Density Linkage Analysis and Physical Mapping to the Eucalyptus Genome

    Science.gov (United States)

    Petroli, César D.; Sansaloni, Carolina P.; Carling, Jason; Steane, Dorothy A.; Vaillancourt, René E.; Myburg, Alexander A.; da Silva, Orzenil Bonfim; Pappas, Georgios Joannis; Kilian, Andrzej; Grattapaglia, Dario

    2012-01-01

    Diversity Arrays Technology (DArT) provides a robust, high throughput, cost-effective method to query thousands of sequence polymorphisms in a single assay. Despite the extensive use of this genotyping platform for numerous plant species, little is known regarding the sequence attributes and genome-wide distribution of DArT markers. We investigated the genomic properties of the 7,680 DArT marker probes of a Eucalyptus array, by sequencing them, constructing a high density linkage map and carrying out detailed physical mapping analyses to the Eucalyptus grandis reference genome. A consensus linkage map with 2,274 DArT markers anchored to 210 microsatellites and a framework map, with improved support for ordering, displayed extensive collinearity with the genome sequence. Only 1.4 Mbp of the 75 Mbp of still unplaced scaffold sequence was captured by 45 linkage mapped but physically unaligned markers to the 11 main Eucalyptus pseudochromosomes, providing compelling evidence for the quality and completeness of the current Eucalyptus genome assembly. A highly significant correspondence was found between the locations of DArT markers and predicted gene models, while most of the 89 DArT probes unaligned to the genome correspond to sequences likely absent in E. grandis, consistent with the pan-genomic feature of this multi-Eucalyptus species DArT array. These comprehensive linkage-to-physical mapping analyses provide novel data regarding the genomic attributes of DArT markers in plant genomes in general and for Eucalyptus in particular. DArT markers preferentially target the gene space and display a largely homogeneous distribution across the genome, thereby providing superb coverage for mapping and genome-wide applications in breeding and diversity studies. Data reported on these ubiquitous properties of DArT markers will be particularly valuable to researchers working on less-studied crop species who already count on DArT genotyping arrays but for which no reference

  20. Genomic characterization of DArT markers based on high-density linkage analysis and physical mapping to the Eucalyptus genome.

    Directory of Open Access Journals (Sweden)

    César D Petroli

    Full Text Available Diversity Arrays Technology (DArT provides a robust, high throughput, cost-effective method to query thousands of sequence polymorphisms in a single assay. Despite the extensive use of this genotyping platform for numerous plant species, little is known regarding the sequence attributes and genome-wide distribution of DArT markers. We investigated the genomic properties of the 7,680 DArT marker probes of a Eucalyptus array, by sequencing them, constructing a high density linkage map and carrying out detailed physical mapping analyses to the Eucalyptus grandis reference genome. A consensus linkage map with 2,274 DArT markers anchored to 210 microsatellites and a framework map, with improved support for ordering, displayed extensive collinearity with the genome sequence. Only 1.4 Mbp of the 75 Mbp of still unplaced scaffold sequence was captured by 45 linkage mapped but physically unaligned markers to the 11 main Eucalyptus pseudochromosomes, providing compelling evidence for the quality and completeness of the current Eucalyptus genome assembly. A highly significant correspondence was found between the locations of DArT markers and predicted gene models, while most of the 89 DArT probes unaligned to the genome correspond to sequences likely absent in E. grandis, consistent with the pan-genomic feature of this multi-Eucalyptus species DArT array. These comprehensive linkage-to-physical mapping analyses provide novel data regarding the genomic attributes of DArT markers in plant genomes in general and for Eucalyptus in particular. DArT markers preferentially target the gene space and display a largely homogeneous distribution across the genome, thereby providing superb coverage for mapping and genome-wide applications in breeding and diversity studies. Data reported on these ubiquitous properties of DArT markers will be particularly valuable to researchers working on less-studied crop species who already count on DArT genotyping arrays but for

  1. Genomic organization and dynamics of repetitive DNA sequences in representatives of three Fagaceae genera.

    Science.gov (United States)

    Alves, Sofia; Ribeiro, Teresa; Inácio, Vera; Rocheta, Margarida; Morais-Cecílio, Leonor

    2012-05-01

    Oaks, chestnuts, and beeches are economically important species of the Fagaceae. To understand the relationship between these members of this family, a deep knowledge of their genome composition and organization is needed. In this work, we have isolated and characterized several AFLP fragments obtained from Quercus rotundifolia Lam. through homology searches in available databases. Genomic polymorphisms involving some of these sequences were evaluated in two species of Quercus, one of Castanea, and one of Fagus with specific primers. Comparative FISH analysis with generated sequences was performed in interphase nuclei of the four species, and the co-immunolocalization of 5-methylcytosine was also studied. Some of the sequences isolated proved to be genus-specific, while others were present in all the genera. Retroelements, either gypsy-like of the Tat/Athila clade or copia-like, are well represented, and most are dispersed in euchromatic regions of these species with no DNA methylation associated, pointing to an interspersed arrangement of these retroelements with potential gene-rich regions. A particular gypsy-sequence is dispersed in oaks and chestnut nuclei, but its confinement to chromocenters in beech evidences genome restructuring events during evolution of Fagaceae. Several sequences generated in this study proved to be good tools to comparatively study Fagaceae genome organization.

  2. Next Generation DNA Sequencing and the Future of Genomic Medicine

    OpenAIRE

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...

  3. Genome sequencing and annotation of Stenotrophomonas sp. SAM8

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Stenotrophomonas sp. strain SAM8, isolated from environmental water. The draft genome size is 3,665,538 bp with a G + C content of 67.2% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDAV00000000.

  4. Sequencing of a Cultivated Diploid Cotton Genome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS; Thea; A

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization,structure and function of plant genomes.In a `white paper' published in 2007,the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes.This strategy banks on the high degree

  5. Genomic Characterization of the Genus Nairovirus (Family Bunyaviridae).

    Science.gov (United States)

    Kuhn, Jens H; Wiley, Michael R; Rodriguez, Sergio E; Bào, Yīmíng; Prieto, Karla; Travassos da Rosa, Amelia P A; Guzman, Hilda; Savji, Nazir; Ladner, Jason T; Tesh, Robert B; Wada, Jiro; Jahrling, Peter B; Bente, Dennis A; Palacios, Gustavo

    2016-06-10

    Nairovirus, one of five bunyaviral genera, includes seven species. Genomic sequence information is limited for members of the Dera Ghazi Khan, Hughes, Qalyub, Sakhalin, and Thiafora nairovirus species. We used next-generation sequencing and historical virus-culture samples to determine 14 complete and nine coding-complete nairoviral genome sequences to further characterize these species. Previously unsequenced viruses include Abu Mina, Clo Mor, Great Saltee, Hughes, Raza, Sakhalin, Soldado, and Tillamook viruses. In addition, we present genomic sequence information on additional isolates of previously sequenced Avalon, Dugbe, Sapphire II, and Zirqa viruses. Finally, we identify Tunis virus, previously thought to be a phlebovirus, as an isolate of Abu Hammad virus. Phylogenetic analyses indicate the need for reassignment of Sapphire II virus to Dera Ghazi Khan nairovirus and reassignment of Hazara, Tofla, and Nairobi sheep disease viruses to novel species. We also propose new species for the Kasokero group (Kasokero, Leopards Hill, Yogue viruses), the Ketarah group (Gossas, Issyk-kul, Keterah/soft tick viruses) and the Burana group (Wēnzhōu tick virus, Huángpí tick virus 1, Tǎchéng tick virus 1). Our analyses emphasize the sister relationship of nairoviruses and arenaviruses, and indicate that several nairo-like viruses (Shāyáng spider virus 1, Xīnzhōu spider virus, Sānxiá water strider virus 1, South Bay virus, Wǔhàn millipede virus 2) require establishment of novel genera in a larger nairovirus-arenavirus supergroup.

  6. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned regions of the orthopoxvirus alignment. Overall sequence identity increased only

  7. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Lincoln D Stein

    2003-11-01

    Full Text Available The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp and C. elegans (100.3 Mbp genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C

  8. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...... or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high...

  9. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome...... or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  10. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  11. Building the sequence map of the human pan-genome

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng

    2010-01-01

    analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...

  12. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    Science.gov (United States)

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  13. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes

    Directory of Open Access Journals (Sweden)

    Hin Kwok

    2016-02-01

    Full Text Available Genomic sequences of Epstein–Barr virus (EBV have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  14. Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

    Science.gov (United States)

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...

  15. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  16. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms.

    Science.gov (United States)

    Taillon-Miller, P; Gu, Z; Li, Q; Hillier, L; Kwok, P Y

    1998-07-01

    An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21-7q22, and 13q12-13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.

  17. Genomic landscape of ovarian clear cell carcinoma via whole exome sequencing.

    Science.gov (United States)

    Kim, Se Ik; Lee, Ji Won; Lee, Maria; Kim, Hee Seung; Chung, Hyun Hoon; Kim, Jae-Weon; Park, Noh Hyun; Song, Yong-Sang; Seo, Jeong-Sun

    2018-02-01

    To analyze whole exome sequencing (WES) data on ovarian clear cell carcinoma (OCCC) in Korean patients via the technique of next generation sequencing (NGS). Genomic profiles were compared between endometriosis-associated OCCC (EMS-OCCC) and Non-EMS-OCCC. We used serum samples and cancer tissues, stored at the Seoul National University Hospital Human Biobank, that were initially collected from women diagnosed with OCCC between 2012 and 2016. In total, 15 patients were enrolled: 5 with pathologically confirmed EMS-OCCC and 10 with Non-EMS-OCCC. We performed NGS WES on 15 fresh frozen OCCC tissues and matched serum samples, enabling comprehensive genomic characterization of OCCC. OCCC was characterized by complex genomic alterations, with a median of 178 exonic mutations (range, 111-25,798) and a median of 343 somatic copy number variations (range, 43-1,820) per tumor sample. In all, 54 somatic mutations were discovered across 14 genes, including PIK3CA (40%), ARID1A (40%), and KRAS (20%) in the 15 Korean OCCCs. Copy number gains in NTRK1 (33%), MYC (40%), and GNAS (47%) and copy number losses in TET2 (73%), TSC1 (67%), BRCA2 (60%), and SMAD4 (47%) were frequent. The significantly altered pathways were associated with proliferation and survival (including the PI3K/AKT, TP53, and ERBB2 pathways) in 87% of OCCCs and with chromatin remodeling in 47% of OCCCs. No significant differences in frequencies of genetic alterations were detected between EMS-OCCC and Non-EMS-OCCC groups. We successfully characterized the genomic landscape of 15 Korean patients with OCCC. We identified potential therapeutic targets for the treatment of this malignancy. Copyright © 2017. Published by Elsevier Inc.

  18. Genome sequence and organization of a nucleopolyhedrovirus isolated from the smaller tea tortrix, Adoxophyes honmai

    International Nuclear Information System (INIS)

    Nakai, Madoka; Goto, Chie; Kang, Won Kyung; Shikata, Masamitsu; Luque, Teresa; Kunimi, Yasuhisa

    2003-01-01

    Adoxophyes honmai nucleopolyhedrovirus (AdhoNPV) has a distinctive pathology in A. honmai larvae, killing the host more slowly than other NPVs. To further understand the pathology of AdhoNPV, its genome was completely sequenced and compared with those of other baculoviruses. The AdhoNPV genome is 113,220 bp, with a G + C content of 35.6%. It contains 125 putative open reading frames (ORFs), of which 8 are unique to AdhoNPV, and 4 homologous regions. The other 117 ORFs display similarity to previously characterized baculovirus genes involved in early and late gene expression, DNA replication, and structural and auxiliary functions. The phylogenetic position of AdhoNPV, in relation to 15 other baculoviruses whose genomes have been completely sequenced, was assessed by three different analyses: gene sequence, gene order, and gene content. Although gene content analysis failed to support the group II NPVs, phylogenetic trees based on gene sequence and gene order showed AdhoNPV to be closely related to the group II NPVs

  19. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    Science.gov (United States)

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra-and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  20. The complete chloroplast genome sequence of Dendrobium officinale.

    Science.gov (United States)

    Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui

    2016-01-01

    The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.

  1. Genome sequencing and annotation of Proteus sp. SAS71

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report draft genome sequence of Proteus sp. strain SAS71, isolated from water spring in Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000.

  2. The complete genome sequence and analysis of the human pathogen Campylobacter lari

    DEFF Research Database (Denmark)

    Miller, WG; Wang, G; Binnewies, Tim Terence

    2008-01-01

    Campylobacter lari is a member of the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter group, a clade that includes the human pathogen C. jejuni. Here we present the complete genome sequence of the human clinical isolate, C. lari RM2100. The genome of strain...... RM2100 is approximately 1.53 Mb and includes the 46 kb megaplasmid pCL2100. Also present within the strain RM2100 genome is a 36 kb putative prophage, termed CLIE1, which is similar to CJIE4, a putative prophage present within the C. jejuni RM1221 genome. Nearly all (90%) of the gene content...... in strain RM2100 is similar to genes present in the genomes of other characterized thermotolerant campylobacters. However, several genes involved in amino acid biosynthesis and energy metabolism, identified previously in other Campylobacter genomes, are absent from the C. lari RM2100 genome. Therefore, C...

  3. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  4. Genomic characterization of the Taylorella genus.

    Directory of Open Access Journals (Sweden)

    Laurent Hébert

    Full Text Available The Taylorella genus comprises two species: Taylorella equigenitalis, which causes contagious equine metritis, and Taylorella asinigenitalis, a closely-related species mainly found in donkeys. We herein report on the first genome sequence of T. asinigenitalis, analyzing and comparing it with the recently-sequenced T. equigenitalis genome. The T. asinigenitalis genome contains a single circular chromosome of 1,638,559 bp with a 38.3% GC content and 1,534 coding sequences (CDS. While 212 CDSs were T. asinigenitalis-specific, 1,322 had orthologs in T. equigenitalis. Two hundred and thirty-four T. equigenitalis CDSs had no orthologs in T. asinigenitalis. Analysis of the basic nutrition metabolism of both Taylorella species showed that malate, glutamate and alpha-ketoglutarate may be their main carbon and energy sources. For both species, we identified four different secretion systems and several proteins potentially involved in binding and colonization of host cells, suggesting a strong potential for interaction with their host. T. equigenitalis seems better-equipped than T. asinigenitalis in terms of virulence since we identified numerous proteins potentially involved in pathogenicity, including hemagluttinin-related proteins, a type IV secretion system, TonB-dependent lactoferrin and transferrin receptors, and YadA and Hep_Hag domains containing proteins. This is the first molecular characterization of Taylorella genus members, and the first molecular identification of factors potentially involved in T. asinigenitalis and T. equigenitalis pathogenicity and host colonization. This study facilitates a genetic understanding of growth phenotypes, animal host preference and pathogenic capacity, paving the way for future functional investigations into this largely unknown genus.

  5. Molecular characterization, genomic distribution and evolutionary dynamics of Short INterspersed Elements in the termite genome.

    Science.gov (United States)

    Luchetti, Andrea; Mantovani, Barbara

    2011-02-01

    Short INterspersed Elements (SINEs) in invertebrates, and especially in animal inbred genomes such that of termites, are poorly known; in this paper we characterize three new SINE families (Talub, Taluc and Talud) through the analyses of 341 sequences, either isolated from the Reticulitermes lucifugus genome or drawn from EST Genbank collection. We further add new data to the only isopteran element known so far, Talua. These SINEs are tRNA-derived elements, with an average length ranging from 258 to 372 bp. The tails are made up by poly(A) or microsatellite motifs. Their copy number varies from 7.9 × 10(3) to 10(5) copies, well within the range observed for other metazoan genomes. Species distribution, age and target site duplication analysis indicate Talud as the oldest, possibly inactive SINE originated before the onset of Isoptera (~150 Myr ago). Taluc underwent to substantial sequence changes throughout the evolution of termites and data suggest it was silenced and then re-activated in the R. lucifugus lineage. Moreover, Taluc shares a conserved sequence block with other unrelated SINEs, as observed for some vertebrate and cephalopod elements. The study of genomic environment showed that insertions are mainly surrounded by microsatellites and other SINEs, indicating a biased accumulation within non-coding regions. The evolutionary dynamics of Talu~ elements is explained through selective mechanisms acting in an inbred genome; in this respect, the study of termites' SINEs activity may provide an interesting framework to address the (co)evolution of mobile elements and the host genome.

  6. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  7. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  8. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~ 30× sequencing depths. EGP1 had 4.7 million variants, where 198,877 were novel variants while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroup of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related with Egyptians obesity. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.

  9. Puzzling sequences: studying microbial genomes from 'Ötzi'

    International Nuclear Information System (INIS)

    Rattei, T.

    2012-01-01

    Ancient remains, and mummies in particular, are of central value for archaeological research. The Tyrolean iceman “Ötzi” was conserved in a glacier of the Ötztal Alps about 5000 years ago. Aside from morphological and phenotypical classification, the determination of DNA sequences and the subsequent genome analyses have been first applied to mitochondrial DNA and then been extended to genomic DNA. Typically also ancient microbial DNA is sequenced. These sequences allow the identification of pathogens as well as studying the evolution of microorganisms. The talk will explain the metagenomic aspects of the “Ötzi” genome project and discuss the first results. (author)

  10. The diploid genome sequence of an individual human.

    Directory of Open Access Journals (Sweden)

    Samuel Levy

    2007-09-01

    Full Text Available Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel included 3,213,401 single nucleotide polymorphisms (SNPs, 53,823 block substitutions (2-206 bp, 292,102 heterozygous insertion/deletion events (indels(1-571 bp, 559,473 homozygous indels (1-82,711 bp, 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

  11. Distribution and evolution of repeated sequences in genomes of Triatominae (Hemiptera-Reduviidae inferred from genomic in situ hybridization.

    Directory of Open Access Journals (Sweden)

    Sebastian Pita

    Full Text Available The subfamily Triatominae, vectors of Chagas disease, comprises 140 species characterized by a highly homogeneous chromosome number. We analyzed the chromosomal distribution and evolution of repeated sequences in Triatominae genomes by Genomic in situ Hybridization using Triatoma delpontei and Triatoma infestans genomic DNAs as probes. Hybridizations were performed on their own chromosomes and on nine species included in six genera from the two main tribes: Triatomini and Rhodniini. Genomic probes clearly generate two different hybridization patterns, dispersed or accumulated in specific regions or chromosomes. The three used probes generate the same hybridization pattern in each species. However, these patterns are species-specific. In closely related species, the probes strongly hybridized in the autosomal heterochromatic regions, resembling C-banding and DAPI patterns. However, in more distant species these co-localizations are not observed. The heterochromatic Y chromosome is constituted by highly repeated sequences, which is conserved among 10 species of Triatomini tribe suggesting be an ancestral character for this group. However, the Y chromosome in Rhodniini tribe is markedly different, supporting the early evolutionary dichotomy between both tribes. In some species, sex chromosomes and autosomes shared repeated sequences, suggesting meiotic chromatin exchanges among these heterologous chromosomes. Our GISH analyses enabled us to acquire not only reliable information about autosomal repeated sequences distribution but also an insight into sex chromosome evolution in Triatominae. Furthermore, the differentiation obtained by GISH might be a valuable marker to establish phylogenetic relationships and to test the controversial origin of the Triatominae subfamily.

  12. The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes.

    Science.gov (United States)

    Sanitá Lima, Matheus; Woods, Laura C; Cartwright, Matthew W; Smith, David Roy

    2016-11-01

    Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  13. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  14. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Directory of Open Access Journals (Sweden)

    Tran Duc

    2010-05-01

    Full Text Available Abstract Background Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the

  15. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  16. The first genome sequence of a metatherian herpesvirus: Macropodid herpesvirus 1.

    Science.gov (United States)

    Vaz, Paola K; Mahony, Timothy J; Hartley, Carol A; Fowler, Elizabeth V; Ficorilli, Nino; Lee, Sang W; Gilkerson, James R; Browning, Glenn F; Devlin, Joanne M

    2016-01-22

    While many placental herpesvirus genomes have been fully sequenced, the complete genome of a marsupial herpesvirus has not been described. Here we present the first genome sequence of a metatherian herpesvirus, Macropodid herpesvirus 1 (MaHV-1). The MaHV-1 viral genome was sequenced using an Illumina MiSeq sequencer, de novo assembly was performed and the genome was annotated. The MaHV-1 genome was 140 kbp in length and clustered phylogenetically with the primate simplexviruses, sharing 67% nucleotide sequence identity with Human herpesviruses 1 and 2. The MaHV-1 genome contained 66 predicted open reading frames (ORFs) homologous to those in other herpesvirus genomes, but lacked homologues of UL3, UL4, UL56 and glycoprotein J. This is the first alphaherpesvirus genome that has been found to lack the UL3 and UL4 homologues. We identified six novel ORFs and confirmed their transcription by RT-PCR. This is the first genome sequence of a herpesvirus that infects metatherians, a taxonomically unique mammalian clade. Members of the Simplexvirus genus are remarkably conserved, so the absence of ORFs otherwise retained in eutherian and avian alphaherpesviruses contributes to our understanding of the Alphaherpesvirinae. Further study of metatherian herpesvirus genetics and pathogenesis provides a unique approach to understanding herpesvirus-mammalian interactions.

  17. Genomic Characterization of the Genus Nairovirus (Family Bunyaviridae

    Directory of Open Access Journals (Sweden)

    Jens H. Kuhn

    2016-06-01

    Full Text Available Nairovirus, one of five bunyaviral genera, includes seven species. Genomic sequence information is limited for members of the Dera Ghazi Khan, Hughes, Qalyub, Sakhalin, and Thiafora nairovirus species. We used next-generation sequencing and historical virus-culture samples to determine 14 complete and nine coding-complete nairoviral genome sequences to further characterize these species. Previously unsequenced viruses include Abu Mina, Clo Mor, Great Saltee, Hughes, Raza, Sakhalin, Soldado, and Tillamook viruses. In addition, we present genomic sequence information on additional isolates of previously sequenced Avalon, Dugbe, Sapphire II, and Zirqa viruses. Finally, we identify Tunis virus, previously thought to be a phlebovirus, as an isolate of Abu Hammad virus. Phylogenetic analyses indicate the need for reassignment of Sapphire II virus to Dera Ghazi Khan nairovirus and reassignment of Hazara, Tofla, and Nairobi sheep disease viruses to novel species. We also propose new species for the Kasokero group (Kasokero, Leopards Hill, Yogue viruses, the Ketarah group (Gossas, Issyk-kul, Keterah/soft tick viruses and the Burana group (Wēnzhōu tick virus, Huángpí tick virus 1, Tǎchéng tick virus 1. Our analyses emphasize the sister relationship of nairoviruses and arenaviruses, and indicate that several nairo-like viruses (Shāyáng spider virus 1, Xīnzhōu spider virus, Sānxiá water strider virus 1, South Bay virus, Wǔhàn millipede virus 2 require establishment of novel genera in a larger nairovirus-arenavirus supergroup.

  18. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  19. Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis

    Directory of Open Access Journals (Sweden)

    Roodt-Wilding Rouvay

    2011-03-01

    Full Text Available Abstract Background Worldwide, the genus Haliotis is represented by 56 extant species and several of these are commercially cultured. Among the six abalone species found in South Africa, Haliotis midae is the only aquacultured species. Despite its economic importance, genomic sequence resources for H. midae, and for abalone in general, are still scarce. Next generation sequencing technologies provide a fast and efficient tool to generate large sequence collections that can be used to characterize the transcriptome and identify expressed genes associated with economically important traits like growth and disease resistance. Results More than 25 million short reads generated by the Illumina Genome Analyzer were de novo assembled in 22,761 contigs with an average size of 260 bp. With a stringent E-value threshold of 10-10, 3,841 contigs (16.8% had a BLAST homologous match against the Genbank non-redundant (NR protein database. Most of these sequences were annotated using the gene ontology (GO and eukaryotic orthologous groups of proteins (KOG databases and assigned to various functional categories. According to annotation results, many gene families involved in immune response were identified. Thousands of simple sequence repeats (SSR and single nucleotide polymorphisms (SNP were detected. Setting stringent parameters to ensure a high probability of amplification, 420 primer pairs in 181 contigs containing SSR loci were designed. Conclusion This data represents the most comprehensive genomic resource for the South African abalone H. midae to date. The amount of assembled sequences demonstrated the utility of the Illumina sequencing technology in the transcriptome characterization of a non-model species. It allowed the development of several markers and the identification of promising candidate genes for future studies on population and functional genomics in H. midae and in other abalone species.

  20. Identification and Characterization of Epstein-Barr Virus Genomes in Lung Carcinoma Biopsy Samples by Next-Generation Sequencing Technology.

    Science.gov (United States)

    Wang, Shanshan; Xiong, Hongchao; Yan, Shi; Wu, Nan; Lu, Zheming

    2016-05-18

    Epstein-Barr virus (EBV) has been detected in the tumor cells of several cancers, including some cases of lung carcinoma (LC). However, the genomic characteristics and diversity of EBV strains associated with LC are poorly understood. In this study, we sequenced the EBV genomes isolated from four primary LC tumor biopsy samples, designated LC1 to LC4. Comparative analysis demonstrated that LC strains were more closely related to GD1 strain. Compared to GD1 reference genome, a total of 520 variations in all, including 498 substitutions, 12 insertions, and 10 deletions were found. Latent genes were found to harbor the most numbers of nonsynonymous mutations. Phylogenetic analysis showed that all LC strains were closely related to Asian EBV strains, whereas different from African/American strains. LC2 genome was distinct from the other three LC genomes, suggesting at least two parental lineages of EBV among the LC genomes may exist. All LC strains could be classified as China 1 and V-val subtype according to the amino acid sequence of LMP1 and EBNA1, respectively. In conclusion, our results showed the genomic diversity among EBV genomes isolated from LC, which might facilitate to uncover the previously unknown variations of pathogenic significance.

  1. Protecting genomic sequence anonymity with generalization lattices.

    Science.gov (United States)

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.

  2. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Involvement of Disperse Repetitive Sequences in Wheat/Rye Genome Adjustment

    Directory of Open Access Journals (Sweden)

    Manuela Silva

    2012-07-01

    Full Text Available The union of different genomes in the same nucleus frequently results in hybrid genotypes with improved genome plasticity related to both genome remodeling events and changes in gene expression. Most modern cereal crops are polyploid species. Triticale, synthesized by the cross between wheat and rye, constitutes an excellent model to study polyploidization functional implications. We intend to attain a deeper knowledge of dispersed repetitive sequence involvement in parental genome reshuffle in triticale and in wheat-rye addition lines that have the entire wheat genome plus each rye chromosome pair. Through Random Amplified Polymorphic DNA (RAPD analysis with OPH20 10-mer primer we unraveled clear alterations corresponding to the loss of specific bands from both parental genomes. Moreover, the sequential nature of those events was revealed by the increased absence of rye-origin bands in wheat-rye addition lines in comparison with triticale. Remodeled band sequencing revealed that both repetitive and coding genome domains are affected in wheat-rye hybrid genotypes. Additionally, the amplification and sequencing of pSc20H internal segments showed that the disappearance of parental bands may result from restricted sequence alterations and unraveled the involvement of wheat/rye related repetitive sequences in genome adjustment needed for hybrid plant stabilization.

  4. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model, that model the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  5. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    textabstractThe largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  6. High-resolution characterization of a hepatocellular carcinoma genome.

    Science.gov (United States)

    Totoki, Yasushi; Tatsuno, Kenji; Yamamoto, Shogo; Arai, Yasuhito; Hosoda, Fumie; Ishikawa, Shumpei; Tsutsumi, Shuichi; Sonoda, Kohtaro; Totsuka, Hirohiko; Shirakihara, Takuya; Sakamoto, Hiromi; Wang, Linghua; Ojima, Hidenori; Shimada, Kazuaki; Kosuge, Tomoo; Okusaka, Takuji; Kato, Kazuto; Kusuda, Jun; Yoshida, Teruhiko; Aburatani, Hiroyuki; Shibata, Tatsuhiro

    2011-05-01

    Hepatocellular carcinoma, one of the most common virus-associated cancers, is the third most frequent cause of cancer-related death worldwide. By massively parallel sequencing of a primary hepatitis C virus-positive hepatocellular carcinoma (36× coverage) and matched lymphocytes (>28× coverage) from the same individual, we identified more than 11,000 somatic substitutions of the tumor genome that showed predominance of T>C/A>G transition and a decrease of the T>C substitution on the transcribed strand, suggesting preferential DNA repair. Gene annotation enrichment analysis of 63 validated non-synonymous substitutions revealed enrichment of phosphoproteins. We further validated 22 chromosomal rearrangements, generating four fusion transcripts that had altered transcriptional regulation (BCORL1-ELF4) or promoter activity. Whole-exome sequencing at a higher sequence depth (>76× coverage) revealed a TSC1 nonsense substitution in a subpopulation of the tumor cells. This first high-resolution characterization of a virus-associated cancer genome identified previously uncharacterized mutation patterns, intra-chromosomal rearrangements and fusion genes, as well as genetic heterogeneity within the tumor.

  7. Templated sequence insertion polymorphisms in the human genome

    Science.gov (United States)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  8. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes are higher than that between human and teleost fish genomes. Elephant shark contains putative four Hox clusters indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  9. Genome Sequence of Australian Indigenous Wine Yeast Torulaspora delbrueckii COFT1 Using Nanopore Sequencing.

    Science.gov (United States)

    Tondini, Federico; Jiranek, Vladimir; Grbin, Paul R; Onetto, Cristobal A

    2018-04-26

    Here, we report the first sequenced genome of an indigenous Australian wine isolate of Torulaspora delbrueckii using the Oxford Nanopore MinION and Illumina HiSeq sequencing platforms. The genome size is 9.4 Mb and contains 4,831 genes. Copyright © 2018 Tondini et al.

  10. Next-Generation Sequencing Reveals the Impact of Repetitive DNA Across Phylogenetically Closely Related Genomes of Orobanchaceae

    Science.gov (United States)

    Piednoël, Mathieu; Aberer, Andre J.; Schneeweiss, Gerald M.; Macas, Jiri; Novak, Petr; Gundlach, Heidrun; Temsch, Eva M.; Renner, Susanne S.

    2013-01-01

    We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types. PMID:22723303

  11. From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

    Science.gov (United States)

    Garza, Daniel R; Dutilh, Bas E

    2015-11-01

    Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.

  12. Phylogenetic and genomic characterization of a novel atypical porcine pestivirus in China.

    Science.gov (United States)

    Zhang, H; Wen, W; Hao, G; Hu, Y; Chen, H; Qian, P; Li, X

    2018-02-01

    Atypical porcine pestivirus (APPV) has been considered a novel pestivirus and causative agent of congenital tremor type A-II. An APPV CH-GX2016 strain was characterized from newly born piglets with clinical symptoms of congenital tremor in Guangxi, China. The genome of APPV CH-GX 2016 strain was 11,475 bp in length and encoded a polyprotein composed of the 3,635 amino acids. This genome sequence exhibited 88.0% to 90.8% nucleotide sequence homology with other APPV reference sequences in GenBank. Phylogenetic analysis further showed that APPV CH-GX is a novel pestivirus compared with previously described classical pestivirus strains. Therefore, APPV is present in pigs in China. © 2017 Blackwell Verlag GmbH.

  13. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2009-11-01

    Full Text Available Abstract Background De novo sequencing the entire genome of a large complex plant genome like the one of barley (Hordeum vulgare L. is a major challenge both in terms of experimental feasibility and costs. The emergence and breathtaking progress of next generation sequencing technologies has put this goal into focus and a clone based strategy combined with the 454/Roche technology is conceivable. Results To test the feasibility, we sequenced 91 barcoded, pooled, gene containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of less than 10 contigs at 24-fold mean sequence coverage. Moreover we show that gene containing regions seem to assemble completely and uninterrupted thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies. Conclusion The described multiplex 454 sequencing of barcoded BACs leads to sequence consensi highly representative for the clones. Assemblies are correct for the majority of contigs. Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone based strategy of sequencing the barley genome.

  14. Complete genome sequence of the myxobacterium Sorangium cellulosum

    DEFF Research Database (Denmark)

    Schneiker, S; Perlova, O; Kaiser, O

    2007-01-01

    The genus Sorangium synthesizes approximately half of the secondary metabolites isolated from myxobacteria, including the anti-cancer metabolite epothilone. We report the complete genome sequence of the model Sorangium strain S. cellulosum Soce56, which produces several natural products and has...... morphological and physiological properties typical of the genus. The circular genome, comprising 13,033,779 base pairs, is the largest bacterial genome sequenced to date. No global synteny with the genome of Myxococcus xanthus is apparent, revealing an unanticipated level of divergence between...... these myxobacteria. A large percentage of the genome is devoted to regulation, particularly post-translational phosphorylation, which probably supports the strain's complex, social lifestyle. This regulatory network includes the highest number of eukaryotic protein kinase-like kinases discovered in any organism...

  15. Cyprinus carpio Genome sequencing and assembly

    NARCIS (Netherlands)

    Kolder, I.C.R.M.; Plas-Duivesteijn, van der Suzanne J.; Tan, G.; Wiegertjes, G.; Forlenza, M.; Guler, A.T.; Travin, D.Y.; Nakao, M.; Moritomo, T.; Irnazarow, I.; Jansen, H.J.

    2013-01-01

    Sequencing of the common carp (Cyprinus carpio carpio Linnaeus, 1758) genome, with the objective of establishing carp as a model organism to supplement the closely related zebrafish (Danio rerio). The sequenced individual is a homozygous female (by gynogenesis) of R3 x R8 carp, the heterozygous

  16. Controversy and debate on clinical genomics sequencing-paper 2: clinical genome-wide sequencing: don't throw out the baby with the bathwater!

    Science.gov (United States)

    Adam, Shelin; Friedman, Jan M

    2017-12-01

    Genome-wide (exome or whole genome) sequencing with appropriate genetic counseling should be considered for any patient with a suspected Mendelian disease that has not been identified by conventional testing. Clinical genome-wide sequencing provides a powerful and effective means of identifying specific genetic causes of serious disease and improving clinical care. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Genome sequence of the Thermotoga thermarum type strain (LA3(T)) from an African solfataric spring.

    Science.gov (United States)

    Göker, Markus; Spring, Stefan; Scheuner, Carmen; Anderson, Iain; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Tice, Hope; Del Rio, Tijana Glavina; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2014-06-15

    Thermotoga thermarum Windberger et al. 1989 is a member to the genomically well characterized genus Thermotoga in the phylum 'Thermotogae'. T. thermarum is of interest for its origin from a continental solfataric spring vs. predominantly marine oil reservoirs of other members of the genus. The genome of strain LA3T also provides fresh data for the phylogenomic positioning of the (hyper-)thermophilic bacteria. T. thermarum strain LA3(T) is the fourth sequenced genome of a type strain from the genus Thermotoga, and the sixth in the family Thermotogaceae to be formally described in a publication. Phylogenetic analyses do not reveal significant discrepancies between the current classification of the group, 16S rRNA gene data and whole-genome sequences. Nevertheless, T. thermarum significantly differs from other Thermotoga species regarding its iron-sulfur cluster synthesis, as it contains only a minimal set of the necessary proteins. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,039,943 bp long chromosome with its 2,015 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Genome-wide identification and characterization of Notch transcription complex-binding sequence paired sites in leukemia cells

    Science.gov (United States)

    Severson, Eric; Arnett, Kelly L.; Wang, Hongfang; Zang, Chongzhi; Taing, Len; Liu, Hudan; Pear, Warren S.; Liu, X. Shirley; Blacklow, Stephen C.; Aster, Jon C.

    2018-01-01

    Notch transcription complexes (NTCs) drive target gene expression by binding to two distinct types of genomic response elements, NTC monomer-binding sites and sequence-paired sites (SPSs) that bind NTC dimers. SPSs are conserved and are linked to the Notch-responsiveness of a few genes, but their overall contribution to Notch-dependent gene regulation is unknown. To address this issue, we determined the DNA sequence requirements for NTC dimerization using a fluorescence resonance energy transfer (FRET) assay, and applied insights from these in vitro studies to Notch-“addicted” leukemia cells. We find that SPSs contribute to the regulation of approximately a third of direct Notch target genes. While originally described in promoters, SPSs are present mainly in long-range enhancers, including an enhancer containing a newly described SPS that regulates HES5. Our work provides a general method for identifying sequence-paired sites in genome-wide data sets and highlights the widespread role of NTC dimerization in Notch-transformed leukemia cells. PMID:28465412

  19. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.

  20. Construction of the BAC Library of Small Abalone (Haliotis diversicolor) for Gene Screening and Genome Characterization.

    Science.gov (United States)

    Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng

    2016-02-01

    The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of it, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 Kb, equivalent to approximately 7.6× of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening using PCR method. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled with the next generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 Kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated into the 8 BACs, which proved the feasibility of PCR screening approach with three-dimensional pools in small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for marker development in the future. The BAC library and clone pools provided valuable resources and tools for genetic breeding and conservation of H. diversicolor.

  1. Molecular characterization and complete genome sequence of avian paramyxovirus type 4 prototype strain duck/Hong Kong/D3/75

    Directory of Open Access Journals (Sweden)

    Collins Peter L

    2008-10-01

    Full Text Available Abstract Background Avian paramyxoviruses (APMVs are frequently isolated from domestic and wild birds throughout the world. All APMVs, except avian metapneumovirus, are classified in the genus Avulavirus of the family Paramyxoviridae. At present, the APMVs of genus Avulavirus are divided into nine serological types (APMV 1–9. Newcastle disease virus represents APMV-1 and is the most characterized among all APMV types. Very little is known about the molecular characteristics and pathogenicity of APMV 2–9. Results As a first step towards understanding the molecular genetics and pathogenicity of APMV-4, we have sequenced the complete genome of APMV-4 strain duck/Hong Kong/D3/75 and determined its pathogenicity in embryonated chicken eggs. The genome of APMV-4 is 15,054 nucleotides (nt in length, which is consistent with the "rule of six". The genome contains six non-overlapping genes in the order 3'-N-P/V-M-F-HN-L-5'. The genes are flanked on either side by highly conserved transcription start and stop signals and have intergenic sequences varying in length from 9 to 42 nt. The genome contains a 55 nt leader region at 3' end. The 5' trailer region is 17 nt, which is the shortest in the family Paramyxoviridae. Analysis of mRNAs transcribed from the P gene showed that 35% of the transcripts were edited by insertion of one non-templated G residue at an editing site leading to production of V mRNAs. No message was detected that contained insertion of two non-templated G residues, indicating that the W mRNAs are inefficiently produced in APMV-4 infected cells. The cleavage site of the F protein (DIPQR↓F does not conform to the preferred cleavage site of the ubiquitous intracellular protease furin. However, exogenous proteases were not required for the growth of APMV-4 in cell culture, indicating that the cleavage does not depend on a furin site. Conclusion Phylogenic analysis of the nucleotide sequences of viruses of all five genera of the family

  2. Getting complete genomes from complex samples using nanopore sequencing

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Albertsen, Mads

    Background Short read DNA sequencing and metagenomic binning workflows have made it possible to extract bacterial genome bins from environmental microbial samples containing hundreds to thousands of different species. However, these genome bins often do not represent complete genomes......, as they are mostly fragmented, incomplete and often contaminated with foreign DNA. The value of these `draft genomes` have limited, lasting value to the scientific community, as gene synteny is broken and there is some uncertainty of what is missing1. The genetic material most often missed is important multi......-copy and/or conserved marker genes such as the 16S rRNA gene, as sequence micro-heterogeneity prevents assembly of these genes in the de novo assembly. However, long read sequencing technologies are emerging promising an end to fragmented genome assemblies2. Experimental design We extracted DNA from a full...

  3. SeqEntropy: genome-wide assessment of repeats for short read sequencing.

    Directory of Open Access Journals (Sweden)

    Hsueh-Ting Chu

    Full Text Available BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k to measure the impact of repeats for the reconstruction of whole-genome from sequences of length k. In our experiments, we found that entropy loss correlates well with de-novo assembly coverage of a genome, and a score of ΔH(k>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k<1% are different for various organisms and are independent of the genome size. For example, in order to meet the threshold of ΔH(k<1%, a read length of 60 bp is needed for the sequencing of human genome (3.2 10(9 bp and 320 bp for the sequencing of fruit fly (1.8×10(8 bp. We also calculated the ΔH(k scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and the entropy of sequencing reads provides a measurement for the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation on the limitation of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large euakryotic genomes. This approach may be useful to tune the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.

  4. The First Endogenous Herpesvirus, Identified in the Tarsier Genome, and Novel Sequences from Primate Rhadinoviruses and Lymphocryptoviruses

    Science.gov (United States)

    Aswad, Amr; Katzourakis, Aris

    2014-01-01

    Herpesviridae is a diverse family of large and complex pathogens whose genomes are extremely difficult to sequence. This is particularly true for clinical samples, and if the virus, host, or both genomes are being sequenced for the first time. Although herpesviruses are known to occasionally integrate in host genomes, and can also be inherited in a Mendelian fashion, they are notably absent from the genomic fossil record comprised of endogenous viral elements (EVEs). Here, we combine paleovirological and metagenomic approaches to both explore the constituent viral diversity of mammalian genomes and search for endogenous herpesviruses. We describe the first endogenous herpesvirus from the genome of the Philippine tarsier, belonging to the Roseolovirus genus, and characterize its highly defective genome that is integrated and flanked by unambiguous host DNA. From a draft assembly of the aye-aye genome, we use bioinformatic tools to reveal over 100,000 bp of a novel rhadinovirus that is the first lemur gammaherpesvirus, closely related to Kaposi's sarcoma-associated virus. We also identify 58 genes of Pan paniscus lymphocryptovirus 1, the bonobo equivalent of human Epstein-Barr virus. For each of the viruses, we postulate gene function via comparative analysis to known viral relatives. Most notably, the evidence from gene content and phylogenetics suggests that the aye-aye sequences represent the most basal known rhadinovirus, and indicates that tumorigenic herpesviruses have been infecting primates since their emergence in the late Cretaceous. Overall, these data show that a genomic fossil record of herpesviruses exists despite their extremely large genomes, and expands the known diversity of Herpesviridae, which will aid the characterization of pathogenesis. Our analytical approach illustrates the benefit of intersecting evolutionary approaches with metagenomics, genetics and paleovirology. PMID:24945689

  5. Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

    Directory of Open Access Journals (Sweden)

    Vijay P Bondre

    2016-01-01

    Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.

  6. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based public sequenced genomes, CMGfunc. Functionally related groups......In November 2013, there was around 21.000 different prokaryotic genomes sequenced and publicly available, and the number is growing daily with another 20.000 or more genomes expected to be sequenced and deposited by the end of 2014. An important part of the analysis of this data is the functional...... annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...

  7. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results We reported the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3% had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5, of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified to exist between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions are high while a high level of chromosomal shuffling and rearrangements exist between catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  8. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  9. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca. The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing and Ion Torrent (RRL runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  10. Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

    Science.gov (United States)

    Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

    2017-11-23

    Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed on the single-cell level. We found that major SCNAs were early events in cancer development and inherited steadily. Single-cell sequencing revealed mutations and SCNAs which were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of two patients with treatment naïve from the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.

  11. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore http://bioinfo.hku.hk/genochore.html, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis associated to related organisms for comparison.

  12. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.

    Science.gov (United States)

    Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B

    2015-01-01

    Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.

  13. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    Science.gov (United States)

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus , Escherichia , and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  14. The Complete Chloroplast and Mitochondrial Genome Sequences of Boea hygrometrica: Insights into the Evolution of Plant Organellar Genomes

    Science.gov (United States)

    Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun

    2012-01-01

    The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined with the lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome have less genes (65) with a coding faction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979

  15. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis; FINAL

    International Nuclear Information System (INIS)

    Simon, M. I.; Kim, U.-J.

    2002-01-01

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year

  16. Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

    Directory of Open Access Journals (Sweden)

    Maggi Giorgio P

    2008-06-01

    Full Text Available Abstract Background The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite the great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately it is limited by the computational requirements needed to perform genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based (thus dependent on the availability of annotated proteins. Results In this paper we present the results of a comprehensive comparison of human and mouse genomes performed with a new high throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences the system is also suitable to accurately identify potential gene loci. Following this analysis we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly we were able to detect several potential gene loci supported by EST sequences but not corresponding to as yet annotated genes. Conclusion Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and the identification of potential gene loci. Our system does not require the availability of any annotated sequence thus is suitable for the analysis of new or poorly annotated genomes.

  17. Effects of informed consent for individual genome sequencing on relevant knowledge.

    Science.gov (United States)

    Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

    2012-11-01

    Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.

  18. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae).

    Science.gov (United States)

    Alverson, Andrew J; Wei, XiaoXin; Rice, Danny W; Stern, David B; Barry, Kerrie; Palmer, Jeffrey D

    2010-06-01

    The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)--the two smallest characterized cucurbit mitochondrial genomes--and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors.

  19. Human genome and genetic sequencing research and informed consent

    International Nuclear Information System (INIS)

    Iwakawa, Mayumi

    2003-01-01

    On March 29, 2001, the Ethical Guidelines for Human Genome and Genetic Sequencing Research were established. They have intended to serve as ethical guidelines for all human genome and genetic sequencing research practice, for the purpose of upholding respect for human dignity and rights and enforcing use of proper methods in the pursuit of human genome and genetic sequencing research, with the understanding and cooperation of the public. The RadGenomics Project has prepared a research protocol and informed consent document that follow these ethical guidelines. We have endeavored to protect the privacy of individual information, and have established a procedure for examination of research practices by an ethics committee. Here we report our procedure in order to offer this concept to the patients. (authors)

  20. In Silico Mining of Microsatellites in Coding Sequences of the Date Palm (Arecaceae Genome, Characterization, and Transferability

    Directory of Open Access Journals (Sweden)

    Frédérique Aberlenc-Bertossi

    2014-01-01

    Full Text Available Premise of the study: To complement existing sets of primarily dinucleotide microsatellite loci from noncoding sequences of date palm, we developed primers for tri- and hexanucleotide microsatellite loci identified within genes. Due to their conserved genomic locations, the primers should be useful in other palm taxa, and their utility was tested in seven other Phoenix species and in Chamaerops, Livistona, and Hyphaene. Methods and Results: Tandem repeat motifs of 3–6 bp were searched using a simple sequence repeat (SSR–pipeline package in coding portions of the date palm draft genome sequence. Fifteen loci produced highly consistent amplification, intraspecific polymorphisms, and stepwise mutation patterns. Conclusions: These microsatellite loci showed sufficient levels of variability and transferability to make them useful for population genetic, selection signature, and interspecific gene flow studies in Phoenix and other Coryphoideae genera.

  1. Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

    Science.gov (United States)

    Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

    2011-01-01

    We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used the 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other Aspergilli sequenced. Assembly of 454 reads generated a non-redundant sequence of 39.5-Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of α-amyalse gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them that included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for the cyclopiazonic acid (CPA) biosynthesis, explaining the no productivity of CPA in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486

  2. Draft genome sequences of two virulent serotypes of avian Pasteurella multocida

    Science.gov (United States)

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent Pasteurella multocida strain Pm70....

  3. Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida

    OpenAIRE

    Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.; Briggs, Robert E.

    2013-01-01

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P.?multocida strain Pm70.

  4. Genome sequence of a mesophilic hydrogenotrophic methanogen Methanocella paludicola, the first cultivated representative of the order Methanocellales.

    Directory of Open Access Journals (Sweden)

    Sanae Sakai

    Full Text Available We report complete genome sequence of a mesophilic hydrogenotrophic methanogen Methanocella paludicola, the first cultured representative of the order Methanocellales once recognized as an uncultured key archaeal group for methane emission in rice fields. The genome sequence of M. paludicola consists of a single circular chromosome of 2,957,635 bp containing 3004 protein-coding sequences (CDS. Genes for most of the functions known in the methanogenic archaea were identified, e.g. a full complement of hydrogenases and methanogenesis enzymes. The mixotrophic growth of M. paludicola was clarified by the genomic characterization and re-examined by the subsequent growth experiments. Comparative genome analysis with the previously reported genome sequence of RC-I(MRE50, which was metagenomically reconstructed, demonstrated that about 70% of M. paludicola CDSs were genetically related with RC-I(MRE50 CDSs. These CDSs included the genes involved in hydrogenotrophic methane production, incomplete TCA cycle, assimilatory sulfate reduction and so on. However, the genetic components for the carbon and nitrogen fixation and antioxidant system were different between the two Methanocellales genomes. The difference is likely associated with the physiological variability between M. paludicola and RC-I(MRE50, further suggesting the genomic and physiological diversity of the Methanocellales methanogens. Comparative genome analysis among the previously determined methanogen genomes points to the genome-wide relatedness of the Methanocellales methanogens to the orders Methanosarcinales and Methanomicrobiales methanogens in terms of the genetic repertoire. Meanwhile, the unique evolutionary history of the Methanocellales methanogens is also traced in an aspect by the comparative genome analysis among the methanogens.

  5. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks

    Science.gov (United States)

    Fitzpatrick, Margaret A.; Hauser, Alan R.

    2015-01-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. PMID:26699703

  6. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Directory of Open Access Journals (Sweden)

    Graner Andreas

    2008-10-01

    Full Text Available Abstract Background Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR index can be generated to map repetitive regions in genomic sequences. Results We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. Conclusion An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences regions in uncharacterised genomic sequences. The restriction that a particular

  7. Genome sequence of the moderately thermophilic, amino-acid-degrading and sulfur-reducing bacterium Thermovirga lienii type strain (Cas60314(T)).

    Science.gov (United States)

    Göker, Markus; Saunders, Elisabeth; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Chang, Yun-Juan; Jeffries, Cynthia D; Brambilla, Evelyne-Marie; Rohde, Manfred; Spring, Stefan; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2012-05-25

    Thermovirga lienii Dahle and Birkeland 2006 is a member of the genus Thermovirga in the genomically moderately well characterized phylum 'Synergistetes'. Members of this relatively recently proposed phylum 'Synergistetes' are of interest because of their isolated phylogenetic position and their diverse habitats, e.g. from humans to oil wells. The genome of T. lienii Cas60314(T) is the fifth genome sequence (third completed) from this phylum to be published. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,999,646 bp long genome (including one plasmid) with its 1,914 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  9. Draft genome sequence of the Coccolithovirus Emiliania huxleyi virus 203.

    Science.gov (United States)

    Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

    2011-12-01

    The Coccolithoviridae are a recently discovered group of viruses that infect the marine coccolithophorid Emiliania huxleyi. Emiliania huxleyi virus 203 (EhV-203) has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 400 kbp, consisting of 464 coding sequences (CDSs). Here we describe the genomic features of EhV-203 together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome.

  10. Genome sequence of the moderately thermophilic halophile Flexistipes sinusarabici strain (MAS10T)

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Hammon, Nancy [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute

    2011-01-01

    Flexistipes sinusarabici Fiala et al. 2000 is the type species of the genus Flexistipes in the fami- ly Deferribacteraceae. The species is of interest because of its isolated phylogenetic location in a genomically under-characterized region of the tree of life, and because of its origin from a multiply extreme environment; the Atlantis Deep brines of the Red Sea, where it had to struggle with high temperatures, high salinity, and a high concentrations of heavy metals. This is the fourth completed genome sequence to be published of a type strain of the family Deferribacteraceae. The 2,526,590 bp long genome with its 2,346 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. A new HCV genotype 6 subtype designated 6v was confirmed with three complete genome sequences.

    Science.gov (United States)

    Wang, Yizhong; Xia, Xueshan; Li, Chunhua; Maneekarn, Niwat; Xia, Wenjie; Zhao, Wenhua; Feng, Yue; Kung, Hsiang Fu; Fu, Yongshui; Lu, Ling

    2009-03-01

    Although hepatitis C virus (HCV) genotype 6 is classified into 21 subtypes, 6a-6u, new variants continue to be identified. To characterize the full-length genomes of three novel HCV genotype 6 variants: KMN02, KM046 and KM181. From sera of patients with HCV infection, the entire HCV genome was amplified by RT-PCR followed by direct DNA sequencing and phylogenetic analysis. The sera contained HCV genomes of 9461, 9429, and 9461nt in length, and each harboured a single ORF of 9051nt. The genomes showed 95.3-98.1% nucleotide similarity to each other and 72.2-75.4% similarity to 23 genotype 6 reference sequences, which represent subtypes 6a-6u and unassigned variants km41 and gz52557. Phylogenetic analyses demonstrated that they were genotype 6, but were subtypically distinct. Based on the current criteria of HCV classification, they were designed to represent a new subtype, 6v. Analysis of E1 and NS5B region partial sequences revealed two additional related variants, CMBD-14 and CMBD-86 that had been previously reported in northern Thailand and sequences dropped into Genbank. Three novel HCV genotype 6 variants were entirely sequenced and designated subtype 6v.

  12. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny

    Science.gov (United States)

    Marr, Henry S.; Tarigo, Jaime L.; Cohn, Leah A.; Bird, David M.; Scholl, Elizabeth H.; Levy, Michael G.; Wiegmann, Brian M.; Birkenheuer, Adam J.

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  13. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny.

    Directory of Open Access Journals (Sweden)

    Megan E Schreeg

    Full Text Available The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward

  14. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...

  15. Characterization of Three Mycobacterium spp. with Potential Use in Bioremediation by Genome Sequencing and Comparative Genomics.

    Science.gov (United States)

    Das, Sarbashis; Pettersson, B M Fredrik; Behra, Phani Rama Krishna; Ramesh, Malavika; Dasgupta, Santanu; Bhattacharya, Alok; Kirsebom, Leif A

    2015-06-16

    We provide the genome sequences of the type strains of the polychlorophenol-degrading Mycobacterium chlorophenolicum (DSM43826), the degrader of chlorinated aliphatics Mycobacterium chubuense (DSM44219) and Mycobacterium obuense (DSM44075) that has been tested for use in cancer immunotherapy. The genome sizes of M. chlorophenolicum, M. chubuense, and M. obuense are 6.93, 5.95, and 5.58 Mb with GC-contents of 68.4%, 69.2%, and 67.9%, respectively. Comparative genomic analysis revealed that 3,254 genes are common and we predicted approximately 250 genes acquired through horizontal gene transfer from different sources including proteobacteria. The data also showed that the biodegrading Mycobacterium spp. NBB4, also referred to as M. chubuense NBB4, is distantly related to the M. chubuense type strain and should be considered as a separate species, we suggest it to be named Mycobacterium ethylenense NBB4. Among different categories we identified genes with potential roles in: biodegradation of aromatic compounds and copper homeostasis. These are the first nonpathogenic Mycobacterium spp. found harboring genes involved in copper homeostasis. These findings would therefore provide insight into the role of this group of Mycobacterium spp. in bioremediation as well as the evolution of copper homeostasis within the Mycobacterium genus. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    Science.gov (United States)

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clap gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  17. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.

  18. Genome sequencing and annotation of Serratia sp. strain TEL.

    Science.gov (United States)

    Lephoto, Tiisetso E; Gray, Vincent M

    2015-12-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  19. Genome sequencing and annotation of Serratia sp. strain TEL

    Directory of Open Access Journals (Sweden)

    Tiisetso E. Lephoto

    2015-12-01

    Full Text Available We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410. This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926 collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  20. Genome sequencing and annotation of Serratia sp. strain TEL

    OpenAIRE

    Lephoto, Tiisetso E.; Gray, Vincent M.

    2015-01-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000.

  1. Identification and Mapping of Simple Sequence Repeat Markers from Common Bean (Phaseolus vulgaris L. Bacterial Artificial Chromosome End Sequences for Genome Characterization and Genetic–Physical Map Integration

    Directory of Open Access Journals (Sweden)

    Juana M. Córdoba

    2010-11-01

    Full Text Available Microsatellite markers or simple sequence repeat (SSR loci are useful for diversity characterization and genetic–physical mapping. Different in silico microsatellite search methods have been developed for mining bacterial artificial chromosome (BAC end sequences for SSRs. The overall goal of this study was genome characterization based on SSRs in 89,017 BAC end sequences (BESs from the G19833 common bean ( L. library. Another objective was to identify new SSR taking into account three tandem motif identification programs (Automated Microsatellite Marker Development [AMMD], Tandem Repeats Finder [TRF], and SSRLocator [SSRL]. Among the microsatellite search engines, SSRL identified the highest number of SSRs; however, when primer design was attempted, the number dropped due to poor primer design regions. Automated Microsatellite Marker Development software identified many SSRs with valuable AT/TA or AG/TC motifs, while TRF found fewer SSRs and produced no primers. A subgroup of 323 AT-rich, di-, and trinucleotide SSRs were selected from the AMMD results and used in a parental survey with DOR364 and G19833, of which 75 could be mapped in the corresponding population; these represented 4052 BAC clones. Together with 92 previously mapped BES- and 114 non-BES-derived markers, a total of 280 SSRs were included in the polymerase chain reaction (PCR-based map, integrating a total of 8232 BAC clones in 162 contigs from the physical map.

  2. Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida

    Science.gov (United States)

    Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.

    2013-01-01

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P. multocida strain Pm70. PMID:23405337

  3. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India

    Directory of Open Access Journals (Sweden)

    Sarwar Azam

    2016-01-01

    Full Text Available Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 Indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.

  4. Why size really matters when sequencing plant genomes

    Czech Academy of Sciences Publication Activity Database

    Kelly, L.J.; Leitch, A.R.; Fay, M. F.; Renny-Byfield, S.; Pellicer, J.; Macas, Jiří; Leitch, I.J.

    2012-01-01

    Roč. 5, č. 4 (2012), s. 415-425 ISSN 1755-0874 Institutional research plan: CEZ:AV0Z50510513 Institutional support: RVO:60077344 Keywords : C-value * genome assembly * genome size evolution * genome sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 0.924, year: 2012

  5. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publically available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics...

  6. Draft genome sequence of the coccolithovirus Emiliania huxleyi virus 202.

    Science.gov (United States)

    Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

    2012-02-01

    Emiliania huxleyi virus 202 (EhV-202) is a member of the Coccolithoviridae, a group of viruses that infect the marine coccolithophorid Emiliania huxleyi. EhV-202 has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 407 kbp, consisting of 485 coding sequences (CDSs). Here we describe the genomic features of EhV-202, together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome.

  7. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    Science.gov (United States)

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. For further understanding ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.

  8. Comprehensive genomic characterization of campylobacter genus reveals some underlying mechanisms for its genomic diversification.

    Directory of Open Access Journals (Sweden)

    Yizhuang Zhou

    Full Text Available Campylobacter species.are phenotypically diverse in many aspects including host habitats and pathogenicities, which demands comprehensive characterization of the entire Campylobacter genus to study their underlying genetic diversification. Up to now, 34 Campylobacter strains have been sequenced and published in public databases, providing good opportunity to systemically analyze their genomic diversities. In this study, we first conducted genomic characterization, which includes genome-wide alignments, pan-genome analysis, and phylogenetic identification, to depict the genetic diversity of Campylobacter genus. Afterward, we improved the tetranucleotide usage pattern-based naïve Bayesian classifier to identify the abnormal composition fragments (ACFs, fragments with significantly different tetranucleotide frequency profiles from its genomic tetranucleotide frequency profiles including horizontal gene transfers (HGTs to explore the mechanisms for the genetic diversity of this organism. Finally, we analyzed the HGTs transferred via bacteriophage transductions. To our knowledge, this study is the first to use single nucleotide polymorphism information to construct liable microevolution phylogeny of 21 Campylobacter jejuni strains. Combined with the phylogeny of all the collected Campylobacter species based on genome-wide core gene information, comprehensive phylogenetic inference of all 34 Campylobacter organisms was determined. It was found that C. jejuni harbors a high fraction of ACFs possibly through intraspecies recombination, whereas other Campylobacter members possess numerous ACFs possibly via intragenus recombination. Furthermore, some Campylobacter strains have undergone significant ancient viral integration during their evolution process. The improved method is a powerful tool for bacterial genomic analysis. Moreover, the findings would provide useful information for future research on Campylobacter genus.

  9. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-01-01

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  10. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse\\'s genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  11. Draft genome sequence of Therminicola potens strain JR

    Energy Technology Data Exchange (ETDEWEB)

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  12. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .

  13. Overlapping Genomic Sequences: A Treasure Trove of Single-Nucleotide Polymorphisms

    Science.gov (United States)

    Taillon-Miller, Patricia; Gu, Zhijie; Li, Qun; Hillier, LaDeana; Kwok, Pui-Yan

    1998-01-01

    An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21–7q22, and 13q12–13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AC003015 (for GS113423), AC002380 (GS330J10), AC000066 (RG293F11), AC003086 (RG104F04), AC002525 (257C22A), and U73331 (96A18A).] PMID:9685323

  14. High-density rhesus macaque oligonucleotide microarray design using early-stage rhesus genome sequence information and human genome annotations

    Directory of Open Access Journals (Sweden)

    Magness Charles L

    2007-01-01

    Full Text Available Abstract Background Until recently, few genomic reagents specific for non-human primate research have been available. To address this need, we have constructed a macaque-specific high-density oligonucleotide microarray by using highly fragmented low-pass sequence contigs from the rhesus genome project together with the detailed sequence and exon structure of the human genome. Using this method, we designed oligonucleotide probes to over 17,000 distinct rhesus/human gene orthologs and increased by four-fold the number of available genes relative to our first-generation expressed sequence tag (EST-derived array. Results We constructed a database containing 248,000 exon sequences from 23,000 human RefSeq genes and compared each human exon with its best matching sequence in the January 2005 version of the rhesus genome project list of 486,000 DNA contigs. Best matching rhesus exon sequences for each of the 23,000 human genes were then concatenated in the proper order and orientation to produce a rhesus "virtual transcriptome." Microarray probes were designed, one per gene, to the region closest to the 3' untranslated region (UTR of each rhesus virtual transcript. Each probe was compared to a composite rhesus/human transcript database to test for cross-hybridization potential yielding a final probe set representing 18,296 rhesus/human gene orthologs, including transcript variants, and over 17,000 distinct genes. We hybridized mRNA from rhesus brain and spleen to both the EST- and genome-derived microarrays. Besides four-fold greater gene coverage, the genome-derived array also showed greater mean signal intensities for genes present on both arrays. Genome-derived probes showed 99.4% identity when compared to 4,767 rhesus GenBank sequence tag site (STS sequences indicating that early stage low-pass versions of complex genomes are of sufficient quality to yield valuable functional genomic information when combined with finished genome information from

  15. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion...... environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria....

  16. Bos taurus strain:dairy beef (cattle): 1000 Bull Genomes Run 2, Bovine Whole Genome Sequence

    NARCIS (Netherlands)

    Bouwman, A.C.; Daetwyler, H.D.; Chamberlain, Amanda J.; Ponce, Carla Hurtado; Sargolzaei, Mehdi; Schenkel, Flavio S.; Sahana, Goutam; Govignon-Gion, Armelle; Boitard, Simon; Dolezal, Marlies; Pausch, Hubert; Brøndum, Rasmus F.; Bowman, Phil J.; Thomsen, Bo; Guldbrandtsen, Bernt; Lund, Mogens S.; Servin, Bertrand; Garrick, Dorian J.; Reecy, James M.; Vilkki, Johanna; Bagnato, Alessandro; Wang, Min; Hoff, Jesse L.; Schnabel, Robert D.; Taylor, Jeremy F.; Vinkhuyzen, Anna A.E.; Panitz, Frank; Bendixen, Christian; Holm, Lars-Erik; Gredler, Birgit; Hozé, Chris; Boussaha, Mekki; Sanchez, Marie Pierre; Rocha, Dominique; Capitan, Aurelien; Tribout, Thierry; Barbat, Anne; Croiseau, Pascal; Drögemüller, Cord; Jagannathan, Vidhya; Vander Jagt, Christy; Crowley, John J.; Bieber, Anna; Purfield, Deirdre C.; Berry, Donagh P.; Emmerling, Reiner; Götz, Kay Uwe; Frischknecht, Mirjam; Russ, Ingolf; Sölkner, Johann; Tassell, van Curtis P.; Fries, Ruedi; Stothard, Paul; Veerkamp, R.F.; Boichard, Didier; Goddard, Mike E.; Hayes, Ben J.

    2014-01-01

    Whole genome sequence data (BAM format) of 234 bovine individuals aligned to UMD3.1. The aim of the study was to identify genetic variants (SNPs and indels) for downstream analysis such as imputation, GWAS, and detection of lethal recessives. Additional sequences for later 1000 bull genomes runs can

  17. Multiple Genome Sequences of Lactobacillus plantarum Strains

    OpenAIRE

    Kafka, Thomas A.; Geissler, Andreas J.; Vogel, Rudi F.

    2017-01-01

    ABSTRACT We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provide the basis for consecutive analyses.

  18. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.

    Science.gov (United States)

    Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S

    2016-01-15

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  19. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    Full Text Available BACKGROUND: Podocarpus lambertii (Podocarpaceae is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR. It contains 118 unique genes and one duplicated tRNA (trnN-GUU, which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi and Araucariaceae (Agathis dammara. Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparently loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  20. Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development.

    Science.gov (United States)

    Sun, Cheng; Wyngaard, Grace; Walton, D Brian; Wichman, Holly A; Mueller, Rachel Lockridge

    2014-03-11

    Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution--some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 - 75 Gb, 12-74 Gb of which are lost from pre-somatic cell lineages at germline--soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms.

  1. Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.

    Science.gov (United States)

    Brown, Steven D; Utturkar, Sagar M; Klingeman, Dawn M; Johnson, Courtney M; Martin, Stanton L; Land, Miriam L; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A

    2012-11-01

    To aid in the investigation of the Populus deltoides microbiome, we generated draft genome sequences for 21 Pseudomonas strains and 19 other diverse bacteria isolated from Populus deltoides roots. Genome sequences for isolates similar to Acidovorax, Bradyrhizobium, Brevibacillus, Caulobacter, Chryseobacterium, Flavobacterium, Herbaspirillum, Novosphingobium, Pantoea, Phyllobacterium, Polaromonas, Rhizobium, Sphingobium, and Variovorax were generated.

  2. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    Full Text Available The plant chloroplast (cp genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp in length including a pair of short inverted repeat regions (IRa and IRb of 139 bp each separated by a small single copy (SSC region of 54,323 bp (SSC and a large single copy region of 66,735 bp (LSC. It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR have been detected in A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp. The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. Complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.

  3. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles, and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL or of a whole genome shotgun library (WGSL, or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  4. Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions.

    Science.gov (United States)

    Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo

    2017-02-01

    Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  5. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi eHsu

    2011-12-01

    Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap, the causative agent of Johne’s disease (JD, infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium strains isolated from various hosts and environments. Using Next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98% to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77 showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels were not detected, and only unique single nucleotide polymorphisms (SNPs were observed among the 6 M. ap strains. While most of the SNPs (~100 in M. ap genomes were non-synonymous, a total of ~ 6000 SNPs were detected among M. avium genomes, most of them were synonymous suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10 strain while the human isolate (M. ap 4B is closely related to the environmental strains, indicating environmental source to human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  6. Phenotypic, molecular characterization, antimicrobial susceptibility and draft genome sequence of Corynebacterium argentoratense strains isolated from clinical samples

    Directory of Open Access Journals (Sweden)

    I. Fernández-Natal

    2016-03-01

    Full Text Available During a 12-year period we isolated five Corynebacterium argentoratense strains identified by phenotypic methods, including the use of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF and 16S rRNA gene sequencing. In addition, antimicrobial susceptibility was determined, and genome sequencing for the detection of antibiotic resistance genes was performed. The organisms were isolated from blood and throat cultures and could be identified by all methods used. All strains were resistant to cotrimoxazole, and resistance to β-lactams was partly present. Two strains were resistant to erythromycin and clindamycin. The draft genome sequences of theses isolates revealed the presence of the erm(X resistance gene that is embedded in the genetic structure of the transposable element Tn5423. Although rarely reported as a human pathogen, C. argentoratense can be involved in bacteraemia and probably in other infections. Our results also show that horizontal transfer of genes responsible for antibiotic resistance is occurring in this species.

  7. A complete mitochondrial genome sequence from a mesolithic wild aurochs (Bos primigenius.

    Directory of Open Access Journals (Sweden)

    Ceiridwen J Edwards

    Full Text Available BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius has been well-documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs from an archaeologically-verified and exceptionally-well preserved aurochs bone sample. METHODOLOGY: DNA extracts were generated from an aurochs humerus bone sample recovered from a cave site located in Derbyshire, England and radiocarbon-dated to 6,738+/-68 calibrated years before present. These extracts were prepared for both Sanger and next generation DNA sequencing technologies (Illumina Genome Analyzer. In total, 289.9 megabases (22.48% of the post-filtered DNA sequences generated using the Illumina Genome Analyzer from this sample mapped with confidence to the bovine genome. A consensus B. primigenius mitochondrial genome sequence was constructed and was analysed alongside all available complete bovine mitochondrial genome sequences. CONCLUSIONS: For all nucleotide positions where both Sanger and Illumina Genome Analyzer sequencing methods gave high-confidence calls, no discrepancies were observed. Sequence analysis reveals evidence of heteroplasmy in this sample and places this mitochondrial genome sequence securely within a previously

  8. A complete mitochondrial genome sequence from a mesolithic wild aurochs (Bos primigenius).

    LENUS (Irish Health Repository)

    Edwards, Ceiridwen J

    2010-01-01

    BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius) has been well-documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs) from an archaeologically-verified and exceptionally-well preserved aurochs bone sample. METHODOLOGY: DNA extracts were generated from an aurochs humerus bone sample recovered from a cave site located in Derbyshire, England and radiocarbon-dated to 6,738+\\/-68 calibrated years before present. These extracts were prepared for both Sanger and next generation DNA sequencing technologies (Illumina Genome Analyzer). In total, 289.9 megabases (22.48%) of the post-filtered DNA sequences generated using the Illumina Genome Analyzer from this sample mapped with confidence to the bovine genome. A consensus B. primigenius mitochondrial genome sequence was constructed and was analysed alongside all available complete bovine mitochondrial genome sequences. CONCLUSIONS: For all nucleotide positions where both Sanger and Illumina Genome Analyzer sequencing methods gave high-confidence calls, no discrepancies were observed. Sequence analysis reveals evidence of heteroplasmy in this sample and places this mitochondrial genome sequence securely within a previously identified

  9. Candidate genes revealed by a genome scan for mosquito resistance to a bacterial insecticide: sequence and gene expression variations

    Directory of Open Access Journals (Sweden)

    David Jean-Philippe

    2009-11-01

    Full Text Available Abstract Background Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s or mutation(s targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti insecticidal toxins. Results The genome of a Bti-resistant and a Bti-susceptible strains was surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations, as well as a significant under-expression compared to the susceptible strain. Conclusion Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full

  10. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis. Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  11. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    Science.gov (United States)

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, of which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast

  12. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Science.gov (United States)

    Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

    2009-01-01

    Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining c

  13. Characterization of full-length sequenced cDNA inserts (FLIcs from Atlantic salmon (Salmo salar

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Full Text Available Abstract Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP, the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91% of the transcripts were annotated using Gene Ontology (GO terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS. The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS. This

  14. The zebrafish reference genome sequence and its relationship to the human genome.

    Science.gov (United States)

    Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L

    2013-04-25

    Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

  15. Twenty-One Genome Sequences from Pseudomonas Species and 19 Genome Sequences from Diverse Bacteria Isolated from the Rhizosphere and Endosphere of Populus deltoides

    Energy Technology Data Exchange (ETDEWEB)

    Brown, Steven D [ORNL; Utturkar, Sagar M [ORNL; Klingeman, Dawn Marie [ORNL; Johnson, Courtney M [ORNL; Martin, Stanton [ORNL; Land, Miriam L [ORNL; Lu, Tse-Yuan [ORNL; Schadt, Christopher Warren [ORNL; Doktycz, Mitchel John [ORNL; Pelletier, Dale A [ORNL

    2012-01-01

    To aid in the investigation of the Populus deltoides microbiome we generated draft genome sequences for twenty one Pseudomonas and twenty one other diverse bacteria isolated from Populus deltoides roots. Genome sequences for isolates similar to Acidovorax, Bradyrhizobium, Brevibacillus, Burkholderia, Caulobacter, Chryseobacterium, Flavobacterium, Herbaspirillum, Novosphingobium, Pantoea, Phyllobacterium, Polaromonas, Rhizobium, Sphingobium and Variovorax were generated.

  16. Controversy and debate on clinical genomics sequencing-paper 1: genomics is not exceptional: rigorous evaluations are necessary for clinical applications of genomic sequencing.

    Science.gov (United States)

    Wilson, Brenda J; Miller, Fiona Alice; Rousseau, François

    2017-12-01

    Next generation genomic sequencing (NGS) technologies-whole genome and whole exome sequencing-are now cheap enough to be within the grasp of many health care organizations. To many, NGS is symbolic of cutting edge health care, offering the promise of "precision" and "personalized" medicine. Historically, research and clinical application has been a two-way street in clinical genetics: research often driven directly by the desire to understand and try to solve immediate clinical problems affecting real, identifiable patients and families, accompanied by a low threshold of willingness to apply research-driven interventions without resort to formal empirical evaluations. However, NGS technologies are not simple substitutes for older technologies and need careful evaluation for use as screening, diagnostic, or prognostic tools. We have concerns across three areas. First, at the moment, analytic validity is unknown because technical platforms are not yet stable, laboratory quality assurance programs are in their infancy, and data interpretation capabilities are badly underdeveloped. Second, clinical validity of genomic findings for patient populations without pre-existing high genetic risk is doubtful, as most clinical experience with NGS technologies relates to patients with a high prior likelihood of a genetic etiology. Finally, we are concerned that proponents argue not only for clinically driven approaches to assessing a patient's genome, but also for seeking out variants associated with unrelated conditions or susceptibilities-so-called "secondary targets"-this is screening on a genomic scale. We argue that clinical uses of genomic sequencing should remain limited to specialist and research settings, that screening for secondary findings in clinical testing should be limited to the maximum extent possible, and that the benefits, harms, and economic implications of their routine use be systematically evaluated. All stakeholders have a responsibility to ensure that

  17. An integrated semiconductor device enabling non-optical genome sequencing.

    Science.gov (United States)

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  18. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. What can we learn about lyssavirus genomes using 454 sequencing?

    Science.gov (United States)

    Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin

    2012-01-01

    The main task of the individual project number four"Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses "within the network" Lyssaviruses--a potential re-emerging public health threat" is to provide high quality complete genome sequences from lyssaviruses. These sequences are analysed in-depth with regard to the diversity of the viral populations as to both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate that not only high quality full-length lyssavirus genome sequences can be generated, but indeed efficient analysis of the viral population gets feasible.

  20. Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

    Science.gov (United States)

    Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

    2012-01-01

    Background The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis causing strains of Treponema pallidum ssp. pallidum (TPA). Both yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function. Conclusions/Significance Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095

  1. Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

    Science.gov (United States)

    Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

    2002-02-01

    The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, to which an immune response is a specific feature of several autoimmune diseases. The genomic organization and any sequence variation within the DNA encoding Ro60 are unknown. To characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used for a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs to flank each exon, which were then used to amplify genomic DNA for automated sequencing from 36 caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies were major allele A (0.71) and minor allele G (0.29) in 72 control chromosomes, with no significant differences found between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.

  2. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Udgivelsesdato: 2005-Jan-1...

  3. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J C [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

    Directory of Open Access Journals (Sweden)

    Xu Xiangming

    2010-12-01

    Full Text Available Abstract Background Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method. Results The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequences assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. Conclusions We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping single sequence repeat markers derived from sequence contigs on a full mapping population.

  5. Accelerated Evolution of Conserved Noncoding Sequences in theHuman Genome

    Energy Technology Data Exchange (ETDEWEB)

    Prambhakar, Shyam; Noonan, James P.; Paabo, Svante; Rubin, EdwardM.

    2006-07-06

    Genomic comparisons between human and distant, non-primatemammals are commonly used to identify cis-regulatory elements based onconstrained sequence evolution. However, these methods fail to detect"cryptic" functional elements, which are too weakly conserved amongmammals to distinguish from nonfunctional DNA. To address this problem,we explored the potential of deep intra-primate sequence comparisons. Wesequenced the orthologs of 558 kb of human genomic sequence, coveringmultiple loci involved in cholesterol homeostasis, in 6 nonhumanprimates. Our analysis identified 6 noncoding DNA elements displayingsignificant conservation among primates, but undetectable in more distantcomparisons. In vitro and in vivo tests revealed that at least three ofthese 6 elements have regulatory function. Notably, the mouse orthologsof these three functional human sequences had regulatory activity despitetheir lack of significant sequence conservation, indicating that they arecryptic ancestral cis-regulatory elements. These regulatory elementscould still be detected in a smaller set of three primate speciesincluding human, rhesus and marmoset. Since the human and rhesus genomesequences are already available, and the marmoset genome is activelybeing sequenced, the primate-specific conservation analysis describedhere can be applied in the near future on a whole-genome scale, tocomplement the annotation provided by more distant speciescomparisons.

  6. Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Li, Yingrui; Lindgreen, Stinus

    2010-01-01

    We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome...... possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence...

  7. Complete Genome Sequence of Ikoma Lyssavirus

    OpenAIRE

    Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.; Fooks, Anthony R.

    2012-01-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isol...

  8. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Directory of Open Access Journals (Sweden)

    Jonathan A Shortt

    2017-01-01

    Full Text Available In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies.We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample.This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species

  9. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Science.gov (United States)

    Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

    2017-01-01

    In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other

  10. High-Throughput Sequencing, a VersatileWeapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    Directory of Open Access Journals (Sweden)

    Ségolène Caboche

    2014-04-01

    Full Text Available The recent progresses of high-throughput sequencing (HTS technologies enable easy and cost-reduced access to whole genome sequencing (WGS or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose.

  11. SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

    Directory of Open Access Journals (Sweden)

    Telonis-Scott Marina

    2010-09-01

    Full Text Available Abstract Background Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade 2 strains generally possess higher virulence potential in animal models of disease compared with clade 1, which predominates among environmental strains. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2 and biotypes (1 and 2 was used for comparative genomic analysis. Results Greater than 4,100,000 bases were sequenced of each strain, yielding approximately 100-fold coverage for each of the four genomes. Although the read lengths of SOLiD genomic sequencing were only 35 nt, we were able to make significant conclusions about the unique and shared sequences among the genomes, including identification of single nucleotide polymorphisms. Comparative analysis of the newly sequenced genomes to the existing reference genomes enabled the identification of 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes. Conclusions We were able to glean much information about the genomic content of each strain using next generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because of their presence in virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis.

  12. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  13. Whole genome characterization of non-tissue culture adapted HRSV strains in severely infected children

    Directory of Open Access Journals (Sweden)

    Kumaria Rajni

    2011-07-01

    Full Text Available Abstract Background Human respiratory syncytial virus (HRSV is the most important virus causing lower respiratory infection in young children. The complete genetic characterization of RSV clinical strains is a prerequisite for understanding HRSV infection in the clinical context. Current information about the genetic structure of the HRSV genome has largely been obtained using tissue culture adapted viruses. During tissue culture adaptation genetic changes can be introduced into the virus genome, which may obscure subtle variations in the genetic structure of different RSV strains. Methods In this study we describe a novel Sanger sequencing strategy which allowed the complete genetic characterisation of 14 clinical HRSV strains. The viruses were sequenced directly in the nasal washes of severely hospitalized children, and without prior passage of the viruses in tissue culture. Results The analysis of nucleotide sequences suggested that vRNA length is a variable factor among primary strains, while the phylogenetic analysis suggests selective pressure for change. The G gene showed the greatest sequence variation (2-6.4%, while small hydrophobic protein and matrix genes were completely conserved across all clinical strains studied. A number of sequence changes in the F, L, M2-1 and M2-2 genes were observed that have not been described in laboratory isolates. The gene junction regions showed more sequence variability, and in particular the intergenic regions showed a highest level of sequence variation. Although the clinical strains grew slower than the HRSVA2 virus isolate in tissue culture, the HRSVA2 isolate and clinical strains formed similar virus structures such as virus filaments and inclusion bodies in infected cells; supporting the clinical relevance of these virus structures. Conclusion This is the first report to describe the complete genetic characterization of HRSV clinical strains that have been sequenced directly from clinical

  14. Draft genome sequence of Bradyrhizobium sp. strain BR 3262, an effective microsymbiont recommended for cowpea inoculation in Brazil.

    Science.gov (United States)

    Simões-Araújo, Jean Luiz; Leite, Jakson; Marie Rouws, Luc Felicianus; Passos, Samuel Ribeiro; Xavier, Gustavo Ribeiro; Rumjanek, Norma Gouvêa; Zilli, Jerri Édson

    The strain BR 3262 was isolated from nodule of cowpea (Vigna unguiculata L. Walp) growing in soil of the Atlantic Forest area in Brazil and it is reported as an efficient nitrogen fixing bacterium associated to cowpea. Firstly, this strain was assigned as Bradyrhizobium elkanii, however, recently a more detailed genetic and molecular characterization has indicated it could be a Bradyrhizobium pachyrhizi species. We report here the draft genome sequence of B. pachyrhizi strain BR 3262, an elite bacterium used as inoculant for cowpea. The whole genome with 116 scaffolds, 8,965,178bp and 63.8% of C+G content for BR 3262 was obtained using Illumina MiSeq sequencing technology. Annotation was added by the RAST prokaryotic genome annotation service and shown 8369 coding sequences, 52 RNAs genes, classified in 504 subsystems. Published by Elsevier Editora Ltda.

  15. Draft genome sequence of Bradyrhizobium sp. strain BR 3262, an effective microsymbiont recommended for cowpea inoculation in Brazil

    Directory of Open Access Journals (Sweden)

    Jean Luiz Simões-Araújo

    Full Text Available Abstract The strain BR 3262 was isolated from nodule of cowpea (Vigna unguiculata L. Walp growing in soil of the Atlantic Forest area in Brazil and it is reported as an efficient nitrogen fixing bacterium associated to cowpea. Firstly, this strain was assigned as Bradyrhizobium elkanii, however, recently a more detailed genetic and molecular characterization has indicated it could be a Bradyrhizobium pachyrhizi species. We report here the draft genome sequence of B. pachyrhizi strain BR 3262, an elite bacterium used as inoculant for cowpea. The whole genome with 116 scaffolds, 8,965,178 bp and 63.8% of C+G content for BR 3262 was obtained using Illumina MiSeq sequencing technology. Annotation was added by the RAST prokaryotic genome annotation service and shown 8369 coding sequences, 52 RNAs genes, classified in 504 subsystems.

  16. Living laboratory: whole-genome sequencing as a learning healthcare enterprise.

    Science.gov (United States)

    Angrist, M; Jamal, L

    2015-04-01

    With the proliferation of affordable large-scale human genomic data come profound and vexing questions about management of such data and their clinical uncertainty. These issues challenge the view that genomic research on human beings can (or should) be fully segregated from clinical genomics, either conceptually or practically. Here, we argue that the sharp distinction between clinical care and research is especially problematic in the context of large-scale genomic sequencing of people with suspected genetic conditions. Core goals of both enterprises (e.g. understanding genotype-phenotype relationships; generating an evidence base for genomic medicine) are more likely to be realized at a population scale if both those ordering and those undergoing sequencing for diagnostic reasons are routinely and longitudinally studied. Rather than relying on expensive and lengthy randomized clinical trials and meta-analyses, we propose leveraging nascent clinical-research hybrid frameworks into a broader, more permanent instantiation of exploratory medical sequencing. Such an investment could enlighten stakeholders about the real-life challenges posed by whole-genome sequencing, such as establishing the clinical actionability of genetic variants, returning 'off-target' results to families, developing effective service delivery models and monitoring long-term outcomes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  17. Rickettsia asembonensis Characterization by Multilocus Sequence Typing of Complete Genes, Peru.

    Science.gov (United States)

    Loyola, Steev; Flores-Mendoza, Carmen; Torre, Armando; Kocher, Claudine; Melendrez, Melanie; Luce-Fedrow, Alison; Maina, Alice N; Richards, Allen L; Leguia, Mariana

    2018-05-01

    While studying rickettsial infections in Peru, we detected Rickettsia asembonensis in fleas from domestic animals. We characterized 5 complete genomic regions (17kDa, gltA, ompA, ompB, and sca4) and conducted multilocus sequence typing and phylogenetic analyses. The molecular isolate from Peru is distinct from the original R. asembonensis strain from Kenya.

  18. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  19. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  20. Draft genome sequence of Phomopsis longicolla isolate MSPL 10-6

    Directory of Open Access Journals (Sweden)

    Shuxian Li

    2015-03-01

    Full Text Available Phomopsis longicolla is the primary cause of Phomopsis seed decay in soybean. This disease severely affects soybean seed quality by reducing seed viability and oil content, altering seed composition, and increasing frequencies of moldy and/or split beans. It is one of the most economically important soybean diseases. Here, we report the de novo assembled draft genome sequence of the P. longicolla isolate MSPL10-6, which was isolated from field-grown soybean seed in Mississippi, USA. This study represents the first reported genome sequence of a seedborne fungal pathogen in the Diaporthe–Phomopsis complex. The P. longicolla genome sequence will enable research into the genetic basis of fungal infection of soybean seed and provide information for the study of soybean–fungal interactions. The genome sequence will also be valuable for molecular genetic marker development, manipulation of pathogenicity-related genes and development of new control strategies for this pathogen.

  1. Plastid and mitochondrion genomic sequences from Arctic Chlorella sp. ArM0029B

    Science.gov (United States)

    2014-01-01

    Background Chorella is the representative taxon of Chlorellales in Trebouxiophyceae, and its chloroplast (cp) genomic information has been thought to depend only on studies concerning Chlorella vulgaris and GenBank information of C. variablis. Mitochondrial (mt) genomic information regarding Chlorella is currently unavailable. To elucidate the evolution of organelle genomes and genetic information of Chlorella, we have sequenced and characterized the cp and mt genomes of Arctic Chlorella sp. ArM0029B. Results The 119,989-bp cp genome lacking inverted repeats and 65,049-bp mt genome were sequenced. The ArM0029B cp genome contains 114 conserved genes, including 32 tRNA genes, 3 rRNA genes, and 79 genes encoding proteins. Chlorella cp genomes are highly rearranged except for a Chlorella-specific six-gene cluster, and the ArM0029B plastid resembles that of Chlorella variabilis except for a 15-kb gene cluster inversion. In the mt genome, 62 conserved genes, including 27 tRNA genes, 3 rRNA genes, and 32 genes encoding proteins were determined. The mt genome of ArM0029B is similar to that of the non-photosynthetic species Prototheca and Heicosporidium. The ArM0029B mt genome contains a group I intron, with an ORF containing two LAGLIDADG motifs, in cox1. The intronic ORF is shared by C. vulgaris and Prototheca. The phylogeny of the plastid genome reveals that ArM0029B showed a close relationship of Chlorella to Parachlorella and Oocystis within Chlorellales. The distribution of the cox1 intron at 721 support membership in the order Chlorellales. Mitochondrial phylogenomic analyses, however, indicated that ArM0029B shows a greater affinity to MX-AZ01 and Coccomyxa than to the Helicosporidium-Prototheca clade, although the detailed phylogenetic relationships among the three taxa remain to be resolved. Conclusions The plastid genome of ArM0029B is similar to that of C. variabilis. The mt sequence of ArM0029B is the first genome to be reported for Chlorella. Chloroplast

  2. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    DEFF Research Database (Denmark)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.

    2005-01-01

    years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences......We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each...... between the species-but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...

  3. Sequencing and characterization of the complete mitochondrial genome of Japanese Swellshark (Cephalloscyllium umbratile)

    OpenAIRE

    Zhu, Ke-Cheng; Liang, Yin-Yin; Wu, Na; Guo, Hua-Yang; Zhang, Nan; Jiang, Shi-Gui; Zhang, Dian-Chang

    2017-01-01

    To further comprehend the genome features of Cephalloscyllium umbratile (Carcharhiniformes), an endangered species, the complete mitochondrial DNA (mtDNA) was firstly sequenced and annotated. The full-length mtDNA of C. umbratile was 16,697 bp and contained ribosomal RNA (rRNA) genes, 13 protein-coding genes (PCGs), 23 transfer RNA (tRNA) genes, and a major non-coding control region. Each PCG was initiated by an authoritative ATN codon, except for COX1 initiated by a GTG codon. Seven of 13 PC...

  4. Genomic 3' terminal sequence comparison of three isolates of rabbit haemorrhagic disease virus.

    Science.gov (United States)

    Milton, I D; Vlasak, R; Nowotny, N; Rodak, L; Carter, M J

    1992-05-15

    Comparison of sequence data is necessary in older to investigate virus origins, identify features common to virulent strains, and characterize genomic organization within virus families. A virulent caliciviral disease of rabbits recently emerged in China. We have sequenced 1100 bases from the 3' ends of two independent European isolates of this virus, and compared these with previously determined calicivirus sequences. Rabbit caliciviruses were closely related, despite the different countries in which isolation was made. This supports the rapid spread of a new virus across Europe. The capsid protein sequences of these rabbit viruses differ markedly from those determined for feline calicivirus, but a hypothetical 3' open reading frame is relatively well conserved between the caliciviruses of these two different hosts and argues for a functional role.

  5. Complete Genome Sequence of Staphylococcus epidermidis 1457.

    Science.gov (United States)

    Galac, Madeline R; Stam, Jason; Maybank, Rosslyn; Hinkle, Mary; Mack, Dietrich; Rohde, Holger; Roth, Amanda L; Fey, Paul D

    2017-06-01

    Staphylococcus epidermidis 1457 is a frequently utilized strain that is amenable to genetic manipulation and has been widely used for biofilm-related research. We report here the whole-genome sequence of this strain, which encodes 2,277 protein-coding genes and 81 RNAs within its 2.4-Mb genome and plasmid. Copyright © 2017 Galac et al.

  6. Full Genomic Characterization of a Saffold Virus Isolated in Peru

    Directory of Open Access Journals (Sweden)

    Mariana Leguia

    2015-11-01

    Full Text Available While studying respiratory infections of unknown etiology we detected Saffold virus in an oropharyngeal swab collected from a two-year-old female suffering from diarrhea and respiratory illness. The full viral genome recovered by deep sequencing showed 98% identity to a previously described Saffold strain isolated in Japan. Phylogenetic analysis confirmed the Peruvian Saffold strain belongs to genotype 3 and is most closely related to strains that have circulated in Asia. This is the first documented case report of Saffold virus in Peru and the only complete genomic characterization of a Saffold-3 isolate from the Americas.

  7. Characterization of the dsDNA prophage sequences in the genome of Neisseria gonorrhoeae and visualization of productive bacteriophage

    Directory of Open Access Journals (Sweden)

    Maugel Timothy K

    2007-07-01

    Full Text Available Abstract Background Bioinformatic analysis of the genome sequence of Neisseria gonorrhoeae revealed the presence of nine probable prophage islands. The distribution, conservation and function of many of these sequences, and their ability to produce bacteriophage particles are unknown. Results Our analysis of the genomic sequence of FA1090 identified five genomic regions (NgoΦ1 – 5 that are related to dsDNA lysogenic phage. The genetic content of the dsDNA prophage sequences were examined in detail and found to contain blocks of genes encoding for proteins homologous to proteins responsible for phage DNA replication, structural proteins and proteins responsible for phage assembly. The DNA sequences from NgoΦ1, NgoΦ2 and NgoΦ3 contain some significant regions of identity. A unique region of NgoΦ2 showed very high similarity with the Pseudomonas aeruginosa generalized transducing phage F116. Comparative analysis at the nucleotide and protein levels suggests that the sequences of NgoΦ1 and NgoΦ2 encode functionally active phages, while NgoΦ3, NgoΦ4 and NgoΦ5 encode incomplete genomes. Expression of the NgoΦ1 and NgoΦ2 repressors in Escherichia coli inhibit the growth of E. coli and the propagation of phage λ. The NgoΦ2 repressor was able to inhibit transcription of N. gonorrhoeae genes and Haemophilus influenzae HP1 phage promoters. The holin gene of NgoΦ1 (identical to that encoded by NgoΦ2, when expressed in E. coli, could serve as substitute for the phage λ s gene. We were able to detect the presence of the DNA derived from NgoΦ1 in the cultures of N. gonorrhoeae. Electron microscopy analysis of culture supernatants revealed the presence of multiple forms of bacteriophage particles. Conclusion These data suggest that the genes similar to dsDNA lysogenic phage present in the gonococcus are generally conserved in this pathogen and that they are able to regulate the expression of other neisserial genes. Since phage particles were

  8. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2017-09-01

    Full Text Available High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  9. Deciphering the biology of Mycobacterium tuberculosis from thecomplete genome sequence

    DEFF Research Database (Denmark)

    Cole, S.T.; Krogh, Anders Stærmose

    1998-01-01

    Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding....... tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation....

  10. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    DEFF Research Database (Denmark)

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk

    2013-01-01

    to the consensus sequence. Additionally, we got an average sequence depth for the genome of 4000 for the Iontorrent PGM and 400 for the FLX platform making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D......Next Generation Sequencing (NGS) is becoming more adopted into viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both Genome Sequencer FLX (GS FLX) and Iontorrent PGM platforms...

  11. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    Science.gov (United States)

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that the number of promoter-like sequences in the M. hyopneumoniae genome is more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569

  12. The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

    Science.gov (United States)

    Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W

    2018-02-05

    The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set ( n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.

  13. The zebrafish reference genome sequence and its relationship to the human genome

    Science.gov (United States)

    Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.

    2013-01-01

    Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743

  14. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA; Vos, M. de; Louw, GE; Merwe, RG van der; Dippenaar, A.; Streicher, EM; Abdallah, AM; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; Helden, PD van; Warren, RM; Pain, Arnab

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug

  15. Complete genome sequence of Parvibaculum lavamentivorans type strain (DS-1(T)).

    Science.gov (United States)

    Schleheck, David; Weiss, Michael; Pitluck, Sam; Bruce, David; Land, Miriam L; Han, Shunsheng; Saunders, Elizabeth; Tapia, Roxanne; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Goodwin, Lynne; Pennacchio, Len; Nolan, Matt; Cook, Alasdair M; Kjelleberg, Staffan; Thomas, Torsten

    2011-12-31

    Parvibaculum lavamentivorans DS-1(T) is the type species of the novel genus Parvibaculum in the novel family Rhodobiaceae (formerly Phyllobacteriaceae) of the order Rhizobiales of Alphaproteobacteria. Strain DS-1(T) is a non-pigmented, aerobic, heterotrophic bacterium and represents the first tier member of environmentally important bacterial communities that catalyze the complete degradation of synthetic laundry surfactants. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,914,745 bp long genome with its predicted 3,654 protein coding genes is the first completed genome sequence of the genus Parvibaculum, and the first genome sequence of a representative of the family Rhodobiaceae.

  16. Characterization of the genomic organization of the region bordering the centromere of chromosome V of Podospora anserina by direct sequencing.

    Science.gov (United States)

    Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean

    2003-08-01

    A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either sides of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244bp from either sides of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.

  17. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    Science.gov (United States)

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  18. Getting complete genomes from complex samples using nanopore sequencing

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Albertsen, Mads

    Short read sequencing and metagenomic binning workflows have made it possible to extract bacterial genome bins from environmental microbial samples containing hundreds to thousands of different species. However, these genome bins often do not represent complete genomes, as they are mostly...... fragmented, incomplete and often contaminated with foreign DNA and with no robust strategies to validate the quality. The value of these `draft genomes` have limited, lasting value to the scientific community, as gene synteny is broken and the uncertainty of what is missing. The genetic material most often...... missed is important multi-copy and/or conserved marker genes such as the 16S rRNA gene, as sequence micro-heterogeneity prevents assembly of these genes in the de novo assembly. We demonstrate that using nanopore long reads it is now possible to overcome these issues and make complete genomes from...

  19. Draft Genome Sequence of Bacillus coagulans GBI-30, 6086, a Widely Used Spore-Forming Probiotic Strain

    OpenAIRE

    Orrù, Luigi; Salvetti, Elisa; Cattivelli, Luigi; Lamontanara, Antonella; Michelotti, Vania; Capozzi, Vittorio; Spano, Giuseppe; Keller, David; Cash, Howard; Martina, Alessia; Torriani, Sandra; Felis, Giovanna E.

    2014-01-01

    Bacillus coagulans GBI-30, 6086 is a safe strain, already available on the market, and characterized by certified beneficial effects. The draft genome sequence presented here constitutes the first pillar toward the identification of the molecular mechanisms responsible for its positive features and safety.

  20. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  1. Complete genome sequence of Serratia plymuthica strain AS12

    Energy Technology Data Exchange (ETDEWEB)

    Neupane, Saraswoti [Uppsala University, Uppsala, Sweden; Finlay, Roger D. [Uppsala University, Uppsala, Sweden; Alstrom, Sadhna [Uppsala University, Uppsala, Sweden; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Peters, Lin [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Han, James [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Hogberg, Nils [Uppsala University, Uppsala, Sweden

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  2. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Full Text Available Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.

  3. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  4. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  5. Characterization and complete genome sequences of L. rhamnosus DSM 14870 and L. gasseri DSM 14869 contained in the EcoVag® probiotic vaginal capsules.

    Science.gov (United States)

    Marcotte, Harold; Krogh Andersen, Kasper; Lin, Yin; Zuo, Fanglei; Zeng, Zhu; Larsson, Per Göran; Brandsborg, Erik; Brønstad, Gunnar; Hammarström, Lennart

    2017-12-01

    Lactobacillus rhamnosus DSM 14870 and Lactobacillus gasseri DSM 14869 were previously isolated from the vaginal epithelial cells (VEC) of healthy women and selected for the development of the vaginal EcoVag ® probiotic capsules. EcoVag ® was subsequently shown to provide long-term cure and reduce relapse of bacterial vaginosis (BV) as an adjunct to antibiotic therapy. To identify genes potentially involved in probiotic activity, we performed genome sequencing and characterization of the two strains. The complete genome analysis of both strains revealed the presence of genes encoding functions related to adhesion, exopolysaccharide (EPS) biosynthesis, antimicrobial activity, and CRISPR adaptive immunity but absence of antibiotic resistance genes. Interesting features of L. rhamnosus DSM 14870 genome include the presence of the spaCBA-srtC gene encoding spaCBA pili and interruption of the gene cluster encoding long galactose-rich EPS by integrases. Unique to L. gasseri DSM 14869 genome was the presence of a gene encoding a putative (1456 amino acid) new adhesin containing two rib/alpha-like repeats. L. rhamnosus DSM 14870 and L. gasseri DSM 14869 showed acidification of the culture medium (to pH 3.8) and a strong adhesion capability to the Caco-2 cell line and VEC. L. gasseri DSM 14869 could produce a thick (40nm) EPS layer and hydrogen peroxide. L. rhamnosus DSM 14870 was shown to produce SpaCBA pili and a 20nm EPS layer, and could inhibit the growth of Gardnerella vaginalis, a bacterium commonly associated with BV. The genome sequences provide a basis for further elucidation of the molecular basis for their probiotic functions. Copyright © 2017 Elsevier GmbH. All rights reserved.

  6. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.

    Science.gov (United States)

    Ghanem, Mostafa; El-Gazzar, Mohamed

    2018-05-01

    Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus based sequence typing schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing became close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based whole genome sequences. A diverse set of 25 MS whole genome sequences were used to identify 302 core genome genes as cgMLST targets (35.5% of MS genome) and 44 whole genome sequences of MS isolates from six countries in four continents were used for typing applying this scheme. cgMLST based phylogenetic trees displayed a high degree of agreement with core genome SNP based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Draft Genome Sequences of Four Hospital-Associated Pseudomonas putida Isolates.

    Science.gov (United States)

    Mustapha, Mustapha M; Marsh, Jane W; Ezeonwuka, Chinelo D; Pasculle, Anthony W; Pacey, Marissa P; Querry, Ashley M; Muto, Carlene A; Harrison, Lee H

    2016-09-29

    We present here the draft genome sequences of four Pseudomonas putida isolates belonging to a single clone suspected for nosocomial transmission between patients and a bronchoscope in a tertiary hospital. The four genome sequences belong to a single lineage but contain differences in their mobile genetic elements. Copyright © 2016 Mustapha et al.

  8. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    DEFF Research Database (Denmark)

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183(T) (= DSM 16839(T) = NCIMB 14029(T)), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host...

  9. Sequence based polymorphic (SBP marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Full Text Available Abstract Background Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs for any genomic regions. Here a sequence based polymorphic (SBP marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype, Niederzenz (Nd-0 was obtained by applying Illumina's sequencing by synthesis (Solexa technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0 genome sequence identified putative single nucleotide polymorphisms (SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for

  10. Genomic region operation kit for flexible processing of deep sequencing data.

    Science.gov (United States)

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from >http://csbi.ltdk.helsinki.fi/grok/.

  11. Reconstruction of putative DNA virus from endogenous rice tungro bacilliform virus-like sequences in the rice genome: implications for integration and evolution

    Directory of Open Access Journals (Sweden)

    Kishima Yuji

    2004-10-01

    Full Text Available Abstract Background Plant genomes contain various kinds of repetitive sequences such as transposable elements, microsatellites, tandem repeats and virus-like sequences. Most of them, with the exception of virus-like sequences, do not allow us to trace their origins nor to follow the process of their integration into the host genome. Recent discoveries of virus-like sequences in plant genomes led us to set the objective of elucidating the origin of the repetitive sequences. Endogenous rice tungro bacilliform virus (RTBV-like sequences (ERTBVs have been found throughout the rice genome. Here, we reconstructed putative virus structures from RTBV-like sequences in the rice genome and characterized to understand evolutionary implication, integration manner and involvements of endogenous virus segments in the corresponding disease response. Results We have collected ERTBVs from the rice genomes. They contain rearranged structures and no intact ORFs. The identified ERTBV segments were shown to be phylogenetically divided into three clusters. For each phylogenetic cluster, we were able to make a consensus alignment for a circular virus-like structure carrying two complete ORFs. Comparisons of DNA and amino acid sequences suggested the closely relationship between ERTBV and RTBV. The Oryza AA-genome species vary in the ERTBV copy number. The species carrying low-copy-number of ERTBV segments have been reported to be extremely susceptible to RTBV. The DNA methylation state of the ERTBV sequences was correlated with their copy number in the genome. Conclusions These ERTBV segments are unlikely to have functional potential as a virus. However, these sequences facilitate to establish putative virus that provided information underlying virus integration and evolutionary relationship with existing virus. Comparison of ERTBV among the Oryza AA-genome species allowed us to speculate a possible role of endogenous virus segments against its related disease.

  12. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  13. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  14. SIS: a program to generate draft genome sequence scaffolds for prokaryotes

    Directory of Open Access Journals (Sweden)

    Dias Zanoni

    2012-05-01

    Full Text Available Abstract Background Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. Results We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of sis, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that sis has overall better performance. Conclusions sis is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. The good performance of sis in our tests adds evidence that large

  15. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....

  16. High-throughput sequencing of three Lemnoideae (duckweeds chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  17. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra; Ali, Asho; McNerney, Ruth; Mallard, Kim; Hill-Cawthorne, Grant A.; Coll, Francesc; Nair, Mridul; Pain, Arnab; Clark, Taane G.; Hasan, Rumina

    2015-01-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs), in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study investigated used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  18. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra

    2015-03-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs), in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study investigated used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  19. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  20. Complete genome sequence of pronghorn virus, a pestivirus

    Science.gov (United States)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  1. Draft Genome Sequence of Bacillus coagulans GBI-30, 6086, a Widely Used Spore-Forming Probiotic Strain

    Science.gov (United States)

    Orrù, Luigi; Salvetti, Elisa; Cattivelli, Luigi; Lamontanara, Antonella; Michelotti, Vania; Capozzi, Vittorio; Spano, Giuseppe; Keller, David; Cash, Howard; Martina, Alessia; Felis, Giovanna E.

    2014-01-01

    Bacillus coagulans GBI-30, 6086 is a safe strain, already available on the market, and characterized by certified beneficial effects. The draft genome sequence presented here constitutes the first pillar toward the identification of the molecular mechanisms responsible for its positive features and safety. PMID:25377698