WorldWideScience

Sample records for genome shotgun wgs

  1. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Full Text Available Abstract Background Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequence (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations that had predicted fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion The use of HSVs like SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  2. Genome Improvement at JGI-HAGSC

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

    2012-03-03

    Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.

  3. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  4. Whole genome shotgun sequencing of Indian strains of Streptococcus agalactiae

    Directory of Open Access Journals (Sweden)

    Balaji Veeraraghavan

    2017-12-01

    Full Text Available Group B streptococcus is known as a leading cause of neonatal infections in developing countries. The present study describes the whole genome shotgun sequences of four Group B Streptococcus (GBS) isolates. Molecular data on clonality is lacking for GBS in India. The present genome report will add important information on the scarce genome data of GBS and will help in deriving comparative genome studies of GBS isolates at global level. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession numbers NHPL00000000 – NHPO00000000.

  5. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    Full Text Available With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.

  6. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  7. Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Chan, Chon-Kit Kenneth; Hsu, Arthur L.; Tang, Sen-Lin; Halgamuge, Saman K.

    2008-01-01

    Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increased order of oligonucleotide frequency may deteriorate the assignment result in some cases, which indicates the possible existence of optimal species-specific oligonucleotide frequency. We replaced SOM with growing self-organising map (GSOM) where comparable results are obtained while gaining 7%–15% speed improvement. PMID:18288261
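
    The oligonucleotide-frequency features mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain A/C/G/T alphabet and simple normalization by the number of valid windows; the function name `oligo_frequencies` is hypothetical and the code is not taken from the cited work, which may compute or scale its features differently.

```python
from collections import Counter
from itertools import product


def oligo_frequencies(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector is ordered lexicographically over the alphabet ACGT, so a
    tetranucleotide (k=4) signature has 4**4 = 256 entries. Windows that
    contain other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

    Vectors of this kind (for k = 2 to 5) would then serve as the training input of a clustering method such as a SOM or GSOM.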

  8. GT-WGS: an efficient and economic tool for large-scale WGS analyses based on the AWS cloud service.

    Science.gov (United States)

    Wang, Yiqi; Li, Gen; Ma, Mark; He, Fazhong; Song, Zhuo; Zhang, Wei; Wu, Chengkun

    2018-01-19

    Whole-genome sequencing (WGS) plays an increasingly important role in clinical practice and public health. Due to the big data size, WGS data analysis is usually compute-intensive and IO-intensive. Currently it usually takes 30 to 40 h to finish a 50× WGS analysis task, which is far from the ideal speed required by the industry. Furthermore, the high-end infrastructure required by WGS computing is costly in terms of time and money. In this paper, we aim to improve the time efficiency of WGS analysis and minimize the cost by elastic cloud computing. We developed a distributed system, GT-WGS, for large-scale WGS analyses utilizing the Amazon Web Services (AWS). Our system won the first prize in the Wind and Cloud challenge held by the Genomics and Cloud Technology Alliance conference (GCTA) committee. The system makes full use of the dynamic pricing mechanism of AWS. We evaluate the performance of GT-WGS with a 55× WGS dataset (400GB fastq) provided by the GCTA 2017 competition. In the best case, it only took 18.4 min to finish the analysis and the AWS cost of the whole process is only 16.5 US dollars. The accuracy of GT-WGS is 99.9% consistent with that of the Genome Analysis Toolkit (GATK) best practice. We also evaluated the performance of GT-WGS on a real-world dataset provided by the XiangYa hospital, which consists of a 5× whole-genome dataset with 500 samples, and on average GT-WGS managed to finish one 5× WGS analysis task in 2.4 min at a cost of $3.6. WGS is already playing an important role in guiding therapeutic intervention. However, its application is limited by the time cost and computing cost. GT-WGS excelled as an efficient and affordable WGS analysis tool to address this problem. The demo video and supplementary materials of GT-WGS can be accessed at https://github.com/Genetalks/wgs_analysis_demo .

  9. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from total DNA Sequences.

    NARCIS (Netherlands)

    Izan, Shairul; Esselink, G.; Visser, R.G.F.; Smulders, M.J.M.; Borm, T.J.A.

    2017-01-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This

  10. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    Science.gov (United States)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) = 694 kb, including contigs with N(50) = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s)) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
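
    The N(50) statistic quoted for the scaffolds and contigs can be computed as in the following sketch. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of the assembled bases); the function name `n50` and the example lengths are illustrative assumptions, not data from the flax assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Illustrative lengths only: total is 54, half is 27, and 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```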

  11. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  12. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  13. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  14. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Directory of Open Access Journals (Sweden)

    Rebecca L Lindsey

    2016-05-01

    Full Text Available Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x

  15. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Science.gov (United States)

    Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.

    2016-01-01

    Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different
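
    A down-sampling experiment of the kind described in this abstract can be sketched as follows. The abstract does not state how reads were subsampled, so this is only one plausible approach: each read is kept independently with probability target/current coverage. The function name `downsample_reads`, its parameters, and the fixed seed are assumptions made for illustration.

```python
import random


def downsample_reads(read_lengths, genome_size, target_coverage, seed=0):
    """Return indices of reads to keep so coverage is close to the target.

    Coverage is estimated as total sequenced bases / genome size. If the
    data are already at or below the target, every read is kept.
    """
    current = sum(read_lengths) / genome_size
    if current <= target_coverage:
        return list(range(len(read_lengths)))
    keep_fraction = target_coverage / current
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [i for i in range(len(read_lengths)) if rng.random() < keep_fraction]
```

    Re-running the typing workflow on subsets drawn at, say, 20x, 30x and 40x would then show the coverage at which each target stops being detected reliably.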

  16. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

    DEFF Research Database (Denmark)

    Hellmann, Ines; Mang, Yuan; Gu, Zhiping

    2008-01-01

    We introduce a simple, broadly applicable method for obtaining estimates of nucleotide diversity from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate...... for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show...
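
    For orientation, nucleotide diversity in its textbook form is the average number of pairwise differences per site among aligned sequences. The sketch below implements only that basic definition on equal-length, error-free sequences; it does not model the random sampling of segments or the per-read error rate that the method in this abstract accounts for, and the function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    site_count = len(aligned[0])
    if site_count == 0 or any(len(s) != site_count for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * site_count)


# Three sequences of four sites: 2 differences over 3 pairs x 4 sites = 1/6.
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```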

  17. Cross-border outbreak of listeriosis caused by cold-smoked salmon, revealed by integrated surveillance and whole genome sequencing (WGS), Denmark and France, 2015 to 2017

    DEFF Research Database (Denmark)

    Schjorring, Susanne; Lassen, Sofie Gillesberg; Jensen, Tenna

    2017-01-01

    In August 2017, an outbreak of six listeriosis cases in Denmark was traced to cold-smoked salmon, using epidemiological investigations and whole-genome sequencing (WGS) analyses. Exchange of genome sequences allowed identification in France of a food isolate from a salmon-derived product and a human isolate from 2016 within the same cgMLST cluster as the Danish isolates (L2-SL8-ST8-CT771). The salmon product came from a third European Union country. WGS can rapidly link human cases and food isolates across Europe....

  18. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Volckaert Filip AM

    2010-01-01

    Full Text Available Abstract Background Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3 fold larger than the 460 Mbp stickleback genome. Conclusions The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish.

  19. High-throughput automated microfluidic sample preparation for accurate microbial genomics.

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C

    2017-01-27

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.

  20. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Full Text Available Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
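
    The idea of using a k-mer frequency table to pull high-copy (for example chloroplast-derived) reads out of a WGS dataset can be sketched as below. This is a simplified illustration, not the authors' pipeline: the selection rule (median k-mer count of a read at or above a threshold), the function names, and the default values of `k` and `min_median_count` are all assumptions.

```python
from collections import Counter
from statistics import median


def iter_kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def select_high_copy_reads(reads, k=5, min_median_count=3):
    """Keep reads whose k-mers are, on the whole, abundant in the dataset.

    A k-mer frequency table is built over all reads; a read is selected
    when the median table count of its own k-mers reaches the threshold.
    """
    table = Counter(km for read in reads for km in iter_kmers(read, k))
    selected = []
    for read in reads:
        counts = [table[km] for km in iter_kmers(read, k)]
        if counts and median(counts) >= min_median_count:
            selected.append(read)
    return selected
```

    In practice k would be far larger than 5 and the threshold would be chosen from the observed k-mer count distribution, which is consistent with the abstract's remark that the choice of k and the amount of data need to be optimized.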

  1. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Full Text Available Abstract Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  2. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Full Text Available Abstract Background Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way to define specific regions for analysis and resequencing.

  3. WGS accurately predicts antimicrobial resistance in Escherichia coli

    Science.gov (United States)

    Objectives: To determine the effectiveness of whole-genome sequencing (WGS) in identifying resistance genotypes of multidrug-resistant Escherichia coli (E. coli) and whether these correlate with observed phenotypes. Methods: Seventy-six E. coli strains were isolated from farm cattle and measured f...

  4. Whole-genome typing and characterization of blaVIM19-harbouring ST383 Klebsiella pneumoniae by PFGE, whole-genome mapping and WGS.

    Science.gov (United States)

    Sabirova, Julia S; Xavier, Basil Britto; Coppens, Jasmine; Zarkotou, Olympia; Lammens, Christine; Janssens, Lore; Burggrave, Ronald; Wagner, Trevor; Goossens, Herman; Malhotra-Kumar, Surbhi

    2016-06-01

    We utilized whole-genome mapping (WGM) and WGS to characterize 12 clinical carbapenem-resistant Klebsiella pneumoniae strains (TGH1-TGH12). All strains were screened for carbapenemase genes by PCR, and typed by MLST, PFGE (XbaI) and WGM (AflII) (OpGen, USA). WGS (Illumina) was performed on TGH8 and TGH10. Reads were de novo assembled and annotated [SPAdes, Rapid Annotation Subsystem Technology (RAST)]. Contigs were aligned directly, and after in silico AflII restriction, with corresponding WGMs (MapSolver, OpGen; BioNumerics, Applied Maths). All 12 strains were ST383. Of the 12 strains, 11 were carbapenem resistant, 7 harboured blaKPC-2 and 11 harboured blaVIM-19. Varying the parameters for assigning WGM clusters showed that these were comparable to STs and to the eight PFGE types or subtypes (difference of three or more bands). A 95% similarity coefficient assigned all 12 WGMs to a single cluster, whereas a 99% similarity coefficient (or ≥10 unmatched-fragment difference) assigned the 12 WGMs to eight (sub)clusters. Based on a difference of three or more bands between PFGE profiles, the Simpson's diversity indices (SDIs) of WGM (0.94, Jackknife pseudo-values CI: 0.883-0.996) and PFGE (0.93, Jackknife pseudo-values CI: 0.828-1.000) were similar (P = 0.649). However, the discriminatory power of WGM was significantly higher (SDI: 0.94, Jackknife pseudo-values CI: 0.883-0.996) than that of PFGE profiles typed on a difference of seven or more bands (SDI: 0.53, Jackknife pseudo-values CI: 0.212-0.849) (P = 0.007). This study demonstrates the application of WGM to understanding the epidemiology of hospital-associated K. pneumoniae. Utilizing a combination of WGM and WGS, we also present here the first longitudinal genomic characterization of the highly dynamic carbapenem-resistant ST383 K. pneumoniae clone that is rapidly gaining importance in Europe. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial
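
    The discriminatory-power comparison above rests on Simpson's diversity index. The abstract does not say which form was computed, so the following is only a minimal sketch assuming the commonly used Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total. The function name and the example type assignment are hypothetical and are not the study's data.

```python
from collections import Counter


def simpsons_diversity(type_labels):
    """Hunter-Gaston form of Simpson's index of diversity for typing results.

    type_labels holds one label per isolate (e.g. a PFGE type or WGM cluster).
    """
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    # Number of ordered isolate pairs that share a type.
    same_type_pairs = sum(n * (n - 1) for n in counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical assignment of 12 isolates to 8 types (illustration only).
example_types = ["A", "A", "A", "B", "B", "C", "C", "D", "E", "F", "G", "H"]
example_sdi = simpsons_diversity(example_types)
```

    An index of 0 means every isolate falls into one type and 1 means every isolate has its own type; the confidence intervals quoted in the abstract would need a separate jackknife step that this sketch does not attempt.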

  5. Genome resources for climate-resilient cowpea, an essential crop for food security.

    Science.gov (United States)

    Muñoz-Amatriaín, María; Mirebrahim, Hamid; Xu, Pei; Wanamaker, Steve I; Luo, MingCheng; Alhakami, Hind; Alpert, Matthew; Atokple, Ibrahim; Batieno, Benoit J; Boukar, Ousmane; Bozdag, Serdar; Cisse, Ndiaga; Drabo, Issa; Ehlers, Jeffrey D; Farmer, Andrew; Fatokun, Christian; Gu, Yong Q; Guo, Yi-Ning; Huynh, Bao-Lam; Jackson, Scott A; Kusi, Francis; Lawley, Cynthia T; Lucas, Mitchell R; Ma, Yaqin; Timko, Michael P; Wu, Jiajie; You, Frank; Barkley, Noelle A; Roberts, Philip A; Lonardi, Stefano; Close, Timothy J

    2017-03-01

    Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress. © 2016 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.

  6. Real-Time WGS-based Typing of VTEC Isolates for Surveillance and Outbreak Detection

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Hasman, Henrik; Scheutz, F.

    2013-01-01

    the IonTorrent PGM benchtop sequencing technology. WGS-based typing was carried out using web-based tools, developed by the Center for Genomic Epidemiology (www.genomicepidemiology.org), for determination of MLST types, virulence genes and phylogenetic relationship between the isolates. The WGS-based...... a small outbreak occurred. For all isolates, apart from one resulting in poor sequence output, the WGS-based typing led to detection of the same virulence gene variants as the routine typing, and was also able to detect many other possible virulence features, and in most instances produce a useful typing...... result faster than routine typing. Also, the WGS-approach was able to correctly detect, according to the routine typing, the isolates belonging to the outbreak. Conclusion: The real-time WGS-based typing was able to produce typing results comparable to the routine typing, at least as fast as the routine...

  7. Genomic analyses of the Chlamydia trachomatis core genome show an association between chromosomal genome, plasmid type and disease

    NARCIS (Netherlands)

    Versteeg, Bart; Bruisten, Sylvia M.; Pannekoek, Yvonne; Jolley, Keith A.; Maiden, Martin C. J.; van der Ende, Arie; Harrison, Odile B.

    2018-01-01

    Background: Chlamydia trachomatis (Ct) plasmid has been shown to encode genes essential for infection. We evaluated the population structure of Ct using whole-genome sequence data (WGS). In particular, the relationship between the Ct genome, plasmid and disease was investigated. Results: WGS data

  8. A Statistical Framework for the Functional Analysis of Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    Sharon, Itai; Pati, Amrita; Markowitz, Victor; Pinter, Ron Y.

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
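
    For orientation, the classic Lander-Waterman quantities that such a framework builds on can be sketched as follows. This is not the authors' gene-family estimator, only the textbook statistics for N reads of length L sampled uniformly from a genome of length G: coverage c = NL/G, expected covered fraction 1 - e^(-c), and expected number of contigs N·e^(-cσ) with σ = 1 - T/L for a minimum detectable overlap T. The function and parameter names are assumptions made for illustration.

```python
import math


def lander_waterman(num_reads, read_len, genome_len, min_overlap=0):
    """Textbook Lander-Waterman statistics for a shotgun project."""
    coverage = num_reads * read_len / genome_len
    # Fraction of a read that must NOT overlap for two reads to be joined.
    sigma = 1.0 - min_overlap / read_len
    return {
        "coverage": coverage,
        "fraction_covered": 1.0 - math.exp(-coverage),
        "expected_contigs": num_reads * math.exp(-coverage * sigma),
    }


# Hypothetical project: 1,000 reads of 100 bp from a 100 kb genome (1X coverage).
example_stats = lander_waterman(1000, 100, 100_000)
```

    The model assumes reads are placed uniformly at random, which is one of the idealizations that real WGS and metagenomic data violate.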

  9. Identification of antimicrobial resistance genes in multidrug-resistant clinical Bacteroides fragilis isolates by whole genome shotgun sequencing

    DEFF Research Database (Denmark)

    Sydenham, Thomas Vognbjerg; Sóki, József; Hasman, Henrik

    2015-01-01

    Bacteroides fragilis constitutes the most frequent anaerobic bacterium causing bacteremia in humans. The genetic background for antimicrobial resistance in B. fragilis is diverse with some genes requiring insertion sequence (IS) elements inserted upstream for increased expression. To evaluate whole...... genome shotgun sequencing as a method for predicting antimicrobial resistance properties, one meropenem resistant and five multidrug-resistant blood culture isolates were sequenced and antimicrobial resistance genes and IS elements identified using ResFinder 2.1 (http...

  10. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  11. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    Science.gov (United States)

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolved key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.

  12. Comparison of typing by whole genome sequencing (WGS) and pulsed field gel electrophoresis (PFGE) of isolates from a hospital outbreak with a CTX-M-15 producing Klebsiella pneumoniae

    DEFF Research Database (Denmark)

    Hansen, D. S.; Kaas, Rolf Sommer; Nielsen, E. M.

    2013-01-01

    strains, were investigated. The 44 isolates were phenotypically similar: ESBL-producers with reduced susceptibility to gentamicin and ciprofloxacin. PFGE was performed using XbaI; data handling in BioNumerics. For WGS a reference genome (outbreak isolate 2006-1-264) was assembled to a single scaffold using...

  13. Characterizing neutral genomic diversity and selection signatures in indigenous populations of Moroccan goats (Capra hircus) using WGS data

    Directory of Open Access Journals (Sweden)

    Badr Benjelloun

    2015-04-01

    Full Text Available Since the time of their domestication, goats (Capra hircus) have evolved in a large variety of locally adapted populations in response to different human and environmental pressures. In the present era, many indigenous populations are threatened with extinction due to their substitution by cosmopolitan breeds, while they might represent highly valuable genomic resources. It is thus crucial to characterize the neutral and adaptive genetic diversity of indigenous populations. A fine characterization of whole genome variation in farm animals is now possible by using new sequencing technologies. We sequenced the complete genome at 12X coverage of 44 goats geographically representative of the three phenotypically distinct indigenous populations in Morocco. The study of mitochondrial genomes showed a high diversity exclusively restricted to the haplogroup A. The 44 nuclear genomes showed a very high diversity (24 million variants) associated with low linkage disequilibrium. The overall genetic diversity was weakly structured according to geography and phenotypes. When looking for signals of positive selection in each population we identified many candidate genes, several of which gave insights into the metabolic pathways or biological processes involved in the adaptation to local conditions (e.g. panting in warm/desert conditions). This study highlights the interest of WGS data to characterize livestock genomic diversity. It illustrates the valuable genetic richness present in indigenous populations that have to be sustainably managed and may represent valuable genetic resources for the long-term preservation of the species.

  14. Phenotypic Heterogeneity of Genomically-Diverse Isolates of Streptococcus mutans

    Science.gov (United States)

    Palmer, Sara R.; Miller, James H.; Abranches, Jacqueline; Zeng, Lin; Lefebure, Tristan; Richards, Vincent P.; Lemos, José A.; Stanhope, Michael J.; Burne, Robert A.

    2013-01-01

    High coverage, whole genome shotgun (WGS) sequencing of 57 geographically- and genetically-diverse isolates of Streptococcus mutans from individuals of known dental caries status was recently completed. Of the 57 sequenced strains, fifteen isolates were selected based primarily on differences in gene content and phenotypic characteristics known to affect virulence and compared with the reference strain UA159. A high degree of variability in these properties was observed between strains, with a broad spectrum of sensitivities to low pH, oxidative stress (air and paraquat) and exposure to competence stimulating peptide (CSP). Significant differences in autolytic behavior and in biofilm development in glucose or sucrose were also observed. Natural genetic competence varied among isolates, and this was correlated to the presence or absence of competence genes, comCDE and comX, and to bacteriocins. In general, strains that lacked the ability to become competent possessed fewer genes for bacteriocins and immunity proteins or contained polymorphic variants of these genes. WGS sequence analysis of the pan-genome revealed, for the first time, components of a Type VII secretion system in several S. mutans strains, as well as two putative ORFs that encode possible collagen binding proteins located upstream of the cnm gene, which is associated with host cell invasiveness. The virulence of these particular strains was assessed in a wax-worm model. This is the first study to combine a comprehensive analysis of key virulence-related phenotypes with extensive genomic analysis of a pathogen that evolved closely with humans. Our analysis highlights the phenotypic diversity of S. mutans isolates and indicates that the species has evolved a variety of adaptive strategies to persist in the human oral cavity and, when conditions are favorable, to initiate disease. PMID:23613838

  15. Phenotypic heterogeneity of genomically-diverse isolates of Streptococcus mutans.

    Directory of Open Access Journals (Sweden)

    Sara R Palmer

    Full Text Available High coverage, whole genome shotgun (WGS) sequencing of 57 geographically- and genetically-diverse isolates of Streptococcus mutans from individuals of known dental caries status was recently completed. Of the 57 sequenced strains, fifteen isolates were selected based primarily on differences in gene content and phenotypic characteristics known to affect virulence and compared with the reference strain UA159. A high degree of variability in these properties was observed between strains, with a broad spectrum of sensitivities to low pH, oxidative stress (air and paraquat) and exposure to competence stimulating peptide (CSP). Significant differences in autolytic behavior and in biofilm development in glucose or sucrose were also observed. Natural genetic competence varied among isolates, and this was correlated to the presence or absence of competence genes, comCDE and comX, and to bacteriocins. In general, strains that lacked the ability to become competent possessed fewer genes for bacteriocins and immunity proteins or contained polymorphic variants of these genes. WGS sequence analysis of the pan-genome revealed, for the first time, components of a Type VII secretion system in several S. mutans strains, as well as two putative ORFs that encode possible collagen binding proteins located upstream of the cnm gene, which is associated with host cell invasiveness. The virulence of these particular strains was assessed in a wax-worm model. This is the first study to combine a comprehensive analysis of key virulence-related phenotypes with extensive genomic analysis of a pathogen that evolved closely with humans. Our analysis highlights the phenotypic diversity of S. mutans isolates and indicates that the species has evolved a variety of adaptive strategies to persist in the human oral cavity and, when conditions are favorable, to initiate disease.

  16. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  17. Data on genome sequencing, analysis and annotation of a pathogenic Bacillus cereus 062011msu

    Directory of Open Access Journals (Sweden)

    Rashmi Rathy

    2018-04-01

    Full Text Available Bacillus species 062011 msu is a harmful pathogenic strain responsible for causing abscessation in sheep and goat population studied by Mariappan et al. (2012) [1]. The organism specifically targets the female sheep and goat population and results in the reduction of milk and meat production. In the present study, we have performed the whole genome sequencing of the pathogenic isolate using the Ion Torrent sequencing platform and generated 458,944 raw reads with an average length of 198.2 bp. The genome sequence was assembled, annotated and analysed for the genetic islands, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the pathogen. Simultaneously the 16S rRNA sequencing study and genome sequence comparison data confirmed that the strain belongs to the species Bacillus cereus and exhibits 99% sequence homology with the genomes of B. cereus ATCC 10987 and B. cereus FRI-35. Hence, we have renamed the organism as Bacillus cereus 062011msu. The Whole Genome Shotgun (WGS) project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036) (SAMN07629099). Keywords: Bacillus cereus, Genome sequencing, Abscessation, Virulence factors

  18. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    Science.gov (United States)

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-04

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.

  19. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji

    2015-10-22

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although shotgun sequences from large-genome organisms remain difficult to reconstruct into perfect contigs, each representing a full chromosome, those from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline to assemble small viral genomes with good reliability using a resampling method from shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and thus will meet the increasing need for viral genome comparisons in metagenomic studies.
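
    The idea of controlling the number of reads passed to the assembler can be illustrated with a small resampling helper. This is a sketch under stated assumptions and not V-GAP's actual implementation: the function names, the fixed subset size and the seeding scheme are all hypothetical, and reads are represented as plain strings rather than FASTQ records.

```python
import random


def subsample_reads(reads, num_reads, seed=0):
    """Draw num_reads reads without replacement (all reads if fewer exist)."""
    rng = random.Random(seed)
    if num_reads >= len(reads):
        return list(reads)
    return rng.sample(reads, num_reads)


def resampled_subsets(reads, num_reads, replicates, seed=0):
    """Build several independent read subsets, one per assembly attempt."""
    return [subsample_reads(reads, num_reads, seed + i) for i in range(replicates)]


# Hypothetical read set; in practice these would be parsed sequencing records.
example_reads = [f"read{i}" for i in range(100)]
example_subsets = resampled_subsets(example_reads, num_reads=10, replicates=3)
```

    Each subset would then be assembled separately, and the resulting contigs compared across replicates; that downstream step depends on the assembler and is left out here.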

  20. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji; Yasuike, Motoshige; Nishiki, Issei; Iwasaki, Yuki; Fujiwara, Atushi; Kawato, Yasuhiko; Nakai, Toshihiro; Nagai, Satoshi; Kobayashi, Takanori; Gojobori, Takashi; Ototake, Mitsuru

    2015-01-01

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although shotgun sequences from large-genome organisms remain difficult to reconstruct into perfect contigs, each representing a full chromosome, those from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline to assemble small viral genomes with good reliability using a resampling method from shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and thus will meet the increasing need for viral genome comparisons in metagenomic studies.

  1. How do students react to analyzing their own genomes in a whole-genome sequencing course?: outcomes of a longitudinal cohort study.

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Zinberg, Randi; Bashir, Ali; Kasarskis, Andrew; Zweig, Micol; Suckiel, Sabrina; Shah, Hardik; Mahajan, Milind; Diaz, George A; Schadt, Eric E

    2015-11-01

    Health-care professionals need to be trained to work with whole-genome sequencing (WGS) in their practice. Our aim was to explore how students responded to a novel genome analysis course that included the option to analyze their own genomes. This was an observational cohort study. Questionnaires were administered before (T3) and after the genome analysis course (T4), as well as 6 months later (T5). In-depth interviews were conducted at T5. All students (n = 19) opted to analyze their own genomes. At T5, 12 of 15 students stated that analyzing their own genomes had been useful. Ten reported they had applied their knowledge in the workplace. Technical WGS knowledge increased (mean of 63.8% at T3, mean of 72.5% at T4; P = 0.005). In-depth interviews suggested that analyzing their own genomes may increase students' motivation to learn and their understanding of the patient experience. Most (but not all) of the students reported low levels of WGS results-related distress and low levels of regret about their decision to analyze their own genomes. Giving students the option of analyzing their own genomes may increase motivation to learn, but some students may experience personal WGS results-related distress and regret. Additional evidence is required before considering incorporating optional personal genome analysis into medical education on a large scale.

  2. High-resolution metagenomics targets major functional types in complex microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    Kalyuzhnaya, Marina G.; Lapidus, Alla; Ivanova, Natalia; Copeland, Alex C.; McHardy, Alice C.; Szeto, Ernest; Salamov, Asaf; Grigoriev, Igor V.; Suciu, Dominic; Levine, Samuel R.; Markowitz, Victor M.; Rigoutsos, Isidore; Tringe, Susannah G.; Bruce, David C.; Richardson, Paul M.; Lidstrom, Mary E.; Chistoserdova, Ludmila

    2009-08-01

    Most microbes in the biosphere remain uncultured and unknown. Whole genome shotgun (WGS) sequencing of environmental DNA (metagenomics) allows glimpses into genetic and metabolic potentials of natural microbial communities. However, in communities of high complexity metagenomics fails to link specific microbes to specific ecological functions. To overcome this limitation, we selectively targeted populations involved in oxidizing single-carbon (C1) compounds in Lake Washington (Seattle, USA) by labeling their DNA via stable isotope probing (SIP), followed by WGS sequencing. Metagenome analysis demonstrated specific sequence enrichments in response to different C1 substrates, highlighting ecological roles of individual phylotypes. We further demonstrated the utility of our approach by extracting a nearly complete genome of a novel methylotroph Methylotenera mobilis, reconstructing its metabolism and conducting genome-wide analyses. This approach allowing high-resolution genomic analysis of ecologically relevant species has the potential to be applied to a wide variety of ecosystems.

  3. A Proposed Clinical Decision Support Architecture Capable of Supporting Whole Genome Sequence Information

    Directory of Open Access Journals (Sweden)

    Brandon M. Welch

    2014-04-01

    Full Text Available Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.

  4. A microarray-based genotyping and genetic mapping approach for highly heterozygous outcrossing species enables localization of a large fraction of the unassembled Populus trichocarpa genome sequence.

    Science.gov (United States)

    Drost, Derek R; Novaes, Evandro; Boaventura-Novaes, Carolina; Benedict, Catherine I; Brown, Ryan S; Yin, Tongming; Tuskan, Gerald A; Kirst, Matias

    2009-06-01

    Microarrays have demonstrated significant power for genome-wide analyses of gene expression, and recently have also revolutionized the genetic analysis of segregating populations by genotyping thousands of loci in a single assay. Although microarray-based genotyping approaches have been successfully applied in yeast and several inbred plant species, their power has not been proven in an outcrossing species with extensive genetic diversity. Here we have developed methods for high-throughput microarray-based genotyping in such species using a pseudo-backcross progeny of 154 individuals of Populus trichocarpa and P. deltoides analyzed with long-oligonucleotide in situ-synthesized microarray probes. Our analysis resulted in high-confidence genotypes for 719 single-feature polymorphism (SFP) and 1014 gene expression marker (GEM) candidates. Using these genotypes and an established microsatellite (SSR) framework map, we produced a high-density genetic map comprising over 600 SFPs, GEMs and SSRs. The abundance of gene-based markers allowed us to localize over 35 million base pairs of previously unplaced whole-genome shotgun (WGS) scaffold sequence to putative locations in the genome of P. trichocarpa. A high proportion of sampled scaffolds could be verified for their placement with independently mapped SSRs, demonstrating the previously un-utilized power that high-density genotyping can provide in the context of map-based WGS sequence reassembly. Our results provide a substantial contribution to the continued improvement of the Populus genome assembly, while demonstrating the feasibility of microarray-based genotyping in a highly heterozygous population. The strategies presented are applicable to genetic mapping efforts in all plant species with similarly high levels of genetic diversity.

  5. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  6. Suicide with Shotgun: A Case Report

    Directory of Open Access Journals (Sweden)

    Ali Yildirim

    2011-03-01

    Full Text Available Suicide is a major public health problem in our country and throughout the world. Suicide methods vary between communities; the most common are hanging, using chemicals and using firearms (pistol, shotgun). Owing to the easy availability of shotguns, suicides by shotgun have increased significantly in recent years. In our study, suicide with a shotgun is evaluated in terms of shooting range and its features, origin, area of suicide, crime scene, sex and age. [J Contemp Med 2011; 1(1): 29-34]

  7. GenBank

    OpenAIRE

    Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Sayers, Eric W.

    2012-01-01

    GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assig...

  8. Parents perspectives on whole genome sequencing for their children: qualified enthusiasm?

    Science.gov (United States)

    Anderson, J A; Meyn, M S; Shuman, C; Zlotnik Shaul, R; Mantella, L E; Szego, M J; Bowdin, S; Monfared, N; Hayeems, R Z

    2017-08-01

    To better understand the consequences of returning whole genome sequencing (WGS) results in paediatrics and facilitate its evidence-based clinical implementation, we studied parents' experiences with WGS and their preferences for the return of adult-onset secondary variants (SVs)-medically actionable genomic variants unrelated to their child's current medical condition that predict adult-onset disease. We conducted qualitative interviews with parents whose children were undergoing WGS as part of the SickKids Genome Clinic, a research project that studies the impact of clinical WGS on patients, families, and the healthcare system. Interviews probed parents' experience with and motivation for WGS as well as their preferences related to SVs. Interviews were analysed thematically. Of 83 invited, 23 parents from 18 families participated. These parents supported WGS as a diagnostic test, perceiving clear intrinsic and instrumental value. However, many parents were ambivalent about receiving SVs, conveying a sense of self-imposed obligation to take on the 'weight' of knowing their child's SVs, however unpleasant. Some parents chose to learn about adult-onset SVs for their child but not for themselves. Despite general enthusiasm for WGS as a diagnostic test, many parents felt a duty to learn adult-onset SVs. Analogous to 'inflicted insight', we call this phenomenon 'inflicted ought'. Importantly, not all parents of children undergoing WGS view the best interests of their child in relational terms, thereby challenging an underlying justification for current ACMG guidelines for reporting incidental secondary findings from whole exome and WGS. Published by the BMJ Publishing Group Limited.

  9. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    Science.gov (United States)

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. 
This project will not only

  10. Bacterial genome sequencing in clinical microbiology: a pathogen-oriented review.

    Science.gov (United States)

    Tagini, F; Greub, G

    2017-11-01

    In recent years, whole-genome sequencing (WGS) has been perceived as a technology with the potential to revolutionise clinical microbiology. Herein, we reviewed the literature on the use of WGS for the most commonly encountered pathogens in clinical microbiology laboratories: Escherichia coli and other Enterobacteriaceae, Staphylococcus aureus and coagulase-negative staphylococci, streptococci and enterococci, mycobacteria and Chlamydia trachomatis. For each pathogen group, we focused on five different aspects: the genome characteristics, the most common genomic approaches and the clinical uses of WGS for (i) typing and outbreak analysis, (ii) virulence investigation and (iii) in silico antimicrobial susceptibility testing. Of all the clinical usages, the most frequent and straightforward usage was to type bacteria and to trace outbreaks back. A next step toward standardisation was made thanks to the development of several new genome-wide multi-locus sequence typing systems based on WGS data. Although virulence characterisation could help in various particular clinical settings, it was done mainly to describe outbreak strains. An increasing number of studies compared genotypic to phenotypic antibiotic susceptibility testing, with mostly promising results. However, routine implementation will preferentially be done in the workflow of particular pathogens, such as mycobacteria, rather than as a broadly applicable generic tool. Overall, concrete uses of WGS in routine clinical microbiology or infection control laboratories were done, but the next big challenges will be the standardisation and validation of the procedures and bioinformatics pipelines in order to reach clinical standards.

  11. "Shotgunning" as an illicit drug smoking practice.

    Science.gov (United States)

    Perlman, D C; Perkins, M P; Paone, D; Kochems, L; Salomon, N; Friedmann, P; Des Jarlais, D C

    1997-01-01

    There has been a rise in illicit drug smoking in the United States. "Shotgunning" drugs (or "doing a shotgun") refers to the practice of inhaling smoke and then exhaling it into another individual's mouth, a practice with the potential for the efficient transmission of respiratory pathogens. Three hundred fifty-four drug users (239 from a syringe exchange and 115 from a drug detoxification program) were interviewed about shotgunning and screened for tuberculosis (TB). Fifty-nine (17%; 95% CI 12.9%-20.9%) reported shotgunning while smoking crack cocaine (68%), marijuana (41%), or heroin (2%). In multivariate analysis, age alcohol to intoxication (OR 2.2, 95% CI 1.1-4.3), having engaged in high-risk sex (OR 2.6, 95% CI 1.04-6.7), and crack use (OR 6.0, 95% CI 3.0-12) were independently associated with shotgunning. Shotgunning is a frequent drug smoking practice with the potential to transmit respiratory pathogens, underscoring the need for education of drug users about the risks of specific drug use practices, and the ongoing need for TB control among active drug users.

  12. Genome U-Plot: a whole genome visualization.

    Science.gov (United States)

    Gaitatzes, Athanasios; Johnson, Sarah H; Smadbeck, James B; Vasmatzis, George

    2018-05-15

    The ability to produce and analyze whole genome sequencing (WGS) data from samples with structural variations (SV) generated the need to visualize such abnormalities in simplified plots. Conventional two-dimensional representations of WGS data frequently use either circular or linear layouts. Both representations have their advantages, but their major disadvantage is that they do not use the two-dimensional space very efficiently. We propose a layout, termed the Genome U-Plot, which spreads the chromosomes on a two-dimensional surface and essentially quadruples the spatial resolution. We present the Genome U-Plot for producing clear and intuitive graphs that allow researchers to generate novel insights and hypotheses by visualizing SVs such as deletions, amplifications, and chromoanagenesis events. The main features of the Genome U-Plot are its layered layout, its high spatial resolution and its improved aesthetic qualities. We compare conventional visualization schemas with the Genome U-Plot using visualization metrics such as number of line crossings and crossing angle resolution measures. Based on our metrics, we improve the readability of the resulting graph by at least 2-fold, making important features apparent and important genomic changes easy to identify. A whole genome visualization tool with high spatial resolution and improved aesthetic qualities. An implementation and documentation of the Genome U-Plot is publicly available at https://github.com/gaitat/GenomeUPlot.

  13. Whole Genome Sequencing Increases Molecular Diagnostic Yield Compared with Current Diagnostic Testing for Inherited Retinal Disease.

    Science.gov (United States)

    Ellingford, Jamie M; Barton, Stephanie; Bhaskar, Sanjeev; Williams, Simon G; Sergouniotis, Panagiotis I; O'Sullivan, James; Lamb, Janine A; Perveen, Rahat; Hall, Georgina; Newman, William G; Bishop, Paul N; Roberts, Stephen A; Leach, Rick; Tearle, Rick; Bayliss, Stuart; Ramsden, Simon C; Nemeth, Andrea H; Black, Graeme C M

    2016-05-01

    To compare the efficacy of whole genome sequencing (WGS) with targeted next-generation sequencing (NGS) in the diagnosis of inherited retinal disease (IRD). Case series. A total of 562 patients diagnosed with IRD. We performed a direct comparative analysis of current molecular diagnostics with WGS. We retrospectively reviewed the findings from a diagnostic NGS DNA test for 562 patients with IRD. A subset of 46 of 562 patients (encompassing potential clinical outcomes of diagnostic analysis) also underwent WGS, and we compared mutation detection rates and molecular diagnostic yields. In addition, we compared the sensitivity and specificity of the 2 techniques to identify known single nucleotide variants (SNVs) using 6 control samples with publicly available genotype data. Diagnostic yield of genomic testing. Across known disease-causing genes, targeted NGS and WGS achieved similar levels of sensitivity and specificity for SNV detection. However, WGS also identified 14 clinically relevant genetic variants that had not been identified by NGS diagnostic testing for the 46 individuals with IRD. These variants included large deletions and variants in noncoding regions of the genome. Identification of these variants confirmed a molecular diagnosis of IRD for 11 of the 33 individuals referred for WGS who had not obtained a molecular diagnosis through targeted NGS testing. Weighted estimates, accounting for population structure, suggest that WGS methods could result in an overall 29% (95% confidence interval, 15-45) uplift in diagnostic yield. We show that WGS methods can detect disease-causing genetic variants missed by current NGS diagnostic methodologies for IRD and thereby demonstrate the clinical utility and additional value of WGS. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  14. A Validation Approach of an End-to-End Whole Genome Sequencing Workflow for Source Tracking of Listeria monocytogenes and Salmonella enterica

    Directory of Open Access Journals (Sweden)

    Anne-Catherine Portmann

    2018-03-01

    Full Text Available Whole genome sequencing (WGS), using high throughput sequencing technology, reveals the complete sequence of the bacterial genome in a few days. WGS is increasingly being used for source tracking, pathogen surveillance and outbreak investigation due to its high discriminatory power. In the food industry, WGS used for source tracking is beneficial to support contamination investigations. Despite its increased use, no standards or guidelines are available today for the use of WGS in outbreak and/or trace-back investigations. Here we present a validation of our complete (end-to-end) WGS workflow for Listeria monocytogenes and Salmonella enterica including: subculture of isolates, DNA extraction, sequencing and bioinformatics analysis. This end-to-end WGS workflow was evaluated according to the following performance criteria: stability, repeatability, reproducibility, discriminatory power, and epidemiological concordance. The current study showed that few single nucleotide polymorphisms (SNPs) were observed for L. monocytogenes and S. enterica when comparing genome sequences from five independent colonies from the first subculture and five independent colonies after the tenth subculture. Consequently, the stability of the WGS workflow for L. monocytogenes and S. enterica was demonstrated despite the few genomic variations that can occur during subculturing steps. Repeatability and reproducibility were also demonstrated. The WGS workflow was shown to have a high discriminatory power and has the ability to show genetic relatedness. Additionally, the WGS workflow was able to reproduce published outbreak investigation results, illustrating its capability of showing epidemiological concordance. The current study proposes a validation approach comprising all steps of a WGS workflow and demonstrates that the workflow can be applied to L. monocytogenes or S. enterica.

  15. Public attitudes in Japan toward participation in whole genome sequencing studies.

    Science.gov (United States)

    Okita, Taketoshi; Ohashi, Noriko; Kabata, Daijiro; Shintani, Ayumi; Kato, Kazuto

    2018-04-13

    Recent innovations in gene analysis technology have allowed for rapid and inexpensive sequencing of entire genomes. Thus, both conducting a study using whole genome sequencing (WGS) in a large population and the clinical application of research findings from such studies are currently feasible. However, to promote WGS studies, understanding and voluntary participation by the general public is needed. Therefore, it is essential to investigate the general public's attitude toward and understanding of WGS studies. The primary goal of our research is to investigate these issues and to discover how they relate to research participation in WGS studies. A survey of awareness regarding WGS and studies using WGS was conducted with a sample of 2000 or more participants using a self-administered questionnaire posted on the Internet between February 20 and 21, 2015. Prior to the survey, we briefly explained WGS and WGS study-related issues to the respondents in order to provide them with the minimum knowledge required to answer the questionnaire. We then conducted an analysis, including cross-classification. For the question regarding interest in WGS, 46.6% of participants responded "Yes." 70.7% of all respondents said that they were interested in some kinds of findings that could be obtained from WGS studies. Regarding participation in WGS studies, 29.0% were interested in participating. The demographic factors significantly related to attitudes toward research participation were age, level of education, and employment status. The results also suggest that concerns about WGS have a positive effect on people's willingness to participate. Furthermore, it was shown that for people who were not interested in their gene-related information, concerns about WGS negatively impacted their willingness to participate. However, for people who were interested in their gene-related information, their concerns might not have impacted their willingness to participate. 
This research has shown

  16. Practical issues in implementing whole-genome-sequencing in routine diagnostic microbiology.

    Science.gov (United States)

    Rossen, J W A; Friedrich, A W; Moran-Gilad, J

    2018-04-01

    Next generation sequencing (NGS) is increasingly being used in clinical microbiology. Like every new technology adopted in microbiology, the integration of NGS into clinical and routine workflows must be carefully managed. To review the practical aspects of implementing bacterial whole genome sequencing (WGS) in routine diagnostic laboratories. Review of the literature and expert opinion. In this review, we discuss when and how to integrate whole genome sequencing (WGS) in the routine workflow of the clinical laboratory. In addition, as the microbiology laboratories have to adhere to various national and international regulations and criteria for their accreditation, we deliberate on quality control issues for using WGS in microbiology, including the importance of proficiency testing. Furthermore, the current and future place of this technology in the diagnostic hierarchy of microbiology is described as well as the necessity of maintaining backwards compatibility with already established methods. Finally, we speculate on the question of whether WGS can entirely replace routine microbiology in the future and the tension between the fact that most sequencers are designed to process multiple samples in parallel whereas for optimal diagnosis a one-by-one processing of the samples is preferred. Special reference is made to the cost and turnaround time of WGS in diagnostic laboratories. Further development is required to improve the workflow for WGS, in particular to shorten the turnaround time, reduce costs, and streamline downstream data analyses. Only when these processes reach maturity will reliance on WGS for routine patient management and infection control management become feasible, enabling the transformation of clinical microbiology into a genome-based and personalized diagnostic field. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  17. Identification of meat products by shotgun spectral matching

    DEFF Research Database (Denmark)

    Ohana, D.; Dalebout, H.; Marissen, R. J.

    2016-01-01

    A new method, based on shotgun spectral matching of peptide tandem mass spectra, was successfully applied to the identification of different food species. The method was demonstrated to work on raw as well as processed samples from 16 mammalian and 10 bird species by counting spectral matches...... to spectral libraries in a reference database with one spectral library per species. A phylogenetic tree could also be constructed directly from the spectra. Nearly all samples could be correctly identified at the species level, and 100% at the genus level. The method does not use any genomic information...

  18. Whole-genome characterization in pedigreed non-human primates using Genotyping-By-Sequencing and imputation.

    OpenAIRE

    Cervera-Juanes, Rita; Vinson, Amanda; Ferguson, Betsy; Carbone, Lucia; Spindel, Eliot; Mccouch, Susan; Spindel, Jennifer; Nevonen, Kimberly; Letaw, John; Raboin, Michael; Bimber, Ben

    2016-01-01

    Background: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still undeveloped. Whole-genome sequence (WGS) data in pedigreed macaque colonies could provide substantial experimental power, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for whole-genome sequenci...

  19. Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

    Science.gov (United States)

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...

  20. Transposon fingerprinting using low coverage whole genome shotgun sequencing in cacao (Theobroma cacao L.) and related species.

    Science.gov (United States)

    Sveinsson, Saemundur; Gill, Navdeep; Kane, Nolan C; Cronk, Quentin

    2013-07-24

    Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species. (i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c.10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low complexity satellite-like repeats in this genome. (i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating

  1. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Full Text Available Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shotgun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical sources were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  2. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  3. Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: a longitudinal molecular epidemiological study.

    Directory of Open Access Journals (Sweden)

    Andreas Roetzer

    Full Text Available BACKGROUND: Understanding Mycobacterium tuberculosis (Mtb) transmission is essential to guide efficient tuberculosis control strategies. Traditional strain typing lacks sufficient discriminatory power to resolve large outbreaks. Here, we tested the potential of using next generation genome sequencing for identification of outbreak-related transmission chains. METHODS AND FINDINGS: During long-term (1997 to 2010) prospective population-based molecular epidemiological surveillance comprising a total of 2,301 patients, we identified a large outbreak caused by an Mtb strain of the Haarlem lineage. The main performance outcome measure of whole genome sequencing (WGS) analyses was the degree of correlation of the WGS analyses with contact tracing data and the spatio-temporal distribution of the outbreak cases. WGS analyses of the 86 isolates revealed 85 single nucleotide polymorphisms (SNPs), subdividing the outbreak into seven genome clusters (two to 24 isolates each), plus 36 unique SNP profiles. WGS results showed that the first outbreak isolates detected in 1997 were falsely clustered by classical genotyping. In 1998, one clone (termed "Hamburg clone") started expanding, apparently independently from differences in the social environment of early cases. Genome-based clustering patterns were in better accordance with contact tracing data and the geographical distribution of the cases than clustering patterns based on classical genotyping. A maximum of three SNPs were identified in eight confirmed human-to-human transmission chains, involving 31 patients. We estimated the Mtb genome evolutionary rate at 0.4 mutations per genome per year. This rate suggests that Mtb grows in its natural host with a doubling time of approximately 22 h (400 generations per year). 
Based on the genome variation discovered, emergence of the Hamburg clone was dated back to a period between 1993 and 1997, hence shortly before the discovery of the outbreak through epidemiological
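    The relationship between the quoted doubling time and the number of generations per year can be checked with a few lines of arithmetic. The sketch below is only a consistency check under the assumption of a 365-day year; the per-generation figure it derives is an illustration and is not a value reported in the abstract.

```python
# Consistency check for the figures quoted in the abstract above.
# Assumption: a year of 365 days (8,760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def generations_per_year(doubling_time_h: float) -> float:
    """Number of doublings that fit into one year."""
    return HOURS_PER_YEAR / doubling_time_h


def mutations_per_generation(rate_per_year: float, doubling_time_h: float) -> float:
    """Per-genome mutation rate per generation implied by a yearly rate."""
    return rate_per_year / generations_per_year(doubling_time_h)


gens = generations_per_year(22.0)              # about 398, i.e. roughly 400
per_gen = mutations_per_generation(0.4, 22.0)  # about 0.001 per genome per generation
print(f"{gens:.0f} generations/year, {per_gen:.4f} mutations/genome/generation")
```

    A 22 h doubling time therefore corresponds to just under 400 generations per year, which matches the rounded figure in the abstract.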

  4. The Need for Clinical Decision Support Integrated with the Electronic Health Record for the Clinical Application of Whole Genome Sequencing Information

    Directory of Open Access Journals (Sweden)

    Brandon M. Welch

    2013-12-01

    Full Text Available Whole genome sequencing (WGS) is rapidly approaching widespread clinical application. Technology advancements over the past decade, since the first human genome was decoded, have made it feasible to use WGS for clinical care. Future advancements will likely drive down the price to the point wherein WGS is routinely available for care. However, were this to happen today, most of the genetic information available to guide clinical care would go unused due to the complexity of genetics, limited physician proficiency in genetics, and lack of genetics professionals in the clinical workforce. Furthermore, these limitations are unlikely to change in the future. As such, the use of clinical decision support (CDS) to guide genome-guided clinical decision-making is imperative. In this manuscript, we describe the barriers to widespread clinical application of WGS information, describe how CDS can be an important tool for overcoming these barriers, and provide clinical examples of how genome-enabled CDS can be used in the clinical setting.

  5. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

    Science.gov (United States)

    Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B

    2017-08-01

    To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.

  6. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    Science.gov (United States)

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.

  7. WGS-based surveillance of third-generation cephalosporin-resistant Escherichia coli from bloodstream infections in Denmark

    DEFF Research Database (Denmark)

    Roer, Louise; Hansen, Frank; Thomsen, Martin Christen Frølund

    2017-01-01

    To evaluate a genome-based surveillance of all Danish third-generation cephalosporin-resistant Escherichia coli (3GC-R Ec) from bloodstream infections between 2014 and 2015, focusing on horizontally transferable resistance mechanisms. A collection of 552 3GC-R Ec isolates were whole-genome sequenced and characterized by using the batch uploader from the Center for Genomic Epidemiology (CGE) and automatically analysed using the CGE tools according to resistance profile, MLST, serotype and fimH subtype. Additionally, the phylogenetic relationship of the isolates was analysed by SNP analysis... clone, here observed for the first time in Denmark. Additionally, the analysis revealed three individual cases with possible persistence of closely related clones collected more than 13 months apart. Continuous WGS-based national surveillance of 3GC-R Ec, in combination with more detailed...

  8. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    Energy Technology Data Exchange (ETDEWEB)

    McLoughlin, K. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.

  9. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks

    Science.gov (United States)

    Fitzpatrick, Margaret A.; Hauser, Alan R.

    2015-01-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. PMID:26699703
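    The two core-SNP thresholds reported above lend themselves to a simple pairwise classification. The following is a minimal sketch, not the authors' pipeline: it assumes that the thresholds are applied to pairwise distances between pre-aligned core-genome sequences of equal length, and the function names and category labels are hypothetical.

```python
def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def classify_pair(distance: float,
                  outbreak_threshold: float = 2.5,
                  lineage_threshold: float = 2500) -> str:
    """Label a pair of isolates using the thresholds quoted in the abstract.

    How the study applied the thresholds in detail is not stated in the
    abstract; treating them as pairwise cut-offs is an assumption.
    """
    if distance <= outbreak_threshold:
        return "outbreak-related"
    if distance <= lineage_threshold:
        return "same clonal lineage"
    return "different clonal lineage"


print(classify_pair(snp_distance("ACGTACGT", "ACGTACGA")))  # distance of 1
print(classify_pair(40))
print(classify_pair(3000))
```

    Because SNP counts are integers, a threshold of 2.5 in practice separates pairs with at most two differences from pairs with three or more.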

  10. Whole-genome sequencing of multiple myeloma from diagnosis to plasma cell leukemia reveals genomic initiating events, evolution, and clonal tides.

    Science.gov (United States)

    Egan, Jan B; Shi, Chang-Xin; Tembe, Waibhav; Christoforides, Alexis; Kurdoglu, Ahmet; Sinari, Shripad; Middha, Sumit; Asmann, Yan; Schmidt, Jessica; Braggio, Esteban; Keats, Jonathan J; Fonseca, Rafael; Bergsagel, P Leif; Craig, David W; Carpten, John D; Stewart, A Keith

    2012-08-02

    The longitudinal evolution of a myeloma genome from diagnosis to plasma cell leukemia has not previously been reported. We used whole-genome sequencing (WGS) on 4 purified tumor samples and patient germline DNA drawn over a 5-year period in a t(4;14) multiple myeloma patient. Tumor samples were acquired at diagnosis, first relapse, second relapse, and end-stage secondary plasma cell leukemia (sPCL). In addition to the t(4;14), all tumor time points also shared 10 common single-nucleotide variants (SNVs) on WGS comprising shared initiating events. Interestingly, we observed genomic sequence variants that waxed and waned with time in progressive tumors, suggesting the presence of multiple independent, yet related, clones at diagnosis that rose and fell in dominance. Five newly acquired SNVs, including truncating mutations of RB1 and ZKSCAN3, were observed only in the final sPCL sample suggesting leukemic transformation events. This longitudinal WGS characterization of the natural history of a high-risk myeloma patient demonstrated tumor heterogeneity at diagnosis with shifting dominance of tumor clones over time and has also identified potential mutations contributing to myelomagenesis as well as transformation from myeloma to overt extramedullary disease such as sPCL.

  11. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach.

    Science.gov (United States)

    Kohl, Thomas A; Diel, Roland; Harmsen, Dag; Rothgänger, Jörg; Walter, Karen Meywald; Merker, Matthias; Weniger, Thomas; Niemann, Stefan

    2014-07-01

    Whole-genome sequencing (WGS) allows for effective tracing of Mycobacterium tuberculosis complex (MTBC) (tuberculosis pathogens) transmission. However, it is difficult to standardize and, therefore, is not yet employed for interlaboratory prospective surveillance. To allow its widespread application, solutions for data standardization and storage in an easily expandable database are urgently needed. To address this question, we developed a core genome multilocus sequence typing (cgMLST) scheme for clinical MTBC isolates using the Ridom SeqSphere(+) software, which transfers the genome-wide single nucleotide polymorphism (SNP) diversity into an allele numbering system that is standardized, portable, and not computationally intensive. To test its performance, we performed WGS analysis of 26 isolates with identical IS6110 DNA fingerprints and spoligotyping patterns from a longitudinal outbreak in the federal state of Hamburg, Germany (notified between 2001 and 2010). The cgMLST approach (3,041 genes) discriminated the 26 strains with a resolution comparable to that of SNP-based WGS typing (one major cluster of 22 identical or closely related and four outlier isolates with at least 97 distinct SNPs or 63 allelic variants). Resulting tree topologies are highly congruent and grouped the isolates in both cases analogously. Our data show that SNP- and cgMLST-based WGS analyses facilitate high-resolution discrimination of longitudinal MTBC outbreaks. cgMLST allows for a meaningful epidemiological interpretation of the WGS genotyping data. It enables standardized WGS genotyping for epidemiological investigations, e.g., on the regional public health office level, and the creation of web-accessible databases for global TB surveillance with an integrated early warning system. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
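    The idea of translating SNP diversity into an allele numbering system can be illustrated with a toy allelic-distance calculation. This is a hedged sketch of the general cgMLST concept, not of the Ridom SeqSphere(+) implementation: the locus names, the use of None for a missing locus, and the decision to skip missing loci are all assumptions made for illustration.

```python
from typing import Dict, Optional

# A profile maps each core-genome locus to an allele number
# (None where the locus could not be called).
Profile = Dict[str, Optional[int]]


def allelic_distance(p1: Profile, p2: Profile) -> int:
    """Count loci called in both profiles that carry different allele numbers."""
    distance = 0
    for locus, allele in p1.items():
        other = p2.get(locus)
        if allele is None or other is None:
            continue  # skip loci missing in either isolate (an assumption)
        if allele != other:
            distance += 1
    return distance


# Hypothetical four-locus profiles; a real scheme would have thousands of loci.
isolate_a = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7}
isolate_b = {"locus1": 1, "locus2": 5, "locus3": None, "locus4": 7}
isolate_c = {"locus1": 3, "locus2": 5, "locus3": 9, "locus4": 8}

print(allelic_distance(isolate_a, isolate_b))  # one differing locus
print(allelic_distance(isolate_a, isolate_c))  # four differing loci
```

    Comparing small integers per locus, rather than re-aligning reads, is what makes such profiles portable between laboratories and cheap to compute.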

  12. An assessment of time involved in pre-test case review and counseling for a whole genome sequencing clinical research program.

    Science.gov (United States)

    Williams, Janet L; Faucett, W Andrew; Smith-Packard, Bethanny; Wagner, Monisa; Williams, Marc S

    2014-08-01

    Whole genome sequencing (WGS) is being used for evaluation of individuals with undiagnosed disease of suspected genetic origin. Implementing WGS into clinical practice will place an increased burden upon care teams with regard to pre-test patient education and counseling about results. To quantitate the time needed for appropriate pre-test evaluation of participants in WGS testing, we documented the time spent by our clinical research group on various activities related to program preparation, participant screening, and consent prior to WGS. Participants were children or young adults with autism, intellectual or developmental disability, and/or congenital anomalies, who have remained undiagnosed despite previous evaluation, and their biologic parents. Results showed that significant time was spent in securing allocation of clinical research space to counsel participants and families, and in acquisition and review of participants' medical records. Pre-enrollment chart review identified two individuals with existing diagnoses resulting in savings of $30,000 for the genome sequencing alone, as well as saving hours of personnel time for genome interpretation and communication of WGS results. New WGS programs should plan for costs associated with additional pre-test administrative planning and patient evaluation time that will be required to provide high quality care.

  13. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .
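    The k-mer idea described above (organelle DNA is present in many copies, so its k-mers are over-represented) can be sketched in a few lines. This is an illustration of the principle only and not Norgal's code: the fixed count threshold, the absence of reverse-complement handling, and the function names are assumptions, and the de novo assembly step is not shown.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def high_frequency_kmers(reads: Iterable[str], k: int, min_count: int) -> Set[str]:
    """Return k-mers seen at least min_count times across all reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, n in counts.items() if n >= min_count}


def select_reads(reads: Iterable[str], frequent: Set[str], k: int) -> List[str]:
    """Keep reads containing at least one high-frequency k-mer."""
    return [r for r in reads if any(km in frequent for km in kmers(r, k))]


# Toy data: one read sequenced five times stands in for high-copy organelle DNA.
reads = ["ACGTTGCA"] * 5 + ["GGGCCCAT", "TTTAAACG"]
frequent = high_frequency_kmers(reads, k=4, min_count=5)
organelle_candidates = select_reads(reads, frequent, k=4)
print(len(organelle_candidates))  # the five high-copy reads
```

    The selected reads would then be passed to an assembler; real data would also need a depth-aware threshold rather than a hard-coded count.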

  14. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Full Text Available Abstract Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  15. Shotgun metagenomic data streams: surfing without fear

    Energy Technology Data Exchange (ETDEWEB)

    Berendzen, Joel R [Los Alamos National Laboratory

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.

  16. Whole genome sequencing as the ultimate tool to diagnose tuberculosis

    Directory of Open Access Journals (Sweden)

    Dick van Soolingen

    2016-01-01

    Full Text Available In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. The resistance is predicted by the detection of mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. The production of a strain-characteristic DNA fingerprint, to investigate the epidemiology of TB, is done by the 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Infectious Disease Prevention and Control in Stockholm, Sweden organizes a yearly round of quality control on VNTR typing and, in 2015, for the first time also on WGS. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects will be covered that influence the quality and interpretation of WGS results. The turn-around-time, analysis, and utility of WGS will be discussed. Moreover, the experiences in the use of WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be conquered. The state of the art is that bacteria still have to be cultured to have sufficient quality and quantity of DNA for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore

  17. snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund

    2012-01-01

    ... to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Results Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed...
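    The step in which SNPs are concatenated by reference position can be sketched as follows. This is a minimal illustration and not snpTree's code: it assumes that SNP calls per isolate are already available as a mapping from 0-based reference position to base, and that an isolate without a call at a position carries the reference base there.

```python
from typing import Dict, List, Tuple


def concatenate_snps(reference: str,
                     snps_by_isolate: Dict[str, Dict[int, str]]
                     ) -> Tuple[List[int], Dict[str, str]]:
    """Build one SNP string per isolate over all variable positions."""
    # Union of every position at which any isolate differs from the reference.
    positions = sorted({pos for snps in snps_by_isolate.values() for pos in snps})
    alignment = {
        name: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for name, snps in snps_by_isolate.items()
    }
    return positions, alignment


reference = "ACGTACGTAC"
calls = {
    "isolate_A": {1: "T", 5: "G"},
    "isolate_B": {5: "G"},
    "isolate_C": {8: "C"},
}
positions, alignment = concatenate_snps(reference, calls)
print(positions)  # variable sites in reference order
for name, seq in alignment.items():
    print(name, seq)
```

    The resulting equal-length strings form a multiple alignment that a tree-building program can take as input.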

  18. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    Full Text Available National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates there was concordance between serotype derived from the genome and phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or indels identified in O-antigen synthesis/modification genes rendering them dysfunctional. SNP typing identified 36 clusters of two isolates or more. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real-time thereby informing surveillance strategies and providing the opportunities for implementing timely public health interventions.

  19. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison; Zhou, S. [Univ. Wisc.-Madison; Place, M. [Univ. Wisc.-Madison; Zhang, Y. [Univ. Wisc.-Madison; Briska, A. [Univ. Wisc.-Madison; Goldstein, S. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Forrest, D. [Univ. Wisc.-Madison; Lim, A. [Univ. Wisc.-Madison; Lapidus, A. [Univ. Wisc.-Madison; Han, C. S. [Univ. Wisc.-Madison; Roberts, G. P. [Univ. Wisc.-Madison; Schwartz, D. C. [Univ. Wisc.-Madison

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
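    Aligning an optical map with sequence contigs requires an in silico restriction map of the sequence, i.e. the ordered fragment lengths expected after digestion. The sketch below shows that calculation for the three enzymes named above. It is an illustration only: it treats the molecule as linear, searches one strand (sufficient here because the three recognition sites are palindromic), and the toy sequence is invented.

```python
from typing import List

# Recognition sites of the enzymes named in the abstract; each of these
# enzymes cuts after the first base of its site.
SITES = {"XbaI": "TCTAGA", "NheI": "GCTAGC", "HindIII": "AAGCTT"}
CUT_OFFSET = 1


def fragment_lengths(seq: str, enzyme: str) -> List[int]:
    """Ordered fragment lengths from digesting a linear sequence."""
    site = SITES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Invented 23-base sequence containing two HindIII sites and no XbaI site.
toy = "GGGG" + "AAGCTT" + "CCCCC" + "AAGCTT" + "TT"
print(fragment_lengths(toy, "HindIII"))
print(fragment_lengths(toy, "XbaI"))
```

    Comparing such predicted fragment-length lists with the measured optical map is what allows contigs to be ordered and misassemblies to be spotted.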

  20. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    Science.gov (United States)

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  1. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available
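    Splitting large sequence files for load balancing, one of the improvements listed above, relies on the fact that a FASTQ record occupies four lines. The following is a generic sketch of such a splitter and is not taken from Rainbow: the function name and the chunk size are hypothetical, and records are assumed to be strictly four lines each with no line wrapping.

```python
from itertools import islice
from typing import Iterable, Iterator, List

LINES_PER_RECORD = 4  # header, sequence, '+' separator, qualities


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most reads_per_chunk records."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, LINES_PER_RECORD * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % LINES_PER_RECORD:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


# Five toy records split into chunks of two reads each.
records = []
for i in range(5):
    records += [f"@read{i}", "ACGT", "+", "IIII"]
chunks = list(split_fastq(records, reads_per_chunk=2))
print([len(c) // LINES_PER_RECORD for c in chunks])  # reads per chunk
```

    Each chunk could then be written to its own file and dispatched to a separate worker; passing an open file handle as `lines` keeps memory use bounded by the chunk size.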

  2. Applied Genomics of Foodborne Pathogens

    DEFF Research Database (Denmark)

    This book provides a timely and thorough snapshot into the emerging and fast evolving area of applied genomics of foodborne pathogens. Driven by the drastic advance of whole genome shotgun sequencing (WGS) technologies, genomics applications are becoming increasingly valuable and even essential... and customized source of information designed for and accessible to microbiologists interested in applying cutting-edge genomics in food safety and public health research. This book fills this void with a well-selected collection of topics, case studies, and bioinformatics tools contributed by experts at the forefront of foodborne pathogen genomics research.

  3. Entrance, exit, and reentrance of one shot with a shotgun

    DEFF Research Database (Denmark)

    Gulmann, C; Hougen, H P

    1999-01-01

    The case being reported is one of a homicidal shotgun fatality with an unusual wound pattern. A 34-year-old man was shot at close range with a 12-gauge shotgun armed with No. 5 birdshot ammunition. The shot entered the left axillary region, exited through the left infraclavicular region, and ther...

  4. The Use of Genomics in Microbiology: From Vaccines to Drug Resistance

    KAUST Repository

    Hill-Cawthorne, Grant A.

    2015-05-01

    Since 2004 sequencing has undergone a revolutionary change with the advent of first the 454 sequencer, followed by the introduction of the Solexa/Illumina chemistries. This has led to the ability to sequence the whole genomes of a large number of microorganisms in a short space of time. Microbiology’s last revolution was in the introduction of PCR, which allowed for faster detection of pathogens, particularly viruses. With whole genome sequencing (WGS) many of the time-consuming steps carried out by a reference laboratory can be skipped. These include organism detection, speciation, antimicrobial susceptibility testing, typing and molecular epidemiology, all carried out in a single sequencing run followed by bioinformatics analysis. So far the merits of WGS in microbiology have only been demonstrated in highly specialised scientific laboratories in high-income countries. However, with continually decreasing costs and increasing throughput, many public health laboratories are now acquiring sequencers and their use will inevitably spread to middle-income countries. In this thesis I explore the use of WGS in three specific areas and include details on how to develop and assess bioinformatics pipelines. First, I shall demonstrate that WGS can be used to assess highly divergent regions within microorganisms by using the example of the weaknesses in current approaches to the molecular detection of methicillin-resistant Staphylococcus aureus isolates. In particular, I shall highlight how current molecular tests have limitations in detecting drug resistance when the regions of the genome conferring resistance have significant mutations. Second, I will examine how WGS can provide insights into the biology of the current vaccine against tuberculosis, Bacillus Calmette Guerin (BCG), particularly how the continued passage of the seedlot for this vaccine has led to very different versions being used around the globe. Finally, I will demonstrate how WGS can be used to

  5. Survey on the Use of Whole-Genome Sequencing for Infectious Diseases Surveillance: Rapid Expansion of European National Capacities, 2015–2016

    Directory of Open Access Journals (Sweden)

    Joana Revez

    2017-12-01

    Full Text Available Whole-genome sequencing (WGS) has become an essential tool for public health surveillance and molecular epidemiology of infectious diseases and antimicrobial drug resistance. It provides precise geographical delineation of spread and enables incidence monitoring of pathogens at genotype level. Coupled with epidemiological and environmental investigations, it delivers ultimate resolution for tracing sources of epidemic infections. To ascertain the level of implementation of WGS-based typing for national public health surveillance and investigation of prioritized diseases in the European Union (EU)/European Economic Area (EEA), two surveys were conducted in 2015 and 2016. The surveys were designed to determine the national public health reference laboratories’ access to WGS and operational WGS-based typing capacity for national surveillance of selected foodborne pathogens, antimicrobial-resistant pathogens, and vaccine-preventable diseases identified as priorities for European genomic surveillance. Twenty-eight and twenty-nine out of the 30 EU/EEA countries participated in the survey in 2015 and 2016, respectively. National public health reference laboratories in 22 and 25 countries had access to WGS-based typing for public health applications in 2015 and 2016, respectively. Reported reasons for limited or no access were lack of funding, staff, and expertise. Illumina technology was the most frequently used followed by Ion Torrent technology. The access to bioinformatics expertise and competence for routine WGS data analysis was limited. By mid-2016, half of the EU/EEA countries were using WGS analysis either as first- or second-line typing method for surveillance of the pathogens and antibiotic resistance issues identified as EU priorities. The sampling frame as well as bioinformatics analysis varied by pathogen/resistance issue and country. Core genome multilocus allelic profiling, also called cgMLST, was the most frequently used annotation

  6. Harnessing cross-species alignment to discover SNPs and generate a draft genome sequence of a bighorn sheep (Ovis canadensis).

    Science.gov (United States)

    Miller, Joshua M; Moore, Stephen S; Stothard, Paul; Liao, Xiaoping; Coltman, David W

    2015-05-20

    Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, construction of a genome sequence necessitates de novo assembly which may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows for alignment which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling a wide array of species WGS to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries). Two sequencing libraries on SOLiD platforms yielded over 865 million reads, and combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and sequenced individual and found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition. Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.

  7. 33 CFR 110.129a - Apra Harbor, Guam. (Datum: WGS 84)

    Science.gov (United States)

    2010-07-01

    ... 33 Navigation and Navigable Waters 1 2010-07-01 2010-07-01 false Apra Harbor, Guam. (Datum: WGS 84) 110.129a Section 110.129a Navigation and Navigable Waters COAST GUARD, DEPARTMENT OF HOMELAND SECURITY ANCHORAGES ANCHORAGE REGULATIONS Special Anchorage Areas § 110.129a Apra Harbor, Guam. (Datum: WGS 84) (a...

  8. Novel advances in shotgun lipidomics for biology and medicine.

    Science.gov (United States)

    Wang, Miao; Wang, Chunyan; Han, Rowland H; Han, Xianlin

    2016-01-01

    The field of lipidomics, as coined in 2003, has made profound advances and expanded rapidly. The mass spectrometry-based strategies of this analytical methodology-oriented research discipline for lipid analysis largely fall into three categories: direct infusion-based shotgun lipidomics, liquid chromatography-mass spectrometry-based platforms, and matrix-assisted laser desorption/ionization mass spectrometry-based approaches (particularly in imaging lipid distribution in tissues or cells). This review focuses on shotgun lipidomics. After briefly introducing its fundamentals, the major materials of this article cover its recent advances. These include the novel methods of lipid extraction, novel shotgun lipidomics strategies for identification and quantification of previously hardly accessible lipid classes and molecular species including isomers, and novel tools for processing and interpretation of lipidomics data. Representative applications of advanced shotgun lipidomics for biological and biomedical research are also presented in this review. We believe that with these novel advances in shotgun lipidomics, this approach for lipid analysis should become more comprehensive and high throughput, thereby greatly accelerating the lipidomics field to substantiate the aberrant lipid metabolism, signaling, trafficking, and homeostasis under pathological conditions and their underpinning biochemical mechanisms. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Whole-genome analysis of a patient with early-stage small-cell lung cancer.

    Science.gov (United States)

    Han, J-Y; Lee, Y-S; Kim, B C; Lee, G K; Lee, S; Kim, E-H; Kim, H-M; Bhak, J

    2014-12-01

    We performed whole-genome sequencing (WGS) of a case of early-stage small-cell lung cancer (SCLC) to analyze the genomic features. WGS revealed a lot of single-nucleotide variations (SNVs), small insertion/deletions and chromosomal abnormality. Chromosomes 4p, 5q, 13q, 15q, 17p and 22q contained many block deletions. Especially, copy loss was observed in tumor suppressor genes RB1 and TP53, and copy gain in oncogene hTERT. Somatic mutations were found in TP53 and CREBBP. Novel nonsynonymous (ns) SNVs in C6ORF103 and SLC5A4 genes were also found. Sanger sequencing of the SLC5A4 gene in 23 independent SCLC samples showed another nsSNV in the SLC5A4 gene, indicating that nsSNVs in the SLC5A4 gene are recurrent in SCLC. WGS of an early-stage SCLC identified novel recurrent mutations and validated known variations, including copy number variations. These findings provide insight into the genomic landscape contributing to SCLC development.

  10. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical for the initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and turnaround time of testing decrease, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2017-09-01

    Full Text Available High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  12. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
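
    The reconstruction idea, rebuilding a sequence from overlapping peptides produced by different digestions, can be sketched as a greedy suffix-prefix merge. This is a generic illustration and not the method of the report: the function names, the `min_len` threshold, and the example peptides are hypothetical, and real digests would need to handle ambiguous or spurious overlaps.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop peptides fully contained in a longer peptide.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; return the unmerged fragments
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags
```

    Calling `greedy_assemble` on peptides from two hypothetical digests of the same protein returns a single fragment when the overlaps are unambiguous; if some region is covered by only one digest with no overlap, several fragments are returned instead.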

  13. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm-to-consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitation of outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS likely will replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one consolidated, efficient workflow.

  14. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  15. Genetic counselors' views and experiences with the clinical integration of genome sequencing.

    Science.gov (United States)

    Machini, Kalotina; Douglas, Jessica; Braxton, Alicia; Tsipis, Judith; Kramer, Kate

    2014-08-01

    In recent years, new sequencing technologies known as next generation sequencing (NGS) have provided scientists the ability to rapidly sequence all known coding as well as non-coding sequences in the human genome. As the two emerging approaches, whole exome (WES) and whole genome (WGS) sequencing, have started to be integrated in the clinical arena, we sought to survey health care professionals who are likely to be involved in the implementation process now and/or in the future (e.g., genetic counselors, geneticists and nurse practitioners). Two hundred twenty-one genetic counselors, one third of whom currently offer WES/WGS, participated in an anonymous online survey. The aims of the survey were first, to identify barriers to the implementation of WES/WGS, as perceived by survey participants; second, to provide the first systematic report of current practices regarding the integration of WES/WGS in clinic and/or research across the US and Canada and to illuminate the roles and challenges of genetic counselors participating in this process; and third, to evaluate the impact of WES/WGS on patient care. Our results showed that genetic counseling practices with respect to WES/WGS are consistent with the criteria set forth in the ACMG 2012 policy statement, which highlights indications for testing, reporting, and pre/post test considerations. Our respondents described challenges related to offering WES/WGS, which included billing issues, the duration and content of the consent process, result interpretation and disclosure of incidental findings and variants of unknown significance. In addition, respondents indicated that specialty area (i.e., prenatal and cancer), lack of clinical utility of WES/WGS and concerns about interpretation of test results were factors that prevented them from offering this technology to patients. Finally, study participants identified the aspects of their professional training which have been most beneficial in aiding with the integration of

  16. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    NARCIS (Netherlands)

    I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein ); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. 
Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

    2016-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the

  17. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  18. Application of WGS data for O-specific antigen analysis and in silico serotyping of Pseudomonas aeruginosa isolates

    DEFF Research Database (Denmark)

    Thrane, Sandra Wingaard; Taylor, Véronique L.; Lund, Ole

    2016-01-01

    ... aeruginosa serotyper (PAst) program, which enabled in silico serotyping of P. aeruginosa isolates using WGS data. PAst has been made publicly available as a web service and facilitates high-throughput serotyping analysis. The program overcomes critical issues such as the loss of in vitro typeability often associated with P. aeruginosa isolates from chronic infections, and quickly determines the serogroup of an isolate based on the sequence of the O-specific antigen (OSA) gene cluster. Here, PAst analysis of 1649 genomes resulted in successful serogroup assignments in 99.27% of the cases...

  19. Shedding genomic light on Aristotle's lantern.

    Science.gov (United States)

    Sodergren, Erica; Shen, Yufeng; Song, Xingzhi; Zhang, Lan; Gibbs, Richard A; Weinstock, George M

    2006-12-01

    Sea urchins have proved fascinating to biologists since the time of Aristotle who compared the appearance of their bony mouth structure to a lantern in The History of Animals. Throughout modern times it has been a model system for research in developmental biology. Now, the genome of the sea urchin Strongylocentrotus purpuratus is the first echinoderm genome to be sequenced. A high quality draft sequence assembly was produced using the Atlas assembler to combine whole genome shotgun sequences with sequences from a collection of BACs selected to form a minimal tiling path along the genome. A formidable challenge was presented by the high degree of heterozygosity between the two haplotypes of the selected male representative of this marine organism. This was overcome by use of the BAC tiling path backbone, in which each BAC represents a single haplotype, as well as by improvements in the Atlas software. Another innovation introduced in this project was the sequencing of pools of tiling path BACs rather than individual BAC sequencing. The Clone-Array Pooled Shotgun Strategy greatly reduced the cost and time devoted to preparing shotgun libraries from BAC clones. The genome sequence was analyzed with several gene prediction methods to produce a comprehensive gene list that was then manually refined and annotated by a volunteer team of sea urchin experts. This latter annotation community edited over 9000 gene models and uncovered many unexpected aspects of the sea urchin genetic content impacting transcriptional regulation, immunology, sensory perception, and an organism's development. Analysis of the basic deuterostome genetic complement supports the sea urchin's role as a model system for deuterostome and, by extension, chordate development.

  20. Complete Genome Sequence of Pediococcus pentosaceus Strain SL4

    DEFF Research Database (Denmark)

    Dantoft, Shruti Harnal; Bielak, Eliza Maria; Seo, Jae-Gu

    2013-01-01

    Pediococcus pentosaceus SL4 was isolated from a Korean fermented vegetable product, kimchi. We report here the whole-genome sequence (WGS) of P. pentosaceus SL4. The genome consists of a 1.79-Mb circular chromosome (G+C content of 37.3%) and seven distinct plasmids ranging in size from 4 kb to 50...

  1. Accuracy and reliability of plane networks transformed from WGS84 into S-JTSK

    Directory of Open Access Journals (Sweden)

    Sütti Juraj

    1998-12-01

    Full Text Available Local positional networks that are surveyed by GPS methods and processed in the WGS84 system must be transformed into the S-JTSK datum for practical use in geodesy (setting-out work, control measurements, and so on). To assess the accuracy of the resulting point field, the accuracy characteristics of the points must also be transformed from WGS84 into S-JTSK. The reliability characteristics of the transformed network in S-JTSK are identical to those in WGS84.
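
    Transforming the accuracy characteristics of points along with their coordinates is, in its simplest form, an application of covariance propagation. The sketch below assumes a plane similarity transformation x' = t + s·R(θ)·x, for which a point covariance matrix Σ maps to Σ' = J Σ Jᵀ with J = s·R(θ). This is a simplification chosen for illustration and not the procedure of the article (an actual WGS84 to S-JTSK conversion involves a datum shift and a map projection); the function names and parameters are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two small matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def propagate_covariance(cov: Matrix, scale: float, rotation_rad: float) -> Matrix:
    """Propagate a 2x2 point covariance through a plane similarity transformation.

    The translation does not appear because it does not affect the covariance.
    """
    c, s = math.cos(rotation_rad), math.sin(rotation_rad)
    jac = [[scale * c, -scale * s],
           [scale * s, scale * c]]  # Jacobian J = scale * R(rotation)
    return matmul(matmul(jac, cov), transpose(jac))
```

    With a scale close to 1, as is typical between metric systems, the error ellipse is essentially rotated while keeping its size; with the identity transformation (scale 1, rotation 0) the covariance is unchanged.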

  2. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Full Text Available Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES) and whole genome sequencing (WGS) viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling of sequence variants with respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated with disease by GWAS. It is thus critical, for clinical applications of WGS and WES, to interpret whether non-variant sites are homozygous for the reference allele or if the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data makes it possible to genotype more than 92% of sites corresponding to known SNPs compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary for an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF) file format which can store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.
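
    The distinction drawn above, between a site that is confidently homozygous for the reference allele and a site whose genotype cannot be called, can be illustrated with a small per-site classifier. This is a sketch under stated assumptions and not the authors' pipeline or the eVCF format: the function name and all thresholds (`min_depth`, `max_ref_alt_frac`, `min_var_alt_frac`) are hypothetical.

```python
def classify_site(depth: int, alt_reads: int,
                  min_depth: int = 10,
                  max_ref_alt_frac: float = 0.05,
                  min_var_alt_frac: float = 0.20) -> str:
    """Label a site as 'variant', 'hom_ref', or 'no_call' from read counts.

    depth      total reads covering the site
    alt_reads  reads supporting a non-reference allele
    """
    if alt_reads < 0 or alt_reads > depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    # Too few reads: absence of a variant call is not evidence of reference.
    if depth < min_depth:
        return "no_call"
    alt_frac = alt_reads / depth
    if alt_frac >= min_var_alt_frac:
        return "variant"
    if alt_frac <= max_ref_alt_frac:
        return "hom_ref"
    # Intermediate alternate-allele fraction: ambiguous, so leave uncalled.
    return "no_call"
```

    A variant-only workflow would report just the first class; keeping "hom_ref" separate from "no_call" is what lets a clinically relevant site without a variant call be interpreted.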

  3. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Full Text Available Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency < 1% in ESP6500) in 163 clinically-relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.

  4. Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with pulmonary hypertension

    Science.gov (United States)

    The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in U.S. beef cattle...

  5. The Present and Future of Whole Genome Sequencing (WGS) and Whole Metagenome Sequencing (WMS) for Surveillance of Antimicrobial Resistant Microorganisms and Antimicrobial Resistance Genes across the Food Chain

    Directory of Open Access Journals (Sweden)

    Elena A. Oniciuc

    2018-05-01

    Full Text Available Antimicrobial resistance (AMR) surveillance is a critical step within risk assessment schemes, as it is the basis for informing global strategies, monitoring the effectiveness of public health interventions, and detecting new trends and emerging threats linked to food. Surveillance of AMR is currently based on the isolation of indicator microorganisms and the phenotypic characterization of clinical, environmental and food strains isolated. However, this approach provides very limited information on the mechanisms driving AMR or on the presence or spread of AMR genes throughout the food chain. Whole-genome sequencing (WGS) of bacterial pathogens has shown potential for epidemiological surveillance, outbreak detection, and infection control. In addition, whole metagenome sequencing (WMS) allows for the culture-independent analysis of complex microbial communities, providing useful information on the occurrence of AMR genes. Both technologies can assist the tracking of AMR genes and mobile genetic elements, providing the necessary information for the implementation of quantitative risk assessments and allowing for the identification of hotspots and routes of transmission of AMR across the food chain. This review article summarizes the information currently available on the use of WGS and WMS for surveillance of AMR in foodborne pathogenic bacteria and food-related samples and discusses future needs that will have to be considered for the routine implementation of these next-generation sequencing methodologies with this aim. In particular, methodological constraints that impede the use at a global scale of these high-throughput sequencing (HTS) technologies are identified, and the standardization of methods and protocols is suggested as a measure to upgrade HTS-based AMR surveillance schemes.

  6. Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with high-altitude pulmonary hypertension

    Science.gov (United States)

    The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, bovine WGS databases comprised of related influential sires from relatively few breeds tend to underrepresent the breadth of genetic diversity in U.S. beef cattle. Thus, our ...

  7. Removing the bottleneck in whole genome sequencing of Mycobacterium tuberculosis for rapid drug resistance analysis: a call to action

    Directory of Open Access Journals (Sweden)

    Ruth McNerney

    2017-03-01

    Full Text Available Whole genome sequencing (WGS) can provide a comprehensive analysis of Mycobacterium tuberculosis mutations that cause resistance to anti-tuberculosis drugs. With the deployment of bench-top sequencers and rapid analytical software, WGS is poised to become a useful tool to guide treatment. However, direct sequencing from clinical specimens to provide a full drug resistance profile remains a serious challenge. This article reviews current practices for extracting M. tuberculosis DNA and possible solutions for sampling sputum. Techniques under consideration include enzymatic digestion, physical disruption, chemical degradation, detergent solubilization, solvent extraction, ligand-coated magnetic beads, silica columns, and oligonucleotide pull-down baits. Selective amplification of genomic bacterial DNA in sputum prior to WGS may provide a solution, and differential lysis to reduce the levels of contaminating human DNA is also being explored. To remove this bottleneck and accelerate access to WGS for patients with suspected drug-resistant tuberculosis, it is suggested that a coordinated and collaborative approach be taken to more rapidly optimize, compare, and validate methodologies for sequencing from patient samples.

  8. Shotgun metagenomic data on the human stool samples to characterize shifts of the gut microbial profile after the Helicobacter pylori eradication therapy

    Directory of Open Access Journals (Sweden)

    Eugenia A. Boulygina

    2017-10-01

    Full Text Available The shotgun sequencing data presented in this report are related to the research article named “Gut microbiome shotgun sequencing in assessment of microbial community changes associated with H. pylori eradication therapy” (Khusnutdinova et al., 2016) [1]. Typically, the H. pylori eradication protocol includes a prolonged two-week course of broad-spectrum antibiotics. The presented data on the whole-genome sequencing of the total DNA from stool samples of patients before the start of the eradication, immediately after eradication and several weeks after the end of treatment could help to profile the gut microbiota both taxonomically and functionally. The presented data, together with those described in Glushchenko et al. (2017) [2], allow researchers to characterize the metagenomic profiles in which the use of antibiotics could result in dramatic changes in the intestinal microbiota composition. We present 15 gut metagenomes from 5 patients with H. pylori infection, obtained through shotgun sequencing on the SOLiD 5500 W platform. Raw reads are deposited in the ENA under project ID PRJEB21338.

  9. The African Genome Variation Project shapes medical genetics in Africa

    Science.gov (United States)

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

    2014-01-01

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054

  10. Microbial succession and the functional potential during the fermentation of Chinese soy sauce brine

    Directory of Open Access Journals (Sweden)

    Joanita Sulaiman

    2014-10-01

    Full Text Available The quality of traditional Chinese soy sauce is determined by microbial communities and their inter-related metabolic roles in the fermentation tank. In this study, traditional Chinese soy sauce brine samples were obtained periodically to monitor the transitions of the microbial population and functional properties during the six-month fermentation process. The whole genome shotgun (WGS) method revealed that the fermentation brine was first dominated by the bacterial genus Weissella and later by the fungal genus Candida. Metabolic reconstruction of the metagenome sequences demonstrated a characteristic profile of heterotrophic fermentation of proteins and carbohydrates. This was supported by the detection of ethanol together with a steady decrease in pH. To the best of our knowledge, this is the first study that explores the temporal changes in microbial succession over a period of six months, through metagenome shotgun sequencing, in traditional Chinese soy sauce fermentation and the biological processes therein.

  11. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...
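    As a rough illustration of how repeat classes such as (AT) or (AAG) can be tallied from sequence data, the sketch below scans a DNA string for perfect di- and trinucleotide tandem repeats with a regular expression. It is not the pipeline used in the study: the function name `find_ssrs`, the `min_repeats` threshold of 5 and the toy sequence are all assumptions chosen for illustration.

```python
import re

def find_ssrs(seq, min_repeats=5):
    """Find perfect di-/trinucleotide tandem repeats (hypothetical helper).

    Returns a list of (motif, start, repeat_count) tuples; start is 0-based.
    """
    # A 2-3 bp motif followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((motif, m.start(), len(m.group(0)) // len(motif)))
    return hits

# Toy sequence (made up): six AT repeats and five AAG repeats.
toy = "GGC" + "AT" * 6 + "CCGT" + "AAG" * 5 + "TT"
ssrs = find_ssrs(toy)
```

    Motifs are reported as they appear in the sequence; a real survey would usually also merge rotations and reverse complements (for example AT with TA) before counting repeat classes.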

  12. Real time application of whole genome sequencing for outbreak investigation - What is an achievable turnaround time?

    Science.gov (United States)

    McGann, Patrick; Bunin, Jessica L; Snesrud, Erik; Singh, Seema; Maybank, Rosslyn; Ong, Ana C; Kwak, Yoon I; Seronello, Scott; Clifford, Robert J; Hinkle, Mary; Yamada, Stephen; Barnhill, Jason; Lesho, Emil

    2016-07-01

    Whole genome sequencing (WGS) is increasingly employed in clinical settings, though few assessments of turnaround times (TAT) have been performed in real time. In this study, WGS was used to investigate an unfolding outbreak of vancomycin-resistant Enterococcus faecium (VRE) among 3 patients in the ICU of a tertiary care hospital. Including overnight culturing, a TAT of just 48.5 h for a comprehensive report was achievable using an Illumina MiSeq benchtop sequencer. WGS revealed that isolates from patients 2 and 3 differed from that of patient 1 by a single nucleotide polymorphism (SNP), indicating nosocomial transmission. However, the unparalleled resolution provided by WGS suggested that nosocomial transmission involved two separate events from patient 1 to patients 2 and 3, and not the linear transmission suggested by the timeline. Rapid TATs are achievable using WGS in the clinical setting and can provide an unprecedented level of resolution for outbreak investigations. Published by Elsevier Inc.

  13. Use of WGS in Mycobacterium tuberculosis routine diagnosis

    Directory of Open Access Journals (Sweden)

    Daniela M Cirillo

    2016-01-01

    Conclusion: WGS is a rapid, cost-effective technique that promises to integrate and replace the other tests in routine laboratories for an accurate diagnosis of DR-TB, although it is suitable nowadays for cultured samples only.

  14. One bacterial cell, one complete genome.

    Directory of Open Access Journals (Sweden)

    Tanja Woyke

    2010-04-01

    Full Text Available While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200–900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  15. One Bacterial Cell, One Complete Genome

    Energy Technology Data Exchange (ETDEWEB)

    Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos; Clum, Alicia; Copeland, Alex; Schackwitz, Wendy; Lapidus, Alla; Wu, Dongying; McCutcheon, John P.; McDonald, Bradon R.; Moran, Nancy A.; Bristow, James; Cheng, Jan-Fang

    2010-04-26

    While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200–900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  16. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    OpenAIRE

    Zuccolo, Andrea; Bowers, John E; Estill, James C; Xiong, Zhiyong; Luo, Meizhong; Sebastian, Aswathy; Goicoechea, José Luis; Collura, Kristi; Yu, Yeisoo; Jiao, Yuannian; Duarte, Jill; Tang, Haibao; Ayyampalayam, Saravanaraj; Rounsley, Steve; Kudrna, Dave

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome....

  17. Paleogenomics in a temperate environment: shotgun sequencing from an extinct Mediterranean caprine.

    Directory of Open Access Journals (Sweden)

    Oscar Ramírez

    Full Text Available BACKGROUND: Numerous endemic mammals, including dwarf elephants, goats, hippos and deer, evolved in isolation in the Mediterranean islands during the Pliocene and Pleistocene. Most of them subsequently became extinct during the Holocene. Recently developed high-throughput sequencing technologies could provide a unique tool for retrieving genomic data from these extinct species, making it possible to study their evolutionary history and the genetic bases underlying their particular, sometimes unique, adaptations. METHODOLOGY/PRINCIPAL FINDINGS: A DNA extract from an approximately 6,000-year-old bone sample of an extinct caprine (Myotragus balearicus) from the Balearic Islands in the Western Mediterranean was subjected to shotgun sequencing on the GS FLX 454 platform. Only 0.27% of the resulting sequences, identified from alignments with the cow genome and comprising 15,832 nucleotides, with an average length of 60 nucleotides, proved to be endogenous. CONCLUSIONS: A phylogenetic tree generated with Myotragus sequences and those from other artiodactyls displays an identical topology to that generated from mitochondrial DNA data. Despite the unfavourable thermal environment, which explains the low yield of endogenous sequences, our study demonstrates that it is possible to obtain genomic data from extinct species from temperate regions.

  18. Draft genome sequence of a Kluyvera intermedia isolate from a patient with a pancreatic abscess.

    Science.gov (United States)

    Thele, Roland; Gumpert, Heidi; Christensen, Louise B; Worning, Peder; Schønning, Kristian; Westh, Henrik; Hansen, Thomas A

    2017-09-01

    The genus Kluyvera comprises potential pathogens that can cause many infections. This study reports a Kluyvera intermedia strain (FOSA7093) from a pancreatic cyst specimen from a long-term hospitalised patient. Whole-genome sequencing (WGS) of the K. intermedia isolate was performed and the strain was reported as sensitive to Danish-registered antibiotics although it had a fosA-like gene in the genome. There were nine contigs that aligned to a plasmid, and these contigs contained several heavy metal resistance gene homologues. Furthermore, a prophage was discovered in the genome. WGS represents an efficient tool for monitoring Kluyvera spp. and its role as a reservoir of multidrug resistance. Therefore, this susceptible K. intermedia genome has many characteristics that allow comparison with resistant K. intermedia that might be discovered in the future. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  19. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost-effective or even beneficial in all cases. In studies that do not require a whole-...

  20. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    Science.gov (United States)

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health and more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
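    The two distance measures contrasted in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not any published tool: `snp_distance` counts differing positions between two equal-length core genome alignment rows (ignoring gaps and ambiguous bases), and `allele_distance` counts loci with different allele numbers between two gene-by-gene profiles. The function names, locus names and toy data are hypothetical.

```python
def snp_distance(a, b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    bases = set("ACGT")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x in bases and y in bases)

def allele_distance(profile_a, profile_b):
    """Count loci, called in both isolates, whose allele numbers differ."""
    shared = set(profile_a) & set(profile_b)
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])

# Toy core genome alignment rows (made up); '-' marks a gap.
iso1 = "ACGTACGTAC"
iso2 = "ACGTACGAAC"
iso3 = "ACG-ACGAAT"

# Toy allelic profiles (made up): locus name -> allele number.
prof1 = {"locusA": 1, "locusB": 4, "locusC": 2}
prof2 = {"locusA": 1, "locusB": 5, "locusC": 2}

d12 = snp_distance(iso1, iso2)
d13 = snp_distance(iso1, iso3)
a12 = allele_distance(prof1, prof2)
```

    Note that several SNPs inside one gene change only one allele number, which is one reason the two distances are not interchangeable and why cutoffs have to be interpreted per organism, as the abstract stresses.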

  1. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  2. Using genomic information to conserve genetic diversity in livestock

    NARCIS (Netherlands)

    Eynard, Sonia E.

    2018-01-01

    Concern about the status of livestock breeds and their conservation has increased as selection and small population sizes caused loss of genetic diversity. Meanwhile, dense SNP chips and whole genome sequences (WGS) became available, providing opportunities to accurately quantify the impact of

  3. Pseudomonas aeruginosa intensive care unit outbreak: winnowing of transmissions with molecular and genomic typing.

    Science.gov (United States)

    Parcell, B J; Oravcova, K; Pinheiro, M; Holden, M T G; Phillips, G; Turton, J F; Gillespie, S H

    2018-03-01

    Pseudomonas aeruginosa healthcare outbreaks can be time-consuming and difficult to investigate. Guidance does not specify which typing technique is most practical for decision-making. To explore the usefulness of whole-genome sequencing (WGS) in the investigation of a P. aeruginosa outbreak, describing how it compares with pulsed-field gel electrophoresis (PFGE) and variable number tandem repeat (VNTR) analysis. Six patient isolates and six environmental samples from an intensive care unit (ICU) positive for P. aeruginosa over two years underwent VNTR, PFGE and WGS. VNTR and PFGE were required to fully determine the potential source of infection and rule out others. WGS results unambiguously distinguished linked isolates, giving greater assurance of the transmission route between wash-hand basin water and two patients, supporting the control measures employed. WGS provided detailed information without the need for further typing. When allied to epidemiological information, WGS can be used to understand outbreak situations rapidly and with certainty. Implementation of WGS in real time would be a major advance in day-to-day practice. It could become a standard of care as it becomes more widespread due to its reproducibility and lower costs. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  4. Parameters of the CGCS 2000 Ellipsoid and Comparisons with GRS 80 and WGS 84

    Directory of Open Access Journals (Sweden)

    CHENG Pengfei

    2016-02-01

    Full Text Available According to the definition of the China Geodetic Coordinate System 2000 (CGCS 2000) and the defining constants of the ellipsoid adopted by CGCS 2000, the other geometrical and physical parameters of this ellipsoid are derived and compared with those of GRS 80 and WGS 84, respectively. The coordinates and normal gravity on the CGCS 2000 ellipsoid are also compared with those on WGS 84 and GRS 80. The difference between the normal gravity on the CGCS 2000 ellipsoid and that on GRS 80 is about -143.54×10⁻⁸ m/s², while it is 0.02×10⁻⁸ m/s² compared to WGS 84. The longitudes of a point on these three ellipsoids are the same, but the maximum difference of latitude between CGCS 2000 and GRS 80 is 8.26×10⁻¹¹ arc seconds, which is about 2.5×10⁻⁶ mm, and the maximum difference of latitude between CGCS 2000 and WGS 84 is 3.6×10⁻⁶ arc seconds, which is about 0.11 mm.
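    The conversion from a latitude difference in arc seconds to a distance on the ground can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification (the paper itself works on ellipsoids); the constant and function names are illustrative only.

```python
import math

MEAN_EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical approximation

def arcsec_to_mm(arcsec, radius_m=MEAN_EARTH_RADIUS_M):
    """Arc length in millimetres subtended by an angle given in arc seconds."""
    radians = arcsec * math.pi / (180.0 * 3600.0)
    return radians * radius_m * 1000.0

# Maximum latitude differences quoted in the abstract.
diff_grs80_mm = arcsec_to_mm(8.26e-11)  # CGCS 2000 vs GRS 80
diff_wgs84_mm = arcsec_to_mm(3.6e-6)    # CGCS 2000 vs WGS 84
```

    Under this approximation one arc second of latitude spans roughly 31 m, so the two results land close to the 2.5×10⁻⁶ mm and 0.11 mm figures given in the abstract.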

  5. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf Sommer; Nielsen, Eva M.; Aarestrup, Frank Møller

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  6. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  7. Using sheep genomes from diverse U.S. breeds to identify missense variants in genes affecting fecundity

    Science.gov (United States)

    Background: Access to sheep genome sequences significantly improves the chances of identifying genes that may influence the health, welfare, and productivity of these animals. Methods: A public, searchable DNA sequence resource for U.S. sheep was created with whole genome sequence (WGS) of 96 rams. ...

  8. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome make obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  9. Ni-CeO2/C Catalysts with Enhanced OSC for the WGS Reaction

    Directory of Open Access Journals (Sweden)

    Laura Pastor-Pérez

    2015-03-01

    Full Text Available In this work, the WGS performance of a conventional Ni/CeO2 bulk catalyst is compared to that of a carbon-supported Ni-CeO2 catalyst. The carbon-supported sample proved to be much more active than the bulk one. The higher activity of the Ni-CeO2/C catalyst is associated with its oxygen storage capacity, a parameter that strongly influences the WGS behavior. The stability of the carbon-supported catalyst under realistic operation conditions is also a subject of this paper. In summary, our study represents an approach towards a new generation of Ni-ceria-based catalysts for pure hydrogen production via WGS. The dispersion of ceria nanoparticles on an activated carbon support leads to improved catalytic performance with a considerable reduction of the amount of ceria in the catalyst formulation.

  10. Reporting results from whole-genome and whole-exome sequencing in clinical practice: a proposal for Canada?

    Science.gov (United States)

    Zawati, Ma'n H; Parry, David; Thorogood, Adrian; Nguyen, Minh Thu; Boycott, Kym M; Rosenblatt, David; Knoppers, Bartha Maria

    2014-01-01

    This article proposes recommendations for the use of whole-genome and whole-exome (WGS/WES) sequencing in clinical practice, endorsed by the board of directors of the Canadian College of Medical Geneticists. The publication of statements and recommendations by several international and national organisations on clinical WGS/WES has prompted a need for Canadian-specific guidance. A multi-disciplinary group consisting of lawyers, ethicists, genetic researchers, and clinical geneticists was assembled to review existing guidelines on WGS/WES and identify provisions relevant to the Canadian context. Definitions were provided to orient the recommendations and to minimize confusion with other recommendations. Recommendations include the following: WGS/WES should be used in a judicious and cost-efficient manner; WGS/WES should be used to answer a clinical question; and physicians need to explain to adult patients the nature of the results that could arise, so as to allow them to make informed choices over whether to take the test and which results they wish to receive. Recommendations are also provided for WGS/WES in the pediatric context, and for when results implicate patients' family members. These recommendations are only a proposal to be developed into comprehensive Canadian-based guidelines. They aim to promote discussion about the reporting of WGS/WES results, and to encourage the ethical implementation of these new technologies in the clinical setting.

  11. High-throughput shotgun lipidomics by quadrupole time-of-flight mass spectrometry

    DEFF Research Database (Denmark)

    Ståhlman, Marcus; Ejsing, Christer S.; Tarasov, Kirill

    2009-01-01

    Technological advances in mass spectrometry and meticulous method development have produced several shotgun lipidomic approaches capable of characterizing lipid species by direct analysis of total lipid extracts. Shotgun lipidomics by hybrid quadrupole time-of-flight mass spectrometry allows...... the absolute quantification of hundreds of molecular glycerophospholipid species, glycerolipid species, sphingolipid species and sterol lipids. Future applications in clinical cohort studies demand detailed lipid molecule information and the application of high-throughput lipidomics platforms. In this review...... we describe a novel high-throughput shotgun lipidomic platform based on 96-well robot-assisted lipid extraction, automated sample infusion by microfluidic-based nanoelectrospray ionization, and quantitative multiple precursor ion scanning analysis on a quadrupole time-of-flight mass spectrometer...

  12. Preliminary High-Throughput Metagenome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  13. Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen

    Directory of Open Access Journals (Sweden)

    Navajeet Chakravartty

    2017-12-01

    Full Text Available Sclerospora graminicola pathogen is the most important biotic production constraints of pearl millet in India, Africa and other parts of the world. We report a de novo whole genome assembly and analysis of pathotype 1, one of the most virulent pathotypes of S. graminicola from India. The whole genome sequencing was performed by sequencing of 7.38 Gb with 73,889,924 paired end reads from the paired-end library, and 1.15 Gb with 3,851,788 reads from the mate pair library generated from Illumina HiSeq 2500 and Illumina MiSeq, respectively. A total 597,293 filtered sub reads with average read length of 6.39 Kb was generated on PACBIO RSII with P6-C4 chemistry. Assembled draft genome sequence of S. graminicola pathotype 1 was 299,901,251 bp in length, N50 of 17,909 bp with a minimum of 1 Kb scaffold size. The GC content was 47.2 % consisting of 26,786 scaffolds with longest scaffold size of 238,843 bp. The overall coverage was 40X. The draft genome sequence was used for gene prediction using AUGUSTUS which resulted in 65,404 genes using Saccharomyces cerevisiae as a model. A total of 52,285 predicted genes found homology using BLASTX against nr database and 38,120 genes were observed with a significant BLASTX match with E-value cutoff of 1e-5 and 40% identity percentage. Out of 38,120 genes annotated a set of 11,873 genes had UniProt entries, while 7,248 were GO terms and 9,686 with KEGG IDs. Of the 7,248 GO terms, 2,724 were associated with the biological processes. The genome information of downy mildew pathogen is available in the NCBI GenBank database. The Sclerospora graminicola whole genome shotgun (WGS project has the project accession MIQA00000000. This version of the project (02 has the accession number MIQA02000000, and consists of sequences MIQA02000001-MIQA02026786, with BioProject ID PRJNA325098 and BioSample ID SAMN05219233. 
This study may help understand the evolutionary pattern of pathogen and aid elucidation of effector evolution for
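
    The sequencing yields quoted in this abstract can be compared with the stated 40X coverage by simple arithmetic. The following is a minimal Python sketch, assuming that coverage is total sequenced bases divided by assembly length and that the PacBio yield can be approximated as subread count times average read length. The function name is illustrative and does not come from the paper.

```python
def fold_coverage(total_bases: float, genome_length: int) -> float:
    """Average fold coverage: sequenced bases per assembled base."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


# Figures quoted in the abstract.
ILLUMINA_PAIRED_END_BASES = 7.38e9
ILLUMINA_MATE_PAIR_BASES = 1.15e9
ASSEMBLY_LENGTH = 299_901_251

# Assumption: PacBio yield approximated as subreads x mean length (6.39 Kb).
PACBIO_BASES = 597_293 * 6_390

total_bases = ILLUMINA_PAIRED_END_BASES + ILLUMINA_MATE_PAIR_BASES + PACBIO_BASES
coverage = fold_coverage(total_bases, ASSEMBLY_LENGTH)
print(f"approximate coverage: {coverage:.1f}x")
```

    Under these assumptions the combined libraries give roughly 41x, which is in line with the reported overall coverage of 40X.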

  14. Using whole genome sequencing to study American foulbrood epidemiology in honeybees.

    Directory of Open Access Journals (Sweden)

    Joakim Ågren

    Full Text Available American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies, causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently relies largely on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed the different bacterial clones involved in the disease outbreak to be identified and the source of infection to be traced. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB-outbreaks.

  15. Annotation of the Clostridium Acetobutylicum Genome

    Energy Technology Data Exchange (ETDEWEB)

    Daly, M. J.

    2004-06-09

    The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC824 has been determined by the shotgun approach. The genome consists of a 3.94 Mb chromosome and a 192 kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases, closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria.

  16. Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

    Science.gov (United States)

    The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...

  17. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

    Science.gov (United States)

    Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

    2017-06-01

    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF)≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations with that of a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
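
    The three frequency classes used in this abstract (common, MAF≥5%; low-frequency, 0.5≤MAF<5%; rare, MAF<0.5%) can be expressed as a small classifier. This is a minimal Python sketch; the function names are hypothetical, and the assumption that 2244 diploid individuals contribute 4488 haplotypes is ours rather than a figure stated in the abstract.

```python
def minor_allele_frequency(alt_count: int, n_haplotypes: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if n_haplotypes <= 0 or not 0 <= alt_count <= n_haplotypes:
        raise ValueError("counts out of range")
    alt_freq = alt_count / n_haplotypes
    return min(alt_freq, 1.0 - alt_freq)


def maf_class(maf: float) -> str:
    """Bin a MAF using the thresholds quoted in the abstract."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


# Assumption: 2244 diploid individuals give 2 * 2244 = 4488 haplotypes.
N_HAPLOTYPES = 2 * 2244

# An allele observed once in the panel falls in the rare class.
singleton_class = maf_class(minor_allele_frequency(1, N_HAPLOTYPES))
print(singleton_class)
```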

  18. Identification and insertion polymorphisms of short interspersed nuclear elements (SINEs) in Brassica genomes

    International Nuclear Information System (INIS)

    Nouroz, F.; Naveed, M.

    2018-01-01

    The non-LTR retrotransposons (retroposons) are abundant in plant genomes, including members of Brassicaceae. Of the retroposons, long interspersed nuclear elements (LINEs) are the more copious, followed by short interspersed nuclear elements (SINEs), in sequenced eukaryotic genomes. SINEs are short elements ranging from 100-500 bp, flanked by variable-sized target site duplications, with a 5' tRNA region carrying a polymerase III promoter, an internal tRNA-unrelated region, a 3' LINE-derived region and a poly-adenosine tail. Different computational approaches were used for the identification and characterization of SINEs, while PCR was used to detect SINE insertion polymorphisms in various Brassica genotypes. Ten previously unidentified families of SINEs were identified and characterized from Brassica genomes. The structural features of these SINEs were studied in detail and showed typical SINE features: small sizes, target site duplications, head regions, internal regions (body) of variable sizes and a poly (A) tail at the 3' terminus. The elements from the various families ranged from 206-558 bp, where the BoSINE2 family displayed the smallest SINE element (206 bp), while the larger members belonged to the BoSINE9 family (524-558 bp). The distribution and abundance of SINEs in various Brassica species and genotypes (40) at a particular site/locus were investigated by SINE-based PCR markers. Various SINE insertion polymorphisms were detected from different genotypes, where higher PCR bands amplified the SINE insertions, while lower bands amplified the pre-insertion sites (flanking regions). The analysis of Brassica SINE copy numbers from the 10 identified families estimated around 860 and 1712 copies of SINEs in B. rapa and B. oleracea whole-genome shotgun contigs (WGS), respectively. Analysis of insertion sites of Brassica SINEs revealed that the members of all 10 SINE families showed an insertion preference for AT-rich regions. The present

  19. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  20. An Outbreak of Streptococcus pyogenes in a Mental Health Facility: Advantage of Well-Timed Whole-Genome Sequencing Over emm Typing.

    Science.gov (United States)

    Bergin, Sarah M; Periaswamy, Balamurugan; Barkham, Timothy; Chua, Hong Choon; Mok, Yee Ming; Fung, Daniel Shuen Sheng; Su, Alex Hsin Chuan; Lee, Yen Ling; Chua, Ming Lai Ivan; Ng, Poh Yong; Soon, Wei Jia Wendy; Chu, Collins Wenhan; Tan, Siyun Lucinda; Meehan, Mary; Ang, Brenda Sze Peng; Leo, Yee Sin; Holden, Matthew T G; De, Partha; Hsu, Li Yang; Chen, Swaine L; de Sessions, Paola Florez; Marimuthu, Kalisvar

    2018-05-09

    OBJECTIVE: We report the utility of whole-genome sequencing (WGS) conducted in a clinically relevant time frame (ie, sufficient for guiding management decision), in managing a Streptococcus pyogenes outbreak, and present a comparison of its performance with emm typing. SETTING: A 2,000-bed tertiary-care psychiatric hospital. METHODS: Active surveillance was conducted to identify new cases of S. pyogenes. WGS guided targeted epidemiological investigations, and infection control measures were implemented. Single-nucleotide polymorphism (SNP)-based genome phylogeny, emm typing, and multilocus sequence typing (MLST) were performed. We compared the ability of WGS and emm typing to correctly identify person-to-person transmission and to guide the management of the outbreak. RESULTS: The study included 204 patients and 152 staff. We identified 35 patients and 2 staff members with S. pyogenes. WGS revealed polyclonal S. pyogenes infections with 3 genetically distinct phylogenetic clusters (C1-C3). Cluster C1 isolates were all emm type 4, sequence type 915 and had pairwise SNP differences of 0-5, which suggested recent person-to-person transmissions. Epidemiological investigation revealed that cluster C1 was mediated by dermal colonization and transmission of S. pyogenes in a male residential ward. Clusters C2 and C3 were genomically diverse, with pairwise SNP differences of 21-45 and 26-58, and emm 11 and mostly emm120, respectively. Clusters C2 and C3, which may have been considered person-to-person transmissions by emm typing, were shown by WGS to be unlikely by integrating pairwise SNP differences with epidemiology. CONCLUSIONS: WGS had higher resolution than emm typing in identifying clusters with recent and ongoing person-to-person transmissions, which allowed implementation of targeted intervention to control the outbreak. Infect Control Hosp Epidemiol 2018;1-9.
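
    The pairwise SNP differences reported for each cluster can be understood as the number of differing positions between aligned genome sequences. The following is a minimal Python sketch, assuming equal-length aligned sequences and skipping positions that carry an ambiguous base or gap. The isolate names and sequences are invented for illustration and are not data from the study.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a non-ACGT symbol (N, gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_snp_range(isolates: dict[str, str]) -> tuple[int, int]:
    """Return the (minimum, maximum) pairwise SNP distance within a cluster."""
    distances = [
        snp_distance(isolates[x], isolates[y])
        for x, y in combinations(isolates, 2)
    ]
    return min(distances), max(distances)


# Hypothetical toy alignment; real comparisons span whole genomes.
toy_cluster = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACGTAT",
}
toy_range = pairwise_snp_range(toy_cluster)
print(toy_range)
```

    A narrow range, such as the 0-5 reported for cluster C1, is what the abstract interprets as evidence of recent person-to-person transmission.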

  1. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  2. Genomic characterization, phylogenetic analysis, and identification of virulence factors in Aerococcus sanguinicola and Aerococcus urinae strains isolated from infection episodes

    DEFF Research Database (Denmark)

    Carkaci, Derya; Højholt, Katrine; Nielsen, Xiaohui Chen

    2017-01-01

    Aerococcus sanguinicola and Aerococcus urinae are emerging pathogens in clinical settings mostly being causative agents of urinary tract infections (UTIs), urogenic sepsis and more seldomly complicated infective endocarditis (IE). Limited knowledge exists concerning the pathogenicity of these two...... species. Eight clinical A. sanguinicola (isolated from 2009 to 2015) and 40 clinical A. urinae (isolated from 1984 to 2015) strains from episodes of UTIs, bacteremia, and IE were whole-genome sequenced (WGS) to analyze genomic diversity and characterization of virulence genes involved in the bacterial....... In conclusion, this is the first study dealing with WGS and comparative genomics of clinical A. sanguinicola and A. urinae strains from episodes of UTIs, bacteremia, and IE. Gene homologs associated with antiphagocytosis and bacterial adherence were identified and genetic variability was observed within A...

  3. The Effects of Signal Erosion and Core Genome Reduction on the Identification of Diagnostic Markers

    Directory of Open Access Journals (Sweden)

    Jason W. Sahl

    2016-09-01

    Full Text Available Whole-genome sequence (WGS) data are commonly used to design diagnostic targets for the identification of bacterial pathogens. To do this effectively, genomics databases must be comprehensive to identify the strict core genome that is specific to the target pathogen. As additional genomes are analyzed, the core genome size is reduced and there is erosion of the target-specific regions due to commonality with related species, potentially resulting in the identification of false positives and/or false negatives.
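
    The two effects named in this abstract can be illustrated with set operations: the strict core genome is the intersection of the gene sets of all target genomes, and target-specific markers are the core genes absent from every related genome. This is a minimal Python sketch with invented gene labels; it is not the authors' pipeline.

```python
def core_genome_sizes(genomes: list[set[str]]) -> list[int]:
    """Size of the strict core genome after each genome is added."""
    sizes = []
    core = None
    for genes in genomes:
        core = set(genes) if core is None else core & genes
        sizes.append(len(core))
    return sizes


def target_specific_markers(
    targets: list[set[str]], relatives: list[set[str]]
) -> set[str]:
    """Genes shared by every target genome and absent from all relatives."""
    core = set.intersection(*targets)
    seen_in_relatives = set().union(*relatives)
    return core - seen_in_relatives


# Hypothetical gene content of three target-pathogen genomes.
target_genomes = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "c", "e"},
]
# Hypothetical gene content of one related, non-target genome.
related_genomes = [{"c", "x"}]

sizes = core_genome_sizes(target_genomes)
markers = target_specific_markers(target_genomes, related_genomes)
print(sizes, markers)
```

    In this toy case the core shrinks with each added genome (core genome reduction), and adding a related genome removes a candidate marker that it shares with the target (signal erosion).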

  4. The Future of Whole-Genome Sequencing for Public Health and the Clinic

    OpenAIRE

    Allard, Marc W.

    2016-01-01

    An American Society for Microbiology (ASM) conference titled the Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiological Investigation of Pathogens provided a venue for discussing how technologies surrounding whole-genome sequencing (WGS) are advancing microbiology. Several applications in microbial taxonomy, microbial forensics, and genomics for public health pathogen surveillance were presented at the meeting and are reviewed. All of...

  5. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method

    Directory of Open Access Journals (Sweden)

    Bingfu Guo

    2016-07-01

    Full Text Available Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans, GE-J16 and ZH10-6, based on the whole genome sequencing (WGS) method. About 21 Gb of sequence data (~21× coverage) was generated for each line on the Illumina HiSeq 2500 platform. The junction reads mapped to the boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated at positions Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion site of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrated that WGS is a cost-effective and rapid method of identifying sites of T-DNA insertions and flanking sequences in soybean.

  6. Shotgun approaches to gait analysis : insights & limitations

    NARCIS (Netherlands)

    Kaptein, Ronald G.; Wezenberg, Daphne; IJmker, Trienke; Houdijk, Han; Beek, Peter J.; Lamoth, Claudine J. C.; Daffertshofer, Andreas

    2014-01-01

    Background: Identifying features for gait classification is a formidable problem. The number of candidate measures is legion. This calls for proper, objective criteria when ranking their relevance. Methods: Following a shotgun approach we determined a plenitude of kinematic and physiological gait

  7. A probabilistic model to recover individual genomes from metagenomes

    NARCIS (Netherlands)

    J. Dröge (Johannes); A. Schönhuth (Alexander); A.C. McHardy (Alice)

    2017-01-01

    Shotgun metagenomics of microbial communities reveals information about strains of relevance for applications in medicine, biotechnology and ecology. Recovering their genomes is a crucial but very challenging step due to the complexity of the underlying biological system and technical

  8. "We don't know her history, her background": adoptive parents' perspectives on whole genome sequencing results.

    Science.gov (United States)

    Crouch, Julia; Yu, Joon-Ho; Shankar, Aditi G; Tabor, Holly K

    2015-02-01

    Exome sequencing and whole genome sequencing (ES/WGS) can provide parents with a wide range of genetic information about their children, and adoptive parents may have unique issues to consider regarding possible access to this information. The few papers published on adoption and genetics have focused on targeted genetic testing of children in the pre-adoption context. There are no data on adoptive parents' perspectives about pediatric ES/WGS, including their preferences about different kinds of results, and the potential benefits and risks of receiving results. To explore these issues, we conducted four exploratory focus groups with adoptive parents (N = 26). The majority lacked information about their children's biological family health history and ancestry, and many viewed WGS results as a way to fill in these gaps in knowledge. Some expressed concerns about protecting their children's future privacy and autonomy, but at the same time stated that WGS results could possibly help them be proactive about their children's health. A few parents expressed concerns about the risks of WGS in a pre-adoption context, specifically about decreasing a child's chance of adoption. These results suggest that issues surrounding genetic information in the post-adoption and ES/WGS contexts need to be considered, as well as concerns about risks in the pre-adoption context. A critical challenge for ES/WGS in the context of adoption will be balancing the right to know different kinds of genetic information with the right not to know. Specific guidance for geneticists and genetic counselors may be needed to maximize benefits of WGS while minimizing harms and prohibiting misuse of the information in the adoption process.

  9. The American cranberry mitochondrial genome reveals the presence of selenocysteine (tRNA-Sec and SECIS) insertion machinery in land plants

    Science.gov (United States)

    The American cranberry (Vaccinium macrocarpon Ait.) mitochondrial genome was assembled and reconstructed from whole genome 454 Roche GS-FLX and Illumina shotgun sequences. Compared with other Asterids, the reconstruction of the genome revealed an average size mitochondrion (459,678 nt) with comparat...

  10. Global Genomic Epidemiology of Salmonella enterica Serovar Typhimurium DT104

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Hendriksen, Rene S.; Le Hello, Simon

    2016-01-01

    It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant Salmonella enterica serovar Typhimurium DT104 (MDR DT104). Nonetheless, its origin and transmission route have never been revealed. We used whole-genome sequencing (WGS) and temporally struc...

  11. Hapsembler: An Assembler for Highly Polymorphic Genomes

    Science.gov (United States)

    Donmez, Nilgun; Brudno, Michael

    As whole genome sequencing has become a routine biological experiment, algorithms for assembly of whole genome shotgun data have become a topic of extensive research, with a plethora of off-the-shelf methods that can reconstruct the genomes of many organisms. Simultaneously, several recently sequenced genomes exhibit very high polymorphism rates. For these organisms genome assembly remains a challenge, as most assemblers are unable to handle highly divergent haplotypes in a single individual. In this paper we describe Hapsembler, an assembler for highly polymorphic genomes, which makes use of paired reads. Our experiments show that Hapsembler produces accurate and contiguous assemblies of highly polymorphic genomes, while performing on par with the leading tools on haploid genomes. Hapsembler is available for download at http://compbio.cs.toronto.edu/hapsembler.

  12. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
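
    Single-linkage clustering at a SNP threshold, as explored in this abstract at the 10, 5 and 0 levels, places two isolates in the same cluster whenever a chain of isolates connects them with every step at or below the threshold. The following is a minimal Python sketch using a union-find structure; the isolate names and distances are invented for illustration, and the implementation is not the one used in the study.

```python
def single_linkage_clusters(
    names: list[str],
    distances: dict[tuple[str, str], int],
    threshold: int,
) -> list[list[str]]:
    """Group isolates linked by chains of pairwise distances <= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Walk to the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical pairwise SNP distances between four isolates.
toy_names = ["A", "B", "C", "D"]
toy_distances = {
    ("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 7,
    ("C", "D"): 12, ("A", "D"): 20, ("B", "D"): 15,
}

for level in (10, 5, 0):
    print(level, single_linkage_clusters(toy_names, toy_distances, level))
```

    Note that at the 5-SNP level A and C fall in one cluster even though they differ by 7 SNPs, because B links them; this chaining is the defining property of single linkage.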

  13. Genome sequencing of Deutsch strain of cattle ticks, Rhipicephalus microplus: Raw Pac Bio reads.

    Science.gov (United States)

    Pac Bio RS II whole genome shotgun sequencing technology was used to sequence the genome of the cattle tick, Rhipicephalus microplus. The DNA was derived from 14 day old eggs from the Deutsch Texas outbreak strain reared at the USDA-ARS Cattle Fever Tick Research Laboratory, Edinburg, TX. Each corre...

  14. Challenges and opportunities for whole-genome sequencing–based surveillance of antibiotic resistance

    NARCIS (Netherlands)

    Schürch, Anita C.; van Schaik, Willem

    2017-01-01

    Infections caused by drug-resistant bacteria are increasingly reported across the planet, and drug-resistant bacteria are recognized to be a major threat to public health and modern medicine. In this review, we discuss how whole-genome sequencing (WGS)–based approaches can contribute to the

  15. WGS-Adsorbent Reaction Studies at Laboratory Scale; Estudios de la Reaccion WGS-Adsorbente a Escala de Laboratorio

    Energy Technology Data Exchange (ETDEWEB)

    Marano, M.; Torreiro, Y.

    2014-02-01

    This document reports the most significant results obtained during the experimental work performed under task WGS-adsorbent experimental studies within CAPHIGAS project (National Research Plan 2008-2011, ref: ENE2009-08002). The behavior of the binary adsorbent-catalyst system which will be used in the hybrid system is described in this document. Main results reported here were used during the design and development of the hybrid system adsorbent catalyst- membrane proposed in the CAPHIGAS project. The influence of main operating parameters and the optimized volume ratio adsorbent-catalyst are also presented in this report. (Author)

  16. Insights from early experience of a Rare Disease Genomic Medicine Multidisciplinary Team: a qualitative study.

    Science.gov (United States)

    Ormondroyd, Elizabeth; Mackley, Michael P; Blair, Edward; Craft, Jude; Knight, Julian C; Taylor, John; Taylor, Jenny C; Wilkie, Andrew Om; Watkins, Hugh

    2017-06-01

    Whole-exome/whole-genome sequencing (WES/WGS) has the potential to enhance genetic diagnosis of rare disease, and is increasingly becoming part of routine clinical care in mainstream medicine. Effective translation will require ongoing efforts in a number of areas including: selection of appropriate patients, provision of effective consent, pre- and post-test genetic counselling, improving variant interpretation algorithms and practices, and management of secondary findings including those found incidentally and those actively sought. Allied to this is the need for an effective education programme for all members of clinical teams involved in care of patients with rare disease, as well as to maintain public confidence in the use of these technologies. We established a Genomic Medicine Multidisciplinary Team (GM-MDT) in 2014 to build on the experiences of earlier successful research-based WES/WGS studies, to address these needs and to review results including pertinent and secondary findings. Here we report on a qualitative study of decision-making in the GM-MDT combined with analysis of semi-structured interviews with GM-MDT members. Study findings show that members appreciate the clinical and scientific diversity of the GM-MDT and value it for education and oversight. To date, discussions have focussed on case selection including the extent and interpretation of clinical and family history information required to establish likely monogenic aetiology and inheritance model. Achieving a balance between effective use of WES/WGS - prioritising cases in a diverse and highly complex patient population where WES/WGS will be tractable - and meeting the recruitment targets of a large project is considered challenging.

  17. Genotyping using whole-genome sequencing is a realistic alternative to surveillance based on phenotypic antimicrobial susceptibility testing

    DEFF Research Database (Denmark)

    Zankari, Ea; Hasman, Henrik; Kaas, Rolf Sommer

    2013-01-01

    -genome sequencing (WGS) may soon be within reach even for routine surveillance and clinical diagnostics. The aim of this study was to evaluate WGS as a routine tool for surveillance of antimicrobial resistance compared with current phenotypic procedures. Methods: Antimicrobial susceptibility tests were performed...... to the categorizing of isolates as resistant and 2569 as susceptible. Seven cases of disagreement between tested and predicted susceptibility were observed, six of which were related to spectinomycin resistance in Escherichia coli. Correlation between MLST type and resistance profiles was only observed in Salmonella...

  18. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  19. Whole genome sequencing of clinical strains of Mycobacterium tuberculosis from Mumbai, India: A potential tool for determining drug-resistance and strain lineage.

    Science.gov (United States)

    Chatterjee, Anirvan; Nilgiriwala, Kayzad; Saranath, Dhananjaya; Rodrigues, Camilla; Mistry, Nerges

    2017-12-01

    Amplification of drug resistance in Mycobacterium tuberculosis (M.tb) and its transmission are significant barriers in controlling tuberculosis (TB) globally. Diagnostic inaccuracies and delays impede appropriate drug administration, which exacerbates primary and secondary drug resistance. Increasing affordability of whole genome sequencing (WGS) and exhaustive cataloguing of drug resistance mutations is poised to revolutionise TB diagnostics and facilitate personalized drug therapy. However, application of WGS for diagnostics in high endemic areas is yet to be demonstrated. We report WGS of 74 clinical TB isolates from Mumbai, India, characterising genotypic drug resistance to first- and second-line anti-TB drugs. A concordance analysis between phenotypic and genotypic drug susceptibility of a subset of 29 isolates and the sensitivity of resistance prediction to the 4 drugs was calculated, viz. isoniazid-100%, rifampicin-100%, ethambutol-100% and streptomycin-85%. The whole genome based phylogeny showed almost equal proportion of East Asian (27/74) and Central Asian (25/74) strains. Interestingly we also found a clonal group of 9 isolates, of which 7 patients were found to be from the same geographical location and accessed the same health post. This provides the first evidence of epidemiological linkage for tracking TB transmission in India, an approach which has the potential to significantly improve chances of End-TB goals. Finally, the use of Mykrobe Predictor, as a standalone drug resistance and strain typing tool, requiring just few minutes to analyse raw WGS data into tabulated results, implies the rapid clinical applicability of WGS based TB diagnosis. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Full Text Available Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.

  1. Direct whole-genome sequencing of Plasmodium falciparum specimens from dried erythrocyte spots

    DEFF Research Database (Denmark)

    Nag, Sidsel; Kofoed, Poul Erik; Ursing, Johan

    2018-01-01

    -infected individuals living in rural areas, away from main infrastructure and the electrical grid. The aim of this study was to describe a low-tech procedure to sample P. falciparum specimens for direct whole genome sequencing (WGS), without use of electricity and cold-chain. Methods: Venous blood samples were...

  2. A comparison of rice chloroplast genomes

    DEFF Research Database (Denmark)

    Tang, Jiabin; Xia, Hong'ai; Cao, Mengliang

    2004-01-01

    Using high quality sequence reads extracted from our whole genome shotgun repository, we assembled two chloroplast genome sequences from two rice (Oryza sativa) varieties, one from 93-11 (a typical indica variety) and the other from PA64S (an indica-like variety with maternal origin of japonica......), which are both parental varieties of the super-hybrid rice, LYP9. Based on the patterns of high sequence coverage, we partitioned chloroplast sequence variations into two classes, intravarietal and intersubspecific polymorphisms. Intravarietal polymorphisms refer to variations within 93-11 or PA64S...

  3. Comparative Genomics of Vibrio cholerae O1 Isolated from Cholera Patients in Bangladesh

    DEFF Research Database (Denmark)

    Hossain, Zenat Zebin; Leekitcharoenphon, Pimlapas; Dalsgaard, Anders

    patients was co-infected with two V. cholerae strains (VC-1 and VC-3). Major virulence factors, biotype and antimicrobial resistance genes were identified by WGS. A global phylogenetic tree was inferred using genome wide SNPs (Single Nucleotide Polymorphism) analysis. RESULTS: All the V. cholerae strains...

  4. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  5. Comparing Whole-Genome Sequencing with Sanger Sequencing for spa Typing of Methicillin-Resistant Staphylococcus aureus

    DEFF Research Database (Denmark)

    Bartels, Mette Damkjaer; Petersen, Andreas; Worning, Peder

    2014-01-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and ...

  6. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  7. Reference Genome-Directed Resolution of Homologous and Homeologous Relationships within and between Different Oat Linkage Maps

    Directory of Open Access Journals (Sweden)

    Juan J. Gutierrez-Gonzalez

    2011-11-01

    Genome research on oat (Avena sativa L.) has received less attention than wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.) because it is a less prominent component of the human food system. To assess the potential of the model grass Brachypodium distachyon (L.) P. Beauv. as a surrogate for oat genome research, the whole genome sequence (WGS) of Brachypodium was employed for comparative analysis with oat genetic linkage maps. Sequences of mapped molecular markers from one diploid Avena spp. and two hexaploid oat maps were aligned to the Brachypodium WGS to infer syntenic relationships. Diploid Avena and Brachypodium exhibit a high degree of synteny, with 18 syntenic blocks covering 87% of the oat genome, which permitted postulation of an ancestral Avena spp. chromosome structure. Synteny between oat and Brachypodium was also prevalent, with 50 syntenic blocks covering 76.6% of the ‘Kanota’ × ‘Ogle’ linkage map. Coalignment of diploid and hexaploid maps to Brachypodium helped resolve homeologous relationships between different oat linkage groups but also revealed many major rearrangements in oat subgenomes. Extending the analysis to a second oat linkage map (‘Ogle’ × ‘TAM O-301’) allowed identification of several putative homologous linkage groups across the two oat populations. These results indicate that the Brachypodium genome sequence will be a useful resource to assist genetics and genomics research in oat. The analytical strategy employed here should be applicable for genome research in other temperate grass crops with modest amounts of genomic data.

  8. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Philippe M Hauser

    Determination of the precise composition and variation of microbiota in cystic fibrosis lungs is crucial since chronic inflammation due to microorganisms leads to lung damage and, ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture). So far, only PCR-based metagenomics have been investigated. However, these methods are biased towards certain microbial groups and suffer from uncertain quantification of the different microbial domains. We have explored whole genome sequencing (WGS) using the Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four procedures for DNA extraction, each with a different lysis protocol. We avoided biases due to whole DNA amplification thanks to the high efficiency of current Illumina technology. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota compositions than cultures and PCRs. WGS identified a high quantity of Haemophilus spp. (patient 1) or Staphylococcus spp. plus Streptococcus spp. (patient 2), together with low amounts of anaerobic (Veillonella, Prevotella, Fusobacterium) and aerobic bacteria (Gemella, Moraxella, Granulicatella). WGS suggested that fungal members represented very low proportions of the microbiota, which were detected by cultures and PCRs because of their selectivity. The future increase in read sizes and decrease in cost should ensure the usefulness of WGS for the characterisation of microbiota.

  9. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...

  10. Routine Whole-Genome Sequencing for Outbreak Investigations of Staphylococcus aureus in a National Reference Center

    Directory of Open Access Journals (Sweden)

    Geraldine Durand

    2018-03-01

    The French National Reference Center for Staphylococci currently uses DNA arrays and spa typing for the initial epidemiological characterization of Staphylococcus aureus strains. We here describe the use of whole-genome sequencing (WGS) to investigate retrospectively four distinct and virulent S. aureus lineages [clonal complexes (CCs): CC1, CC5, CC8, CC30] involved in hospital and community outbreaks or sporadic infections in France. We used a WGS bioinformatics pipeline based on de novo assembly (reference-free approach), single nucleotide polymorphism analysis, and the inclusion of epidemiological markers. We examined the phylogeographic diversity of the French dominant hospital-acquired CC8-MRSA (methicillin-resistant S. aureus) Lyon clone through WGS analysis, which did not demonstrate evidence of large-scale geographic clustering. We analyzed sporadic cases along with two outbreaks of a CC1-MSSA (methicillin-susceptible S. aureus) clone containing the Panton–Valentine leukocidin (PVL), and results showed that two sporadic cases were closely related. We investigated an outbreak of PVL-positive CC30-MSSA in a school environment and were able to reconstruct the transmission history between eight families. We explored different outbreaks among newborns due to the CC5-MRSA Geraldine clone and found evidence of an unsuspected link between two otherwise distinct outbreaks. Here, WGS provides the resolving power to disprove transmission events indicated by conventional methods (same sequence type, spa type, toxin profile, and antibiotic resistance profile) and, most importantly, WGS can reveal unsuspected transmission events. Therefore, WGS allows outbreaks and (inter-)national dissemination of S. aureus lineages to be better described and understood. Our findings underscore the importance of adding WGS for (inter-)national surveillance of infections caused by virulent clones of S. aureus but also substantiate the fact that technological optimization at

  11. Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

    Science.gov (United States)

    Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

    2011-01-01

    Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

  12. Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences.

    Science.gov (United States)

    Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire

    2017-02-01

    Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England are dependent on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.4%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.
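    The kmer-based identification step mentioned in this abstract can be illustrated with a minimal sketch. This is not the Public Health England pipeline: the function names (`kmers`, `jaccard`, `identify`), the choice of Jaccard similarity, the value of `k`, and the toy sequences are all assumptions made purely for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(sample_seq, references, k=5):
    """Return (label, score) of the reference sharing the most k-mers."""
    sample = kmers(sample_seq, k)
    scores = {
        label: jaccard(sample, kmers(ref, k))
        for label, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy sequences, invented for illustration only (not real genomes).
references = {
    "ref_A": "ATGGCGTACGTTAGCCGATAGGCTA",
    "ref_B": "TTGACCGGTATCAGGCATTCGAGCT",
}
sample = "ATGGCGTACGTTAGCCGATAGGCTT"  # ref_A with one changed base
label, score = identify(sample, references)
```

    A nearest-reference rule like this also hints at why closely related organisms are hard to separate: when two references share most of their k-mers, their scores sit close together, which is consistent with the abstract's use of MLST data to resolve the discordant cases.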

  13. WGS-Adsorbent Reaction Studies at Laboratory Scale

    International Nuclear Information System (INIS)

    Marano, M.; Torreiro, Y.

    2014-01-01

    This document reports the most significant results obtained during the experimental work performed under task WGS adsorbent experimental studies within CAPHIGAS project (National Research Plan 2008-2011, ref: ENE2009-08002). The behavior of the binary adsorbent-catalyst system which will be used in the hybrid system is described in this document. Main results reported here were used during the design and development of the hybrid system adsorbent catalyst- membrane proposed in the CAPHIGAS project. The influence of main operating parameters and the optimized volume ratio adsorbent-catalyst are also presented in this report. (Author)

  14. Ethical issues in consumer genome sequencing: Use of consumers' samples and data.

    Science.gov (United States)

    Niemiec, Emilia; Howard, Heidi Carmen

    2016-03-01

    High throughput approaches such as whole genome sequencing (WGS) and whole exome sequencing (WES) create an unprecedented amount of data providing powerful resources for clinical care and research. Recently, WGS and WES services have been made available by commercial direct-to-consumer (DTC) companies. The DTC offer of genetic testing (GT) has already brought attention to potentially problematic issues such as the adequacy of consumers' informed consent and transparency of companies' research activities. In this study, we analysed the websites of four DTC GT companies offering WGS and/or WES with regard to their policies governing storage and future use of consumers' data and samples. The results are discussed in relation to recommendations and guiding principles such as the "Statement of the European Society of Human Genetics on DTC GT for health-related purposes" (2010) and the "Framework for responsible sharing of genomic and health-related data" (Global Alliance for Genomics and Health, 2014). The analysis reveals that some companies may store and use consumers' samples or sequencing data for unspecified research and share the data with third parties. Moreover, the companies do not provide sufficient or clear information to consumers about this, which can undermine the validity of the consent process. Furthermore, while all companies state that they provide privacy safeguards for data and mention the limitations of these, information about the possibility of re-identification is lacking. Finally, although the companies that may conduct research do include information regarding proprietary claims and commercialisation of the results, it is not clear whether consumers are aware of the consequences of these policies. These results indicate that DTC GT companies still need to improve the transparency regarding handling of consumers' samples and data, including having an explicit and clear consent process for research activities.

  15. Ethical issues in consumer genome sequencing: Use of consumers' samples and data

    Directory of Open Access Journals (Sweden)

    Emilia Niemiec

    2016-03-01

    High throughput approaches such as whole genome sequencing (WGS) and whole exome sequencing (WES) create an unprecedented amount of data providing powerful resources for clinical care and research. Recently, WGS and WES services have been made available by commercial direct-to-consumer (DTC) companies. The DTC offer of genetic testing (GT) has already brought attention to potentially problematic issues such as the adequacy of consumers' informed consent and transparency of companies' research activities. In this study, we analysed the websites of four DTC GT companies offering WGS and/or WES with regard to their policies governing storage and future use of consumers' data and samples. The results are discussed in relation to recommendations and guiding principles such as the “Statement of the European Society of Human Genetics on DTC GT for health-related purposes” (2010) and the “Framework for responsible sharing of genomic and health-related data” (Global Alliance for Genomics and Health, 2014). The analysis reveals that some companies may store and use consumers' samples or sequencing data for unspecified research and share the data with third parties. Moreover, the companies do not provide sufficient or clear information to consumers about this, which can undermine the validity of the consent process. Furthermore, while all companies state that they provide privacy safeguards for data and mention the limitations of these, information about the possibility of re-identification is lacking. Finally, although the companies that may conduct research do include information regarding proprietary claims and commercialisation of the results, it is not clear whether consumers are aware of the consequences of these policies. These results indicate that DTC GT companies still need to improve the transparency regarding handling of consumers' samples and data, including having an explicit and clear consent process for research activities.

  16. An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes

    DEFF Research Database (Denmark)

    Henri, Clementine; Leekitcharoenphon, Pimlapas; Carleton, Heather A.

    2017-01-01

    Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The value of genome-scale analysis for national surveillance, outbreak detection or source tracking has been largely documented. The genomic......MLPPST) or pan genome (wgMLPPST). Currently, there are few studies comparing these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L. monocytogenes. Methods: The clustering methods were evaluated...... on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses. Results: The backward comparability between conventional typing methods and genomic methods revealed a near...

  17. Counter-current membrane reactor for WGS process: Membrane design

    Energy Technology Data Exchange (ETDEWEB)

    Piemonte, Vincenzo; Favetta, Barbara [Department of Chemical Engineering Materials and Environment, University of Rome 'La Sapienza', via Eudossiana 18, 00184 Rome (Italy)]; De Falco, Marcello [Faculty of Engineering, University Campus Bio-Medico of Rome, via Alvaro del Portillo 21, 00128 Rome (Italy)]; Basile, Angelo [CNR-ITM, c/o University of Calabria, Via Pietro Bucci, Cubo 17/C, 87030 Rende (CS) (Italy)]

    2010-11-15

    Water gas shift (WGS) is a thermodynamically limited reaction which has to operate at low temperatures, which reduces the kinetic rate and increases the amount of catalyst required to reach valuable CO conversions. It has been widely demonstrated that the integration of hydrogen-selective membranes is a promising way to enhance the performance of WGS reactors: a Pd-based membrane reactor (MR) operated successfully, overcoming the thermodynamic constraints of a traditional reactor thanks to the removal of hydrogen from the reaction environment. In the first part of an MR, the H2 partial pressure starts from a minimum value since the reaction has not started. As a consequence, if the carrier gas in the permeation zone is sent in counter-current, which is the most efficient configuration, in the first reactor section the H2 partial pressure is low in the reaction zone and high in the permeation zone, potentially implying back permeation. This means poor utilization of the first part of the membrane area and thus a worsening of the MR performance, with lower H2 recovery and lower CO conversion than in the case in which the whole selective surface is properly used. To avoid this problem, different MR configurations were evaluated by a 1-D pseudo-homogeneous model, validated with WGS industrial data reported in the scientific literature. It was demonstrated that the permeated H2 flow rate per membrane surface, i.e. the membrane flux, strongly improves if the selective membrane is placed only in the second part of the reactor: if the membrane is placed at L_m/L_tot = 0.5, the membrane flux is about 0.2 kmol/(m^2 h); if it is placed along the whole reactor tube (L_m/L_tot = 1), the flux is 0.05 kmol/(m^2 h). The effects of the L/D reactor ratio and of the reactor wall temperature on the CO conversion were also assessed. (author)

  18. Norgal: Extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

    DEFF Research Database (Denmark)

    Al-Nakeeb, Kosai Ali Ahmed; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-01-01

    and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences...
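    The read-selection step described in this snippet can be sketched as follows. Because the snippet is truncated, the way the k-mers are chosen is an assumption here: the sketch treats k-mers that occur at least `min_count` times as the ones of interest, on the reasoning that mitochondrial DNA is usually present at higher copy number than nuclear DNA. The function names, parameters and toy reads are hypothetical and are not taken from Norgal itself.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def select_reads(reads, k, min_count):
    """Keep reads containing at least one k-mer seen >= min_count times."""
    counts = count_kmers(reads, k)
    frequent = {kmer for kmer, n in counts.items() if n >= min_count}
    return [
        read for read in reads
        if any(read[i:i + k] in frequent for i in range(len(read) - k + 1))
    ]


# Toy reads, invented for illustration only.
reads = [
    "ACGTACGGA",  # three copies stand in for a high-depth fragment
    "ACGTACGGA",
    "ACGTACGGA",
    "TTGCAATCC",  # single-copy background read
]
selected = select_reads(reads, k=4, min_count=3)
```

    The selected subset would then be handed to a de novo assembler, which is outside the scope of this sketch.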

  19. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    with several band changes and others are defined by one PFGE profile thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...... on the core genome, MLST profiles and detection of phages in the genome. The human cluster and the broiler isolates belonged to the same ST, but the isolates were divided into two groups, 9 SNPs apart, according to an MP phylogeny. When using PHAST, we found that two phage regions were 100% similar...
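    A statement such as "9 SNPs apart" rests on counting differences between isolates at aligned core-genome positions. The following is a minimal sketch of that counting step, assuming the isolates are already aligned to equal length and that ambiguous calls (for example `N`) should be ignored. The function names and toy sequences are hypothetical, and the sketch does not reproduce the study's SNP calling or its MP phylogeny.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count aligned positions where both isolates have a called base that differs."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def distance_matrix(aligned):
    """Return pairwise SNP distances keyed by sorted (isolate, isolate) pairs."""
    return {
        (p, q): snp_distance(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }


# Toy aligned sequences, invented for illustration only.
aligned = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "TCGAACGNAC",
}
dists = distance_matrix(aligned)
```

    Skipping positions with an uncalled base is a design choice: it avoids inflating distances with missing data, at the cost of underestimating them when coverage is poor.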

  20. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    Science.gov (United States)

    Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens. PMID:24574290

  1. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    Science.gov (United States)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S; Nielsen, Eva M; Aarestrup, Frank M

    2014-05-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens.

  2. Genomic, proteomic and biochemical analysis of the organohalide respiratory pathway in Desulfitobacterium dehalogenans

    NARCIS (Netherlands)

    Kruse, T.; Pas, van de B.A.; Atteia, A.; Krab, K.; Hagen, W.R.; Goodwin, L.; Chain, P.; Boeren, S.; Maphosa, F.; Schraa, G.; Vos, de W.M.; Oost, van der J.; Smidt, H.; Stams, A.J.M.

    2015-01-01

    Desulfitobacterium dehalogenans is able to grow by organohalide respiration using 3-chloro-4-hydroxyphenyl acetate (Cl-OHPA) as an electron acceptor. We used a combination of genome sequencing, biochemical analysis of redox active components and shotgun proteomics to study elements of the

  3. The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli.

    Science.gov (United States)

    Tincher, Clayton; Long, Hongan; Behringer, Megan; Walker, Noah; Lynch, Michael

    2017-10-05

    Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017 Tincher et al.

  4. A combined meta-barcoding and shotgun metagenomic analysis of spontaneous wine fermentation.

    Science.gov (United States)

    Sternes, Peter R; Lee, Danna; Kutyna, Dariusz R; Borneman, Anthony R

    2017-07-01

    Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, toward the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in 5 different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant overabundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on the wine flavor and aroma. © The Authors 2017. Published by Oxford University Press.

  5. Genomic and proteomic identification of Late Holocene remains

    DEFF Research Database (Denmark)

    Biard, Vincent; Gol'din, Pavel; Gladilina, Elena

    2017-01-01

    A critical challenge of the 21st century is to understand and minimise the effects of human activities on biodiversity. Cetaceans are a prime concern in biodiversity research, as many species still suffer from human impacts despite decades of management and conservation efforts. Zooarchaeology...... sequencing approach. In addition, shotgun sequencing produced several complete ancient odontocete mitogenomes and auxiliary nuclear genomic data for further exploration in a population genetic context. In contrast, both morphological identification and Sanger sequencing lacked taxonomic resolution and....../or resulted in misclassification of samples. We found that the combination of ZooMS and shotgun sequencing provides a powerful tool in zooarchaeology, and here allowed for a deeper understanding of past marine resource use and its implication for current management and conservation of Black Sea odontocetes....

  6. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    OpenAIRE

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chuncheongbuk-do, South Korea, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  7. Analysis of pig serum proteins based on shotgun liquid ...

    African Journals Online (AJOL)

    Recent advances in proteomics technologies have opened up significant opportunities for future applications. We used shotgun liquid chromatography, coupled with tandem mass spectrometry (LC-MS/MS) to determine the proteome profile of healthy pig serum. Samples of venous blood were collected and subjected to ...

  8. Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing: A population-based study.

    Science.gov (United States)

    Jajou, Rana; de Neeling, Albert; van Hunen, Rianne; de Vries, Gerard; Schimmel, Henrieke; Mulder, Arnout; Anthony, Richard; van der Hoek, Wim; van Soolingen, Dick

    2018-01-01

    Patients with Mycobacterium tuberculosis isolates sharing identical DNA fingerprint patterns can be epidemiologically linked. However, municipal health services in the Netherlands are able to confirm an epidemiological link in only around 23% of the patients with isolates clustered by conventional variable number of tandem repeat (VNTR) genotyping. This research aims to investigate whether whole genome sequencing (WGS) is a more reliable predictor of epidemiological links between tuberculosis patients than VNTR genotyping. VNTR genotyping and WGS were performed in parallel on all Mycobacterium tuberculosis complex isolates received at the Netherlands National Institute for Public Health and the Environment in 2016. Isolates were clustered by VNTR when they shared identical 24-loci VNTR patterns; isolates were assigned to a WGS cluster when the pair-wise genetic distance was ≤ 12 single nucleotide polymorphisms (SNPs). Cluster investigation was performed by municipal health services on all isolates clustered by VNTR in 2016. The proportion of epidemiological links identified among patients clustered by either method was calculated. In total, 535 isolates were genotyped, of which 25% (134/535) were clustered by VNTR and 14% (76/535) by WGS; the concordance between both typing methods was 86%. The proportion of epidemiological links among WGS clustered cases (57%) was twice as high as among VNTR clustered cases (31%). When WGS was applied, the number of clustered isolates was halved, while all epidemiologically linked cases remained clustered. WGS is therefore a more reliable tool to predict epidemiological links between tuberculosis cases than VNTR genotyping and will allow more efficient transmission tracing, as epidemiological investigations based on false clustering can be avoided.
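
    The WGS clustering rule described above (pairwise distance of at most 12 SNPs) can be illustrated with a minimal sketch. The function names, the toy input format (equal-length aligned sequences keyed by isolate name) and the use of single-linkage grouping are assumptions made for illustration; the abstract does not state how the study's pipeline computed distances or merged pairs into clusters.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences.

    Sites where either sequence is not A/C/G/T (gaps, N) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def cluster_isolates(seqs: dict[str, str], threshold: int = 12) -> list[set[str]]:
    """Group isolates by single linkage: a pair within `threshold` SNPs is joined.

    Single linkage is an assumption of this sketch, not the study's stated method.
    Isolates with no partner within the threshold are reported as unclustered
    (they are omitted from the result).
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]
```

    Under single linkage, two isolates more than 12 SNPs apart can still share a cluster if a third isolate lies within 12 SNPs of both, which is one reason the choice of linkage matters when comparing cluster counts.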

  9. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Hellsten, Uffe [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Wright, Kevin M. [Harvard Univ., Cambridge, MA (United States); Jenkins, Jerry [USDOE Joint Genome Inst., Walnut Creek, CA (United States); HudsonAlpha Inst. of Biotechnology, Huntsville, AL (United States); Shu, Shengqiang [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Yuan, Yao-Wu [Univ. of Connecticut, Storrs, CT (United States); Wessler, Susan R. [Univ. of California, Riverside, CA (United States); Schmutz, Jeremy [USDOE Joint Genome Inst., Walnut Creek, CA (United States); HudsonAlpha Inst. of Biotechnology, Huntsville, AL (United States); Willis, John H. [Duke Univ., Durham, NC (United States); Rokhsar, Daniel S. [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Univ. of California, Berkeley, CA (United States)

    2013-11-13

    Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice hotspots are largely defined by binding sites of the PRDM9 protein. To investigate the detailed recombination pattern in a flowering plant we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in PRDM9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms.

  10. Whole genome sequence to decipher the resistome of Shewanella algae, a multidrug-resistant bacterium responsible for pneumonia, Marseille, France.

    Science.gov (United States)

    Cimmino, Teresa; Olaitan, Abiola Olumuyiwa; Rolain, Jean-Marc

    2016-01-01

    We characterize and decipher the resistome and the virulence factors of Shewanella algae MARS 14, a multidrug-resistant clinical strain, using the whole genome sequencing (WGS) strategy. The bacteria were isolated from the bronchoalveolar lavage of a hospitalized patient in the Timone Hospital in Marseille, France, who developed pneumonia after plunging into the Mediterranean Sea. The genome size of S. algae MARS 14 was 5,005,710 bp with a guanine-cytosine (GC) content of 52.8%. The resistome includes members of class C and D beta-lactamases and numerous multidrug-efflux pumps. We also found several hemolysin genes, a complete flagellum system gene cluster and genes responsible for biofilm formation. Moreover, we report for the first time in a clinical strain of Shewanella spp. the presence of a bacteriocin (marinocin). The WGS analysis of this pathogen provides insight into its virulence factors and resistance to antibiotics.
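
    Genome summaries such as the one above report GC content as a percentage. A minimal sketch of that calculation follows; the function name and the decision to exclude ambiguous bases (such as N) from the denominator are assumptions of this example, not details given in the abstract.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

    For a multi-contig draft assembly, the same function can be applied to the concatenation of all contigs so that longer contigs weigh proportionally more.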

  11. De novo assembly of a haplotype-resolved human genome.

    Science.gov (United States)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
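
    The N50 statistic quoted above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The sketch below applies this standard definition to a plain list of lengths; the function name is hypothetical, and how the authors delimited haplotype blocks before computing their haplotype N50 is not described in the abstract.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig, scaffold or block lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half of the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```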

  12. Comparing whole-genome sequencing with Sanger sequencing for spa typing of methicillin-resistant Staphylococcus aureus.

    Science.gov (United States)

    Bartels, Mette Damkjær; Petersen, Andreas; Worning, Peder; Nielsen, Jesper Boye; Larner-Svensson, Hanna; Johansen, Helle Krogh; Andersen, Leif Percival; Jarløv, Jens Otto; Boye, Kit; Larsen, Anders Rhod; Westh, Henrik

    2014-12-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and an in-house analysis pipeline determines the spa types. Due to national surveillance, all MRSA isolates are sent to Statens Serum Institut, where the spa type is determined by PCR and Sanger sequencing. The purpose of this study was to evaluate the reliability of the spa types obtained by 150-bp paired-end Illumina WGS. MRSA isolates from new MRSA patients in 2013 (n = 699) in the capital region of Denmark were included. We found a 97% agreement between spa types obtained by the two methods. All isolates achieved a spa type by both methods. Nineteen isolates differed in spa types by the two methods, in most cases due to the lack of 24-bp repeats in the whole-genome-sequenced isolates. These related but incorrect spa types should have no consequence in outbreak investigations, since all epidemiologically linked isolates, regardless of spa type, will be included in the single nucleotide polymorphism (SNP) analysis. This will reveal the close relatedness of the spa types. In conclusion, our data show that WGS is a reliable method to determine the spa type of MRSA. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
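
    The reported agreement can be checked from the counts given in the abstract: 19 discordant isolates out of 699 leaves 680 concordant. The helper name below is hypothetical, and the sketch is only a worked arithmetic check.

```python
def percent_agreement(total: int, discordant: int) -> float:
    """Percentage of isolates for which both typing methods gave the same result."""
    if total <= 0 or not 0 <= discordant <= total:
        raise ValueError("need total > 0 and 0 <= discordant <= total")
    return 100.0 * (total - discordant) / total


# Counts taken from the abstract: 699 isolates, 19 with differing spa types.
agreement = percent_agreement(699, 19)
print(f"{agreement:.1f}% agreement")
```

    The result, about 97.3%, is consistent with the 97% stated in the abstract.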

  13. Investigation of a possible outbreak of carbapenem-resistant Acinetobacter baumannii in Odense, Denmark using PFGE, MLST and whole-genome-based SNPs

    DEFF Research Database (Denmark)

    Hammerum, Anette M; Hansen, Frank; Skov, Marianne N

    2015-01-01

    carbapenem-resistant A. baumannii were detected at Odense University Hospital, Odense, Denmark. These isolates were typed by PFGE, with ApaI and SmaI, respectively, and subjected to WGS. The WGS data were used for in silico extraction of MLST types using two different schemes, resistance genes and SNPs......, to which 31 publicly available A. baumannii genomes were added. RESULTS: Using ApaI, the eight isolates had four different PFGE profiles, which were further differentiated using SmaI, separating one of the profiles into two distinct PFGE types. Five ST2 (Pasteur MLST) OXA-23-producing isolates, two ST1 OXA...

  14. A high-throughput shotgun mutagenesis approach to mapping B-cell antibody epitopes.

    Science.gov (United States)

    Davidson, Edgar; Doranz, Benjamin J

    2014-09-01

    Characterizing the binding sites of monoclonal antibodies (mAbs) on protein targets, their 'epitopes', can aid in the discovery and development of new therapeutics, diagnostics and vaccines. However, the speed of epitope mapping techniques has not kept pace with the increasingly large numbers of mAbs being isolated. Obtaining detailed epitope maps for functionally relevant antibodies can be challenging, particularly for conformational epitopes on structurally complex proteins. To enable rapid epitope mapping, we developed a high-throughput strategy, shotgun mutagenesis, that enables the identification of both linear and conformational epitopes in a fraction of the time required by conventional approaches. Shotgun mutagenesis epitope mapping is based on large-scale mutagenesis and rapid cellular testing of natively folded proteins. Hundreds of mutant plasmids are individually cloned, arrayed in 384-well microplates, expressed within human cells, and tested for mAb reactivity. Residues are identified as a component of a mAb epitope if their mutation (e.g. to alanine) does not support candidate mAb binding but does support that of other conformational mAbs or allows full protein function. Shotgun mutagenesis is particularly suited for studying structurally complex proteins because targets are expressed in their native form directly within human cells. Shotgun mutagenesis has been used to delineate hundreds of epitopes on a variety of proteins, including G protein-coupled receptor and viral envelope proteins. The epitopes mapped on dengue virus prM/E represent one of the largest collections of epitope information for any viral protein, and results are being used to design better vaccines and drugs. © 2014 John Wiley & Sons Ltd.

  15. Structural analysis of the α subunit of Na(+)/K(+) ATPase genes in invertebrates.

    Science.gov (United States)

    Thabet, Rahma; Rouault, J-D; Ayadi, Habib; Leignel, Vincent

    2016-01-01

    The Na(+)/K(+) ATPase is a ubiquitous pump coordinating the transport of Na(+) and K(+) across the membrane of cells and its role is fundamental to cellular functions. It is a heteromer in eukaryotes, comprising two or three subunits (α, β and γ, the last being specific to the vertebrates). The catalytic functions of the enzyme have been attributed to the α subunit. Several complete α protein sequences are available, but only a few gene structures have been characterized. We identified the genomic sequences coding the α-subunit of the Na(+)/K(+) ATPase, from the whole-genome shotgun contigs (WGS), NCBI Genomes (chromosome), Genomic Survey Sequences (GSS) and High Throughput Genomic Sequences (HTGS) databases across distinct phyla. One copy of the α subunit gene was found in Annelida, Arthropoda, Cnidaria, Echinodermata, Hemichordata, Mollusca, Placozoa, Porifera, Platyhelminthes, Urochordata, but the nematodes seem to possess 2 to 4 copies. The number of introns varied from 0 (Platyhelminthes) to 26 (Porifera); and their localization and length are also highly variable. Molecular phylogenies (Maximum Likelihood and Maximum Parsimony methods) showed some clusters constituted by (Chordata/(Echinodermata/Hemichordata)) or (Platyhelminthes/(Annelida/Mollusca)) and a basal position for Porifera. These structural analyses increase our knowledge about the evolutionary events of the α subunit genes in the invertebrates. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. A hybrid computational strategy to address WGS variant analysis in >5000 samples.

    Science.gov (United States)

    Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli

    2016-09-10

    The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures, like the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low

  17. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    Directory of Open Access Journals (Sweden)

    Yookyung Lee

    2016-03-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chuncheongbuk-do, South Korea, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  18. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia

    Directory of Open Access Journals (Sweden)

    Noraini Philip

    2016-09-01

    Mycobacterium tuberculosis (M. tuberculosis) is the causative agent of tuberculosis (TB), which causes millions of deaths every year. We have sequenced the genome of M. tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with tuberculous meningitis (TBM). The isolated strain was referred to as M. tuberculosis SB24. Genomic DNA of M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using the PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503.

  19. A decade of genomic history for healthcare-associated Enterococcus faecium in the United Kingdom and Ireland

    Science.gov (United States)

    Raven, Kathy E.; Reuter, Sandra; Reynolds, Rosy; Brodrick, Hayley J.; Russell, Julie E.; Török, M. Estée; Parkhill, Julian; Peacock, Sharon J.

    2016-01-01

    Vancomycin-resistant Enterococcus faecium (VREfm) is an important cause of healthcare-associated infections worldwide. We undertook whole-genome sequencing (WGS) of 495 E. faecium bloodstream isolates from 2001–2011 in the United Kingdom and Ireland (UK&I) and 11 E. faecium isolates from a reference collection. Comparison between WGS and multilocus sequence typing (MLST) identified major discrepancies for 17% of isolates, with multiple instances of the same sequence type (ST) being located in genetically distant positions in the WGS tree. This confirms that WGS is superior to MLST for evolutionary analyses and is more accurate than current typing methods used during outbreak investigations. E. faecium has been categorized as belonging to three clades (Clades A1, hospital-associated; A2, animal-associated; and B, community-associated). Phylogenetic analysis of our isolates replicated the distinction between Clade A (97% of isolates) and Clade B but did not support the subdivision of Clade A into Clade A1 and A2. Phylogeographic analyses revealed that Clade A had been introduced multiple times into each hospital referral network or country, indicating frequent movement of E. faecium between regions that rarely share hospital patients. Numerous genetic clusters contained highly related vanA-positive and -negative E. faecium, which implies that control of vancomycin-resistant enterococci (VRE) in hospitals also requires consideration of vancomycin-susceptible E. faecium. Our findings reveal the evolution and dissemination of hospital-associated E. faecium in the UK&I and provide evidence for WGS as an instrument for infection control. PMID:27527616

  20. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  1. Whole-genome sequence, SNP chips and pedigree structure: building demographic profiles in domestic dog breeds to optimize genetic-trait mapping

    Science.gov (United States)

    Dreger, Dayna L.; Rimbault, Maud; Davis, Brian W.; Bhatnagar, Adrienne; Parker, Heidi G.

    2016-01-01

    In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. PMID:27874836

  2. Genome sequencing and annotation of multidrug resistant Mycobacterium tuberculosis (MDR-TB) PR10 strain

    Directory of Open Access Journals (Sweden)

    Mohd Zakihalani A. Halim

    2016-03-01

    Here, we report the draft genome sequence and annotation of a multidrug resistant Mycobacterium tuberculosis strain PR10 (MDR-TB PR10) isolated from a patient diagnosed with tuberculosis. The draft genome of MDR-TB PR10 is 4.34 Mbp in size, with a G + C content of 65.6%, and consists of 4637 predicted genes. The determinants were categorized by RAST into 400 subsystems with 4286 coding sequences and 50 RNAs. The whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number CP010968. Keywords: Mycobacterium tuberculosis, Genome, MDR, Extrapulmonary

  3. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  4. Shotgun pyrosequencing metagenomic analyses of dusts from swine confinement and grain facilities.

    Science.gov (United States)

    Boissy, Robert J; Romberger, Debra J; Roughead, William A; Weissenburger-Moser, Lisa; Poole, Jill A; LeVan, Tricia D

    2014-01-01

    Inhalation of agricultural dusts causes inflammatory reactions and symptoms such as headache, fever, and malaise, which can progress to chronic airway inflammation and associated diseases, e.g. asthma, chronic bronchitis, chronic obstructive pulmonary disease, and hypersensitivity pneumonitis. Although in many agricultural environments feed particles are the major constituent of these dusts, the inflammatory responses that they provoke are likely attributable to particle-associated bacteria, archaebacteria, fungi, and viruses. In this study, we performed shotgun pyrosequencing metagenomic analyses of DNA from dusts from swine confinement facilities or grain elevators, with comparisons to dusts from pet-free households. DNA sequence alignment showed that 19% or 62% of shotgun pyrosequencing metagenomic DNA sequence reads from swine facility or household dusts, respectively, were of swine or human origin, respectively. In contrast only 2% of such reads from grain elevator dust were of mammalian origin. These metagenomic shotgun reads of mammalian origin were excluded from our analyses of agricultural dust microbiota. The ten most prevalent bacterial taxa identified in swine facility compared to grain elevator or household dust were comprised of 75%, 16%, and 42% gram-positive organisms, respectively. Four of the top five swine facility dust genera were assignable (Clostridium, Lactobacillus, Ruminococcus, and Eubacterium, ranging from 4% to 19% relative abundance). The relative abundances of these four genera were lower in dust from grain elevators or pet-free households. These analyses also highlighted the predominance in swine facility dust of Firmicutes (70%) at the phylum level, Clostridia (44%) at the Class level, and Clostridiales at the Order level (41%). In summary, shotgun pyrosequencing metagenomic analyses of agricultural dusts show that they differ qualitatively and quantitatively at the level of microbial taxa present, and that the bioinformatic analyses

  5. Shotgun pyrosequencing metagenomic analyses of dusts from swine confinement and grain facilities.

    Directory of Open Access Journals (Sweden)

    Robert J Boissy

    Inhalation of agricultural dusts causes inflammatory reactions and symptoms such as headache, fever, and malaise, which can progress to chronic airway inflammation and associated diseases, e.g. asthma, chronic bronchitis, chronic obstructive pulmonary disease, and hypersensitivity pneumonitis. Although in many agricultural environments feed particles are the major constituent of these dusts, the inflammatory responses that they provoke are likely attributable to particle-associated bacteria, archaebacteria, fungi, and viruses. In this study, we performed shotgun pyrosequencing metagenomic analyses of DNA from dusts from swine confinement facilities or grain elevators, with comparisons to dusts from pet-free households. DNA sequence alignment showed that 19% or 62% of shotgun pyrosequencing metagenomic DNA sequence reads from swine facility or household dusts, respectively, were of swine or human origin, respectively. In contrast only 2% of such reads from grain elevator dust were of mammalian origin. These metagenomic shotgun reads of mammalian origin were excluded from our analyses of agricultural dust microbiota. The ten most prevalent bacterial taxa identified in swine facility compared to grain elevator or household dust were comprised of 75%, 16%, and 42% gram-positive organisms, respectively. Four of the top five swine facility dust genera were assignable (Clostridium, Lactobacillus, Ruminococcus, and Eubacterium, ranging from 4% to 19% relative abundance). The relative abundances of these four genera were lower in dust from grain elevators or pet-free households. These analyses also highlighted the predominance in swine facility dust of Firmicutes (70%) at the phylum level, Clostridia (44%) at the Class level, and Clostridiales at the Order level (41%). In summary, shotgun pyrosequencing metagenomic analyses of agricultural dusts show that they differ qualitatively and quantitatively at the level of microbial taxa present, and that the

  6. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant

    Science.gov (United States)

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae) were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The traps of U. gibba (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify some key Ugt players, such as Pseudomonas monteilii, to the species level. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489

  7. Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

    Science.gov (United States)

    Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando

    2014-12-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.

  8. Draft genome sequence of a Kluyvera intermedia isolate from a patient with a pancreatic abscess

    DEFF Research Database (Denmark)

    Thele, Roland; Gumpert, Heidi; Christensen, Louise B.

    2017-01-01

    The genus Kluyvera comprises potential pathogens that can cause many infections. This study reports a Kluyvera intermedia strain (FOSA7093) from a pancreatic cyst specimen from a long-term hospitalised patient. Whole-genome sequencing (WGS) of the K. intermedia isolate was performed and the strain...... an efficient tool for monitoring Kluyvera spp. and its role as a reservoir of multidrug resistance. Therefore, this susceptible K. intermedia genome has many characteristics that allow comparison of resistant K. intermedia that might be discovered in the future....

  9. Draft genome sequence of the Algerian bee Apis mellifera intermissa

    Directory of Open Access Journals (Sweden)

    Nizar Jamal Haddad

    2015-06-01

    Full Text Available Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important due to its high ability to adapt to great variations in climatic conditions and due to its preferable cleaning behavior. Here we report the draft genome sequence of this honey bee; its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genomes of other Apis mellifera subspecies promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation.

  10. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  11. Shotgun proteomics of plant plasma membrane and microdomain proteins using nano-LC-MS/MS.

    Science.gov (United States)

    Takahashi, Daisuke; Li, Bin; Nakayama, Takato; Kawamura, Yukio; Uemura, Matsuo

    2014-01-01

    Shotgun proteomics allows the comprehensive analysis of proteins extracted from plant cells, subcellular organelles, and membranes. Previously, two-dimensional gel electrophoresis-based proteomics was used for mass spectrometric analysis of plasma membrane proteins. In order to get comprehensive proteome profiles of the plasma membrane including highly hydrophobic proteins with a number of transmembrane domains, a mass spectrometry-based shotgun proteomics method using nano-LC-MS/MS for proteins from the plasma membrane proteins and plasma membrane microdomain fraction is described. The results obtained are easily applicable to label-free protein semiquantification.

  12. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  13. Epidemiological characterization of a nosocomial outbreak of extended spectrum β-lactamase Escherichia coli ST-131 confirms the clinical value of core genome multilocus sequence typing.

    Science.gov (United States)

    Woksepp, Hanna; Ryberg, Anna; Berglind, Linda; Schön, Thomas; Söderman, Jan

    2017-12-01

    Enhanced precision of epidemiological typing in clinically suspected nosocomial outbreaks is crucial. Our aim was to investigate whether single nucleotide polymorphism (SNP) analysis and core genome (cg) multilocus sequence typing (MLST) of whole genome sequencing (WGS) data would more reliably identify a nosocomial outbreak, compared to earlier molecular typing methods. Sixteen isolates from a nosocomial outbreak of ESBL E. coli ST-131 in southeastern Sweden and three control strains were subjected to WGS. Sequences were explored by SNP analysis and cgMLST. cgMLST clearly differentiated between the outbreak isolates and the control isolates (>1400 differences). All clinically identified outbreak isolates showed close clustering (≥2 allele differences), except for two isolates (>50 allele differences). These data confirmed that the isolates with >50 differing genes did not belong to the nosocomial outbreak. The number of SNPs within the outbreak was ≤7, whereas the two discrepant isolates had >700 SNPs. Two of the ESBL E. coli ST-131 isolates did not belong to the clinically identified outbreak. Our results illustrate the power of WGS in terms of resolution, which may avoid overestimation of patients belonging to outbreaks as judged from epidemiological data and previously employed molecular methods with lower discriminatory ability. © 2017 APMIS. Published by John Wiley & Sons Ltd.
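    The cgMLST comparison described above reduces to counting, for each pair of isolates, the core-genome loci at which their allele calls differ. A minimal sketch of that counting step follows, assuming each isolate is represented as a mapping from locus name to allele number. The function names, isolate labels and toy profiles are hypothetical and are not taken from the study; loci missing from either isolate are simply skipped, which is one possible convention among several.

```python
from itertools import combinations


def allele_differences(profile_a, profile_b):
    """Count loci called in both profiles whose allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


def pairwise_differences(profiles):
    """Return {(isolate_1, isolate_2): allele difference count} for all pairs."""
    return {
        (a, b): allele_differences(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles (locus name -> allele number), for illustration only.
profiles = {
    "iso1": {"l1": 1, "l2": 2, "l3": 3},
    "iso2": {"l1": 1, "l2": 5, "l3": 3},
    "ctrl": {"l1": 9, "l2": 8, "l3": 3, "l4": 1},
}

for pair, count in pairwise_differences(profiles).items():
    print(pair, count)
```

    Isolates whose pairwise counts stay below a chosen threshold would be grouped as one cluster; the threshold itself is a study-specific decision and is not fixed by this sketch.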

  14. Presence of extensive Wolbachia symbiont insertions discovered in the genome of its host Glossina morsitans morsitans.

    Directory of Open Access Journals (Sweden)

    Corey Brelsfoard

    2014-04-01

    Full Text Available Tsetse flies (Glossina spp.) are the cyclical vectors of Trypanosoma spp., which are unicellular parasites responsible for multiple diseases, including nagana in livestock and sleeping sickness in humans in Africa. Glossina species, including Glossina morsitans morsitans (Gmm), for which the Whole Genome Sequence (WGS) is now available, have established symbiotic associations with three endosymbionts: Wigglesworthia glossinidia, Sodalis glossinidius and Wolbachia pipientis (Wolbachia). The presence of Wolbachia in both natural and laboratory populations of Glossina species, including the presence of horizontal gene transfer (HGT) events in a laboratory colony of Gmm, has already been shown. We herein report on the draft genome sequence of the cytoplasmic Wolbachia endosymbiont (cytWol) associated with Gmm. By in silico and molecular and cytogenetic analysis, we discovered and validated the presence of multiple insertions of Wolbachia (chrWol) in the host Gmm genome. We identified at least two large insertions of chrWol, 527,507 and 484,123 bp in size, from Gmm WGS data. Southern hybridizations confirmed the presence of Wolbachia insertions in the Gmm genome, and FISH revealed multiple insertions located on the two sex chromosomes (X and Y), as well as on the supernumerary B-chromosomes. We compare the chrWol insertions to the cytWol draft genome in an attempt to clarify the evolutionary history of the HGT events. We discuss our findings in light of the evolution of Wolbachia infections in the tsetse fly and their potential impacts on the control of tsetse populations and trypanosomiasis.

  15. Significance of functional disease-causal/susceptible variants identified by whole-genome analyses for the understanding of human diseases.

    Science.gov (United States)

    Hitomi, Yuki; Tokunaga, Katsushi

    2017-01-01

    Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS), using next-generation sequencing (NGS) technology and genome-wide association studies (GWAS). Here, in addition to the application of an NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using an NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease causal/susceptible variants to personalized medicine.

  16. A statistical approach designed for finding mathematically defined repeats in shotgun data and determining the length distribution of clone-inserts

    DEFF Research Database (Denmark)

    Zhong, Lan; Zhang, Kunlin; Huang, Xiangang

    2003-01-01

    that repeats of different copy number have different probabilities of appearance in shotgun data, so based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed software...... MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, the speed of assembly was increased with lower error probability. In addition, clone-insert size affect the accuracy of repeat assembly and scaffold construction, we also designed length distribution of clone...

  17. A label-free quantitative shotgun proteomics analysis of rice grain development

    Directory of Open Access Journals (Sweden)

    Koh Hee-Jong

    2011-09-01

    Full Text Available Abstract Background Although a great deal of rice proteomic research has been conducted, there are relatively few studies specifically addressing the rice grain proteome. The existing rice grain proteomic research has focused on the identification of differentially expressed proteins or on monitoring protein expression patterns during grain filling stages. Results Proteins were extracted from rice grains 10, 20, and 30 days after flowering, as well as from fully mature grains. By merging all of the identified proteins in this study, we identified 4,172 non-redundant proteins with a wide range of molecular weights (from 5.2 kDa to 611 kDa) and pI values (from pH 2.9 to pH 12.6). A Genome Ontology category enrichment analysis for the 4,172 proteins revealed that 52 categories were enriched, including the carbohydrate metabolic process, transport, localization, lipid metabolic process, and secondary metabolic process. The relative abundances of the 1,784 reproducibly identified proteins were compared to detect 484 differentially expressed proteins during rice grain development. Clustering analysis and Genome Ontology category enrichment analysis revealed that proteins involved in the metabolic process were enriched through all stages of development, suggesting that proteome changes occurred even in the desiccation phase. Interestingly, enrichments of proteins involved in protein folding were detected in the desiccation phase and in fully mature grain. Conclusion This is the first report conducting comprehensive identification of rice grain proteins. With a label-free shotgun proteomic approach, we identified a large number of rice grain proteins and compared the expression patterns of reproducibly identified proteins during rice grain development. Clustering analysis, Genome Ontology category enrichment analysis, and the analysis of composite expression profiles revealed dynamic changes of metabolisms during rice grain development. Interestingly, we

  18. Performance and economics of a Pd-based planar WGS membrane reactor for coal gasification

    International Nuclear Information System (INIS)

    Dolan, M.D.; Donelson, R.; Dave, N.C.

    2010-01-01

    Conceptual 300 tonne per day (tpd) H₂-from-coal plants have been the subject of several major costing exercises in the past decade. Incorporating conventional high- and low-temperature water-gas-shift (WGS) reactors, amine-based CO₂ removal and PSA-based H₂ purification systems, these studies provide a benchmark against which alternative H₂-from-coal technologies can be compared. The catalytic membrane reactor (CMR), combining a WGS catalyst and hydrogen-selective metal membrane, can potentially replace the multiple shift and separation stages of a plant based on conventional technology. CMR-based shift and separation offers several major advantages over the conventional approach, including greater-than-equilibrium WGS conversion, the containment of the CO₂ at high pressure and a reduction in the number of unit processes. To determine capital costs of a WGS CMR-based H₂-from-coal plant, a prototype planar CMR was constructed and tested with varying catalyst bed depth, residence time and membrane type (commercially-sourced 50 μm Pd or 40 μm Pd-25Ag wt%). Experiments to measure CO conversion, and H₂ flux and yield were conducted at 400 °C with a feed pressure of 20 bar, an H₂O:C ratio of 3 and an H₂ product pressure of 1 bar. Under the optimum conditions examined (with a 40 μm-thick Pd-25Ag membrane and 2 would be required to provide a throughput of 300 tpd with 85% H₂ yield. The capital cost of the CMR component of the plant would be around $US 180 million (based on current metal prices), of which 73% can be attributed to the cost of the Pd-Ag alloy membranes. Incorporation of a membrane that meets the 2015 US DOE cost and flux targets would offer cost parity, with a plant cost of $US 44 million and a total membrane area of ∼13,000 m². Meeting these performance and cost targets would likely require a shift to very thin Pd-alloy membranes or highly-permeable Group IV, V body-centred-cubic alloys. (author)

  19. Gleaning evolutionary insights from the genome sequence of a probiotic yeast Saccharomyces boulardii.

    Science.gov (United States)

    Khatri, Indu; Akhtar, Akil; Kaur, Kamaldeep; Tomar, Rajul; Prasad, Gandham Satyanarayana; Ramya, Thirumalai Nallan Chakravarthy; Subramanian, Srikrishna

    2013-10-22

    The yeast Saccharomyces boulardii is used worldwide as a probiotic to alleviate the effects of several gastrointestinal diseases and control antibiotics-associated diarrhea. While many studies report the probiotic effects of S. boulardii, no genome information for this yeast is currently available in the public domain. We report the 11.4 Mbp draft genome of this probiotic yeast. The draft genome was obtained by assembling Roche 454 FLX + shotgun data into 194 contigs with an N50 of 251 Kbp. We compare our draft genome with all other Saccharomyces cerevisiae genomes. Our analysis confirms the close similarity of S. boulardii to S. cerevisiae strains and provides a framework to understand the probiotic effects of this yeast, which exhibits unique physiological and metabolic properties.
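    The abstract above summarizes assembly contiguity with an N50 value. As a minimal sketch of how that statistic is conventionally computed: sort contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one contig with positive length")


# Hypothetical toy assembly, for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```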

  20. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States.

    Directory of Open Access Journals (Sweden)

    Yi Chen

    Full Text Available A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5

  1. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States.

    Science.gov (United States)

    Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S; Allard, Marc W; Brown, Eric W; Strain, Errol A

    2017-01-01

    A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from

  2. Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences

    Science.gov (United States)

    Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando

    2014-01-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ–proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143

  3. Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Val F Lanza

    2014-12-01

    Full Text Available Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.

  4. Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health: a Population Snapshot of Invasive Staphylococcus aureus in Europe

    Directory of Open Access Journals (Sweden)

    David M. Aanensen

    2016-05-01

    Full Text Available The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools.

  5. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    Science.gov (United States)

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP
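    The mixture prior that the abstract describes in words, where the record shows "[Formula: see text]" in place of the probability symbol, can be written out as follows. This is a sketch in assumed notation, not the authors' own typesetting: π stands for the prior probability that a marker has a non-zero effect, β_j for the effect of marker j, σ_β² for the common effect variance, δ₀ for a point mass at zero, and m for the number of markers.

```latex
\beta_j \mid \pi, \sigma_\beta^2 \;\sim\;
  \pi \,\mathcal{N}\!\left(0, \sigma_\beta^2\right) + (1 - \pi)\,\delta_0,
\qquad j = 1, \dots, m
```

    Under this reading, the posterior probability of a non-zero effect mentioned in the abstract is the posterior weight on the normal component for each marker.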

  6. Epidemiology and whole genome sequencing of an ongoing point-source Salmonella Agona outbreak associated with sushi consumption in western Sydney, Australia 2015.

    Science.gov (United States)

    Thompson, C K; Wang, Q; Bag, S K; Franklin, N; Shadbolt, C T; Howard, P; Fearnley, E J; Quinn, H E; Sintchenko, V; Hope, K G

    2017-07-01

    During May 2015, an increase in Salmonella Agona cases was reported from western Sydney, Australia. We examine the public health actions used to investigate and control this increase. A descriptive case-series investigation was conducted. Six outbreak cases were identified; all had consumed cooked tuna sushi rolls purchased within a western Sydney shopping complex. Onset of illness for outbreak cases occurred between 7 April and 24 May 2015. Salmonella was isolated from food samples collected from the implicated premises and a prohibition order was issued. No further cases were identified following this action. Whole genome sequence (WGS) analysis was performed on isolates recovered during this investigation, with additional S. Agona isolates from sporadic-clinical cases and routine food sampling in New South Wales, January to July 2015. Clinical isolates of outbreak cases were indistinguishable from food isolates collected from the implicated sushi outlet. Five additional clinical isolates not originally considered to be linked to the outbreak were genomically similar to outbreak isolates, indicating the point-source contamination may have started before routine surveillance identified an increase. This investigation demonstrated the value of genomics-guided public health action, where near real-time WGS enhanced the resolution of the epidemiological investigation.

  7. Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota.

    Science.gov (United States)

    Raguideau, Sébastien; Plancade, Sandra; Pons, Nicolas; Leclerc, Marion; Laroche, Béatrice

    2016-12-01

    Whole Genome Shotgun (WGS) metagenomics is increasingly used to study the structure and functions of complex microbial ecosystems, both from the taxonomic and functional point of view. Gene inventories of otherwise uncultured microbial communities make the direct functional profiling of microbial communities possible. The concept of community aggregated trait has been adapted from environmental and plant functional ecology to the framework of microbial ecology. Community aggregated traits are quantified from WGS data by computing the abundance of relevant marker genes. They can be used to study key processes at the ecosystem level and correlate environmental factors and ecosystem functions. In this paper we propose a novel model-based approach to infer combinations of aggregated traits characterizing specific ecosystemic metabolic processes. We formulate a model of these Combined Aggregated Functional Traits (CAFTs) accounting for a hierarchical structure of genes, which are associated on microbial genomes, further linked at the ecosystem level by complex co-occurrences or interactions. The model is completed with constraints specifically designed to exploit available genomic information, in order to favor biologically relevant CAFTs. The CAFTs structure, as well as their intensity in the ecosystem, is obtained by solving a constrained Non-negative Matrix Factorization (NMF) problem. We developed a multicriteria selection procedure for the number of CAFTs. We illustrated our method on the modelling of ecosystemic functional traits of fiber degradation by the human gut microbiota. We used 1408 samples of gene abundances from several high-throughput sequencing projects and found that only four CAFTs were needed to represent the fiber degradation potential. This data reduction highlighted biologically consistent functional patterns while providing a high-quality preservation of the original data. Our method is generic and can be applied to other metabolic processes in
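
    The factorization step mentioned in this abstract can be illustrated with a minimal sketch. It uses plain, unconstrained NMF with Lee-Seung multiplicative updates, which is an assumption made here for illustration and not the authors' constrained formulation. The toy abundance matrix (rows as samples, columns as marker genes), the function names and the chosen rank are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Unconstrained NMF (V ~ WH) via multiplicative updates.

    Illustrative only: no biological constraints are applied.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows 2 and 4 are combinations of rows 1 and 3,
# so an exact non-negative rank-2 factorization exists.
abundances = [
    [4.0, 2.0, 0.0, 1.0],
    [8.0, 4.0, 0.0, 2.0],
    [0.0, 1.0, 3.0, 3.0],
    [4.0, 3.0, 3.0, 4.0],
]

w, h = nmf(abundances, rank=2)
print("reconstruction error:", frobenius_error(abundances, w, h))
```

    In this toy setting each row of `h` plays the role of one trait (a weighting over genes) and each row of `w` gives the intensity of the traits in a sample. Choosing the rank is a separate problem, which the abstract addresses with a multicriteria selection procedure not reproduced here.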

  8. Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota.

    Directory of Open Access Journals (Sweden)

    Sébastien Raguideau

    2016-12-01

    Full Text Available Whole Genome Shotgun (WGS) metagenomics is increasingly used to study the structure and functions of complex microbial ecosystems, both from the taxonomic and functional point of view. Gene inventories of otherwise uncultured microbial communities make the direct functional profiling of microbial communities possible. The concept of community aggregated trait has been adapted from environmental and plant functional ecology to the framework of microbial ecology. Community aggregated traits are quantified from WGS data by computing the abundance of relevant marker genes. They can be used to study key processes at the ecosystem level and correlate environmental factors and ecosystem functions. In this paper we propose a novel model-based approach to infer combinations of aggregated traits characterizing specific ecosystemic metabolic processes. We formulate a model of these Combined Aggregated Functional Traits (CAFTs) accounting for a hierarchical structure of genes, which are associated on microbial genomes, further linked at the ecosystem level by complex co-occurrences or interactions. The model is completed with constraints specifically designed to exploit available genomic information, in order to favor biologically relevant CAFTs. The CAFTs structure, as well as their intensity in the ecosystem, is obtained by solving a constrained Non-negative Matrix Factorization (NMF) problem. We developed a multicriteria selection procedure for the number of CAFTs. We illustrated our method on the modelling of ecosystemic functional traits of fiber degradation by the human gut microbiota. We used 1408 samples of gene abundances from several high-throughput sequencing projects and found that only four CAFTs were needed to represent the fiber degradation potential. This data reduction highlighted biologically consistent functional patterns while providing a high-quality preservation of the original data. Our method is generic and can be applied to other

  9. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Kumar Santosh

    2012-12-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from

  10. in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies

    Science.gov (United States)

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding it...

  11. Multiple hospital outbreaks of vanA Enterococcus faecium in Denmark, 2012-13, investigated by WGS, MLST and PFGE

    DEFF Research Database (Denmark)

    Pinholt, Mette; Larner-Svensson, Hanna; Littauer, Pia

    2015-01-01

    -based typing could replace PFGE for typing of VREfm. METHODS: A population-based study was conducted including all VREfm isolates submitted for national surveillance from January 2012 to April 2013. All isolates were investigated by WGS, MLST and PFGE. RESULTS: One-hundred and thirty-two isolates were included......) were identified using WGS. Direct or indirect transmission of VREfm between patients and intra- and inter-regional spreading clones was observed. We identified 10 STs. PFGE identified four major clusters (13-43 isolates) and seven minor clusters (two to three isolates). The results from the typing...... methods were highly concordant. However, WGS-based typing had the highest discriminatory power. CONCLUSIONS: This study emphasizes the importance of infection control measures to limit transmission of VREfm between patients. However, the diversity of the VREfm isolates points to the fact that other...

  12. Dicty_cDB: SSM365 [Dicty_cDB]

    Lifescience Database Archive (English)

    Full Text Available WGS-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor genomic clone hv90e08 5', DNA sequence. 36 1.2... 2 BZ330175 |BZ330175.1 hv90e08.g1 WGS-SbicolorF (JM107 adapted methyl filtered) ...rF (JM107 adapted methyl filtered) Sorghum bicolor genomic clone hz26f02 5', DNA sequence. 36 1.4 2 BZ628823... |BZ628823.1 ih62d12.g1 WGS-SbicolorF (DH5a methyl filtered) Sorghum bicolor geno...mic clone ih62d12 5', DNA sequence. 36 1.7 2 BZ340606 |BZ340606.1 ic39h07.b1 WGS-SbicolorF (JM107 adapted methyl filter

  13. Genomic diagnosis for children with intellectual disability and/or developmental delay.

    Science.gov (United States)

    Bowling, Kevin M; Thompson, Michelle L; Amaral, Michelle D; Finnila, Candice R; Hiatt, Susan M; Engel, Krysta L; Cochran, J Nicholas; Brothers, Kyle B; East, Kelly M; Gray, David E; Kelley, Whitley V; Lamb, Neil E; Lose, Edward J; Rich, Carla A; Simmons, Shirley; Whittle, Jana S; Weaver, Benjamin T; Nesmith, Amy S; Myers, Richard M; Barsh, Gregory S; Bebin, E Martina; Cooper, Gregory M

    2017-05-30

    Developmental disabilities have diverse genetic causes that must be identified to facilitate precise diagnoses. We describe genomic data from 371 affected individuals, 309 of whom were sequenced as proband-parent trios. Whole-exome sequences (WES) were generated for 365 individuals (127 affected) and whole-genome sequences (WGS) were generated for 612 individuals (244 affected). Pathogenic or likely pathogenic variants were found in 100 individuals (27%), with variants of uncertain significance in an additional 42 (11.3%). We found that a family history of neurological disease, especially the presence of an affected first-degree relative, reduces the pathogenic/likely pathogenic variant identification rate, reflecting both the disease relevance and ease of interpretation of de novo variants. We also found that improvements to genetic knowledge facilitated interpretation changes in many cases. Through systematic reanalyses, we have thus far reclassified 15 variants, with 11.3% of families who initially were found to harbor a VUS and 4.7% of families with a negative result eventually found to harbor a pathogenic or likely pathogenic variant. To further such progress, the data described here are being shared through ClinVar, GeneMatcher, and dbGaP. Our data strongly support the value of large-scale sequencing, especially WGS within proband-parent trios, as both an effective first-choice diagnostic tool and means to advance clinical and research progress related to pediatric neurological disease.
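
    The headline percentages in this abstract follow directly from the reported counts. The short check below is only an arithmetic sketch: the counts are taken from the abstract, while the helper name `percent` is hypothetical, and the assumption that both rates use the 371 affected individuals as denominator is inferred from the figures rather than stated.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


AFFECTED = 371      # affected individuals described in the abstract
PATHOGENIC = 100    # pathogenic or likely pathogenic findings
VUS = 42            # variants of uncertain significance

print(percent(PATHOGENIC, AFFECTED))  # 27.0
print(percent(VUS, AFFECTED))         # 11.3
# The WES and WGS affected counts add up to the cohort size.
print(127 + 244 == AFFECTED)          # True
```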

  14. Brucella abortus strain 2308 Wisconsin genome: importance of the definition of reference strains

    Directory of Open Access Journals (Sweden)

    Marcela Suárez-Esquivel

    2016-09-01

    Full Text Available Brucellosis is a bacterial infectious disease affecting a wide range of mammals and a neglected zoonosis caused by species of the genetically homogenous genus Brucella. As in most studies on bacterial diseases, research in brucellosis is carried out by using reference strains as canonical models to understand the mechanisms underlying host-pathogen interactions. We performed whole genome sequencing (WGS) analysis of the reference strain Brucella abortus 2308 routinely used in our laboratory, including manually curated annotation accessible as an editable version at www.wikipedia. Comparison of this genome with two publicly available 2308 genomes showed significant differences, particularly indels related to insertional elements, suggesting variability related to the transposition of these elements within the same strain. Considering the outcome of high resolution genomic techniques in the bacteriology field, the conventional concept of strain definition needs to be revised.

  15. Evaluation of multiple approaches to identify genome-wide polymorphisms in closely related genotypes of sweet cherry (Prunus avium L.)

    Directory of Open Access Journals (Sweden)

    Seanna Hewitt

    Full Text Available Identification of genetic polymorphisms and subsequent development of molecular markers is important for marker assisted breeding of superior cultivars of economically important species. Sweet cherry (Prunus avium L.) is an economically important non-climacteric tree fruit crop in the Rosaceae family and has undergone a genetic bottleneck due to breeding, resulting in limited genetic diversity in the germplasm that is utilized for breeding new cultivars. Therefore, it is critical to recognize the best platforms for identifying genome-wide polymorphisms that can help identify, and consequently preserve, the diversity in a genetically constrained species. To identify genome-wide polymorphisms in five closely related genotypes of sweet cherry, a gel-based approach (TRAP), reduced representation sequencing (TRAPseq), a 6k cherry SNParray, and whole genome sequencing (WGS) were evaluated. All platforms facilitated detection of polymorphisms among the genotypes with variable efficiency. In assessing multiple SNP detection platforms, this study has demonstrated that a combination of appropriate approaches is necessary for efficient polymorphism identification, especially between closely related cultivars of a species. The information generated in this study provides a valuable resource for future genetic and genomic studies in sweet cherry, and the insights gained from the evaluation of multiple approaches can be utilized for other closely related species with limited genetic diversity in the breeding germplasm. Keywords: Polymorphisms, Prunus avium, Next-generation sequencing, Target region amplification polymorphism (TRAP), Genetic diversity, SNParray, Reduced representation sequencing, Whole genome sequencing (WGS)

  16. Whole genome sequencing and analysis of Campylobacter coli YH502 from retail chicken reveals a plasmid-borne type VI secretion system

    Directory of Open Access Journals (Sweden)

    Sandeep Ghatak

    2017-03-01

    Full Text Available Campylobacter is a major cause of foodborne illnesses worldwide. Campylobacter infections, commonly caused by ingestion of undercooked poultry and meat products, can lead to gastroenteritis and chronic reactive arthritis in humans. Whole genome sequencing (WGS) is a powerful technology that provides comprehensive genetic information about bacteria and is increasingly being applied to study foodborne pathogens: e.g., evolution, epidemiology/outbreak investigation, and detection. Herein we report the complete genome sequence of Campylobacter coli strain YH502 isolated from retail chicken in the United States. WGS, de novo assembly, and annotation of the genome revealed a chromosome of 1,718,974 bp and a mega-plasmid (pCOS502) of 125,964 bp. GC content of the genome was 31.2% with 1931 coding sequences and 53 non-coding RNAs. Multiple virulence factors including a plasmid-borne type VI secretion system and antimicrobial resistance genes (beta-lactams, fluoroquinolones, and aminoglycosides) were found. The presence of T6SS in a mobile genetic element (plasmid) suggests plausible horizontal transfer of these virulence genes to other organisms. The C. coli YH502 genome also harbors CRISPR sequences and associated proteins. Phylogenetic analysis based on average nucleotide identity and single nucleotide polymorphisms identified closely related C. coli genomes available in the NCBI database. Taken together, the analyzed genomic data of this potentially virulent strain of C. coli will facilitate further understanding of this important foodborne pathogen most likely leading to better control strategies. The chromosome and plasmid sequences of C. coli YH502 have been deposited in GenBank under the accession numbers CP018900.1 and CP018901.1, respectively.

  17. Midpregnancy Marriage and Divorce: Why the Death of Shotgun Marriage Has Been Greatly Exaggerated.

    Science.gov (United States)

    Gibson-Davis, Christina M; Ananat, Elizabeth O; Gassman-Pines, Anna

    2016-12-01

    Conventional wisdom holds that births following the colloquially termed "shotgun marriage" (that is, births to parents who married between conception and the birth) are nearing obsolescence. To investigate trends in shotgun marriage, we matched North Carolina administrative data on nearly 800,000 first births among white and black mothers to marriage and divorce records. We found that among married births, midpregnancy-married births (our preferred term for shotgun-married births) have been relatively stable at about 10% over the past quarter-century while increasing substantially for vulnerable population subgroups. In 2012, among black and white less-educated and younger women, midpregnancy-married births accounted for approximately 20% to 25% of married first births. The increasing representation of midpregnancy-married births among married births raises concerns about well-being among at-risk families because midpregnancy marriages may be quite fragile. Our analysis revealed, however, that midpregnancy marriages were more likely to dissolve only among more advantaged groups. Of those groups considered to be most at risk of divorce (namely, black women with lower levels of education and who were younger), midpregnancy marriages had the same or lower likelihood of divorce as preconception marriages. Our results suggest an overlooked resiliency in a type of marriage that has only increased in salience.

  18. Genome-wide patterns of differentiation and spatially varying selection between postglacial recolonization lineages of Populus alba (Salicaceae), a widespread forest tree.

    Science.gov (United States)

    Stölting, Kai N; Paris, Margot; Meier, Cécile; Heinze, Berthold; Castiglione, Stefano; Bartha, Denes; Lexer, Christian

    2015-08-01

    Studying the divergence continuum in plants is relevant to fundamental and applied biology because of the potential to reveal functionally important genetic variation. In this context, whole-genome sequencing (WGS) provides the necessary rigour for uncovering footprints of selection. We resequenced populations of two divergent phylogeographic lineages of Populus alba (n = 48), thoroughly characterized by microsatellites (n = 317), and scanned their genomes for regions of unusually high allelic differentiation and reduced diversity using > 1.7 million single nucleotide polymorphisms (SNPs) from WGS. Results were confirmed by Sanger sequencing. On average, 9134 high-differentiation (≥ 4 standard deviations) outlier SNPs were uncovered between populations, 848 of which were shared by ≥ three replicate comparisons. Annotation revealed that 545 of these were located in 437 predicted genes. Twelve percent of differentiation outlier genome regions exhibited significantly reduced genetic diversity. Gene ontology (GO) searches were successful for 327 high-differentiation genes, and these were enriched for 63 GO terms. Our results provide a snapshot of the roles of 'hard selective sweeps' vs divergent selection of standing genetic variation in distinct postglacial recolonization lineages of P. alba. Thus, this study adds to our understanding of the mechanisms responsible for the origin of functionally relevant variation in temperate trees. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  19. An efficient approach to BAC based assembly of complex genomes.

    Science.gov (United States)

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost-effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  20. Dicty_cDB: SSL385 [Dicty_cDB]

    Lifescience Database Archive (English)

    Full Text Available C37A2, complete sequence. 38 0.049 3 BZ347525 |BZ347525.1 hm84h07.b1 WGS-SbicolorF (JM107 adapted methyl filter...-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor genomic clone ho57b09 ..._Ba0020C03 5', genomic survey sequence. 42 0.33 2 BZ366117 |BZ366117.1 ic94g05.g1 WGS-SbicolorF (JM107 adapted methyl filter...in ordered pieces. 44 0.88 1 BZ626058 |BZ626058.1 ih42h02.b1 WGS-SbicolorF (DH5a methyl filter...07.g1 WGS-SbicolorF (DH5a methyl filtered) Sorghum bicolor genomic clone ii21a07, DNA sequence. 38 3.0 2 dna

  1. Dicty_cDB: SSL552 [Dicty_cDB]

    Lifescience Database Archive (English)

    Full Text Available 28 0.36 3 BZ330174 |BZ330174.1 hv90e08.b1 WGS-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor gen...S-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor genomic clone hv90e08 5', DNA sequence. 36 0.63 ...2 BZ335743 |BZ335743.1 hz26f02.g1 WGS-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor genomic clon...e hz26f02 5', DNA sequence. 36 0.68 2 BZ628823 |BZ628823.1 ih62d12.g1 WGS-SbicolorF (DH5a methyl filtered) S...06.1 ic39h07.b1 WGS-SbicolorF (JM107 adapted methyl filtered) Sorghum bicolor genomic clone ic39h07 5', DNA

  2. Generation of a BAC-based physical map of the melon genome

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2010-05-01

    Full Text Available Abstract Background Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has high intra-specific genetic variation, morphologic diversity and a small genome size (450 Mb), which make this species suitable for a great variety of molecular and genetic studies that can lead to the development of tools for breeding varieties of the species. A number of genetic and genomic resources have already been developed, such as several genetic maps and BAC genomic libraries. These tools are essential for the construction of a physical map, a valuable resource for map-based cloning, comparative genomics and assembly of whole genome sequencing data. However, no physical map of any Cucurbitaceae has yet been developed. A project has recently been started to sequence the complete melon genome following a whole-genome shotgun strategy, which makes use of massive sequencing data. A BAC-based melon physical map will be a useful tool to help assemble and refine the draft genome data that is being produced. Results A melon physical map was constructed using a 5.7 × BAC library and a genetic map previously developed in our laboratories. High-information-content fingerprinting (HICF) was carried out on 23,040 BAC clones, digesting with five restriction enzymes and SNaPshot labeling, followed by contig assembly with FPC software. The physical map has 1,355 contigs and 441 singletons, with an estimated physical length of 407 Mb (0.9 × coverage of the genome) and the longest contig being 3.2 Mb. The anchoring of 845 BAC clones to 178 genetic markers (100 RFLPs, 76 SNPs and 2 SSRs) also allowed the genetic positioning of 183 physical map contigs/singletons, representing 55 Mb (12% of the melon genome), to individual chromosomal loci. The melon FPC database is available for download at http://melonomics.upv.es/static/files/public/physical_map/. Conclusions Here we report the construction
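
    The coverage figures quoted in this abstract can be reproduced from the stated sizes. The sketch below is a simple consistency check: the numbers come from the abstract, while the constant and function names are hypothetical.

```python
GENOME_MB = 450.0        # melon genome size given in the abstract
PHYSICAL_MAP_MB = 407.0  # estimated physical length of the map
ANCHORED_MB = 55.0       # length anchored to chromosomal loci


def fold_coverage(length_mb, genome_mb=GENOME_MB):
    """Length expressed as a multiple of the genome size."""
    return length_mb / genome_mb


print(round(fold_coverage(PHYSICAL_MAP_MB), 1))    # 0.9
print(round(100 * fold_coverage(ANCHORED_MB)))     # 12
# The three marker classes add up to the 178 genetic markers reported.
print(100 + 76 + 2)                                # 178
```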

  3. Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

    Science.gov (United States)

    Webster, R J; Williams, A; Marchetti, F; Yauk, C L

    2018-07-01

    Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  4. Assembly and diploid architecture of an individual human genome via single-molecule technologies.

    Science.gov (United States)

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-08-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
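
    Scaffold N50, the contiguity statistic quoted in this abstract, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the toy scaffold lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one positive length")


# Toy lengths in Mb: total is 290, and 80 + 70 = 150 covers at least half.
toy_scaffolds = [20, 80, 30, 70, 40, 50]
print(n50(toy_scaffolds))  # 70
```

    Because N50 depends only on the length distribution, it describes contiguity and says nothing about whether the scaffolds are assembled correctly.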

  5. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

    Directory of Open Access Journals (Sweden)

    Emma L. Doughty

    2014-09-01

    Full Text Available Tuberculosis remains a major global health problem. Laboratory diagnostic methods that allow effective, early detection of cases are central to management of tuberculosis in the individual patient and in the community. Since the 1880s, laboratory diagnosis of tuberculosis has relied primarily on microscopy and culture. However, microscopy fails to provide species- or lineage-level identification and culture-based workflows for diagnosis of tuberculosis remain complex, expensive, slow, technically demanding and poorly able to handle mixed infections. We therefore explored the potential of shotgun metagenomics, sequencing of DNA from samples without culture or target-specific amplification or capture, to detect and characterise strains from the Mycobacterium tuberculosis complex in smear-positive sputum samples obtained from The Gambia in West Africa. Eight smear- and culture-positive sputum samples were investigated using a differential-lysis protocol followed by a kit-based DNA extraction method, with sequencing performed on a benchtop sequencing instrument, the Illumina MiSeq. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238. The proportion of reads in each metagenome mapping against the human genome ranged from 20% to 99%. We were able to detect sequences from the M. tuberculosis complex in all eight samples, with coverage of the H37Rv reference genome ranging from 0.002X to 0.7X. By analysing the distribution of large sequence polymorphisms (deletions and the locations of the insertion element IS6110) and single nucleotide polymorphisms (SNPs), we were able to assign seven of eight metagenome-derived genomes to a species and lineage within the M. tuberculosis complex. Two metagenome-derived mycobacterial genomes were assigned to M. africanum, a species largely confined to West Africa; the others that could be assigned belonged to lineages T, H or LAM within the clade of “modern” M. tuberculosis

  6. Genome-wide comparison of medieval and modern Mycobacterium leprae.

    Science.gov (United States)

    Schuenemann, Verena J; Singh, Pushpendra; Mendum, Thomas A; Krause-Kyora, Ben; Jäger, Günter; Bos, Kirsten I; Herbig, Alexander; Economou, Christos; Benjak, Andrej; Busso, Philippe; Nebel, Almut; Boldsen, Jesper L; Kjellström, Anna; Wu, Huihai; Stewart, Graham R; Taylor, G Michael; Bauer, Peter; Lee, Oona Y-C; Wu, Houdini H T; Minnikin, David E; Besra, Gurdyal S; Tucker, Katie; Roffey, Simon; Sow, Samba O; Cole, Stewart T; Nieselt, Kay; Krause, Johannes

    2013-07-12

    Leprosy was endemic in Europe until the Middle Ages. Using DNA array capture, we have obtained genome sequences of Mycobacterium leprae from skeletons of five medieval leprosy cases from the United Kingdom, Sweden, and Denmark. In one case, the DNA was so well preserved that full de novo assembly of the ancient bacterial genome could be achieved through shotgun sequencing alone. The ancient M. leprae sequences were compared with those of 11 modern strains, representing diverse genotypes and geographic origins. The comparisons revealed remarkable genomic conservation during the past 1000 years, a European origin for leprosy in the Americas, and the presence of an M. leprae genotype in medieval Europe now commonly associated with the Middle East. The exceptional preservation of M. leprae biomarkers, both DNA and mycolic acids, in ancient skeletons has major implications for palaeomicrobiology and human pathogen evolution.

  7. Defining Diagnostic Biomarkers Using Shotgun Proteomics and MALDI-TOF Mass Spectrometry.

    Science.gov (United States)

    Armengaud, Jean

    2017-01-01

    Whole-cell MALDI-TOF has become a robust and widely used tool to quickly identify any pathogen. In addition to being routinely used in hospitals, it is also useful for low cost dereplication in large scale screening procedures of new environmental isolates for environmental biotechnology or taxonomical applications. Here, I describe how specific biomarkers can be defined using shotgun proteomics and whole-cell MALDI-TOF mass spectrometry. Based on MALDI-TOF spectra recorded on a given set of pathogens with internal calibrants, m/z values of interest are extracted. The proteins which contribute to these peaks are deduced from label-free shotgun proteomics measurements carried out on the same sample. Quantitative information based on the spectral count approach allows ranking the most probable candidates. Proteogenomic approaches help to define whether these proteins give the same m/z values along the whole taxon under consideration or result in heterogeneous lists. These specific biomarkers nicely complement conventional profiling approaches and may help to better define groups of organisms, for example at the subspecies level.

  8. PAnalyzer: A software tool for protein inference in shotgun proteomics

    Directory of Open Access Journals (Sweden)

    Prieto Gorka

    2012-11-01

    Full Text Available Abstract Background Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise due to the presence of peptides shared between different proteins, which is common in higher eukaryotes. Recently, data independent acquisition (DIA) approaches have emerged as an alternative to the traditional data dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the term used to name one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software available at the moment does not group the identified proteins in a transparent way by taking into account peptide evidence categories. Furthermore, the inspection, comparison and reporting of the obtained results require tedious manual intervention. Here we report a software tool to address these limitations for MSE data. Results In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously and are considered as technical replicates. Results are saved to CSV, HTML and mzIdentML (in the case of a single mzIdentML input file) files. An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. Conclusions We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicates

  9. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They can often not distinguish between genetically closely related strains and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity of convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still a costly solution. In the last years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  10. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large food borne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promises for using as a routine epidemiological typing....... Enteritidis and 5 S. Derby were also sequenced and used for comparison. A number of different bioinformatics approaches were applied on the data; including pan-genome tree, k-mer tree, nucleotide difference tree and SNP tree. The outcome of each approach was evaluated in relation to the association...... of the isolates to specific outbreaks. The pan-genome tree clustered 65% of the S. Typhimurium isolates according to the pre-defined epidemiology, the k-mer tree 88%, the nucleotide difference tree 100% and the SNP tree 100% of the strains within S. Typhimurium. The resulting outcome of the four phylogenetic...

  11. Performance and economics of a Pd-based planar WGS membrane reactor for coal gasification

    Energy Technology Data Exchange (ETDEWEB)

    Dolan, M.D. [CSIRO Energy Technology, Pullenvale QLD 4069 (Australia); Donelson, R. [CSIRO Process Science and Engineering, Clayton VIC 3168 (Australia); Dave, N.C. [CSIRO Energy Technology, North Ryde NSW 2113 (Australia)

    2010-10-15

    Conceptual 300 tonne per day (tpd) H₂-from-coal plants have been the subject of several major costing exercises in the past decade. Incorporating conventional high- and low-temperature water-gas-shift (WGS) reactors, amine-based CO₂ removal and PSA-based H₂ purification systems, these studies provide a benchmark against which alternative H₂-from-coal technologies can be compared. The catalytic membrane reactor (CMR), combining a WGS catalyst and hydrogen-selective metal membrane, can potentially replace the multiple shift and separation stages of a plant based on conventional technology. CMR-based shift and separation offers several major advantages over the conventional approach, including greater-than-equilibrium WGS conversion, the containment of the CO₂ at high pressure and a reduction in the number of unit processes. To determine capital costs of a WGS CMR-based H₂-from-coal plant, a prototype planar CMR was constructed and tested with varying catalyst bed depth, residence time and membrane type (commercially-sourced 50 µm Pd or 40 µm Pd-25Ag wt%). Experiments to measure CO conversion, and H₂ flux and yield were conducted at 400 °C with a feed pressure of 20 bar, an H₂O:C ratio of 3 and an H₂ product pressure of 1 bar. Under the optimum conditions examined (with a 40 µm-thick Pd-25Ag membrane and <3 mm-thick catalyst bed), a membrane surface area of ~25,000 m² would be required to provide a throughput of 300 tpd with 85% H₂ yield. The capital cost of the CMR component of the plant would be around US$180 million (based on current metal prices), of which 73% can be attributed to the cost of the Pd-Ag alloy membranes. Incorporation of a membrane that meets the 2015 US DOE cost and flux targets would offer

  12. Genomic Characterization of USA300 Methicillin-Resistant Staphylococcus aureus (MRSA) to Evaluate Intraclass Transmission and Recurrence of Skin and Soft Tissue Infection (SSTI) Among High-Risk Military Trainees.

    Science.gov (United States)

    Millar, Eugene V; Rice, Gregory K; Elassal, Emad M; Schlett, Carey D; Bennett, Jason W; Redden, Cassie L; Mor, Deepika; Law, Natasha N; Tribble, David R; Hamilton, Theron; Ellis, Michael W; Bishop-Lilly, Kimberly A

    2017-08-01

    Military trainees are at increased risk for methicillin-resistant Staphylococcus aureus (MRSA) skin and soft tissue infection (SSTI). Whole genome sequencing (WGS) can refine our understanding of MRSA transmission and microevolution in congregate settings. We conducted a prospective case-control study of SSTI among US Army infantry trainees at Fort Benning, Georgia, from July 2012 to December 2014. We identified clusters of USA300 MRSA SSTI within select training classes and performed WGS on clinical isolates. We then linked genomic, phylogenetic, epidemiologic, and clinical data in order to evaluate intra- and interclass disease transmission. Furthermore, among cases of recurrent MRSA SSTI, we evaluated the intrahost relatedness of infecting strains. Nine training classes with ≥5 cases of USA300 MRSA SSTI were selected. Eighty USA300 MRSA clinical isolates from 74 trainees, 6 (8.1%) of whom had recurrent infection, were subjected to WGS. We identified 2719 single nucleotide variants (SNVs). The overall median (range) SNV difference between isolates was 173 (1-339). Intraclass median SNV differences ranged from 23 to 245. Two phylogenetic clusters were suggestive of interclass MRSA transmission. One of these clusters stemmed from 2 classes that were separated by a 13-month period but housed in the same barracks. Among trainees with recurrent MRSA SSTI, the intrahost median SNV difference was 7.5 (1-48). Application of WGS revealed intra- and interclass transmission of MRSA among military trainees. An interclass cluster between 2 noncontemporaneous classes suggests a long-term reservoir for MRSA in this setting. Published by Oxford University Press for the Infectious Diseases Society of America 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.
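
    The summary statistic reported above, a median (range) of pairwise SNV differences between isolates, can be illustrated with a minimal sketch. It assumes each isolate has already been reduced to a string of bases at the same variant positions; the isolate names and sequences below are invented toy data, not values from the study, and the helper names are hypothetical.

```python
from itertools import combinations
from statistics import median


def snv_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length SNV profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same variant positions")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_summary(profiles: dict) -> tuple:
    """Return (median, minimum, maximum) SNV difference over all isolate pairs."""
    distances = [
        snv_distance(a, b) for a, b in combinations(profiles.values(), 2)
    ]
    return median(distances), min(distances), max(distances)


# Toy data (hypothetical): three isolates typed at five variant positions.
toy_profiles = {
    "iso1": "ACGTA",
    "iso2": "ACGTT",
    "iso3": "TCGAT",
}
summary = pairwise_summary(toy_profiles)
print(summary)
```

    Restricting the input dictionary to isolates from one training class, or from one trainee, would give the intraclass or intrahost figures in the same way.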

  13. MULTI-DIMENSIONAL MASS SPECTROMETRY-BASED SHOTGUN LIPIDOMICS AND NOVEL STRATEGIES FOR LIPIDOMIC ANALYSES

    Science.gov (United States)

    Han, Xianlin; Yang, Kui; Gross, Richard W.

    2011-01-01

    Since our last comprehensive review on multi-dimensional mass spectrometry-based shotgun lipidomics (Mass Spectrom. Rev. 24 (2005), 367), many new developments in the field of lipidomics have occurred. These developments include new strategies and refinements for shotgun lipidomic approaches that use direct infusion, including novel fragmentation strategies, identification of multiple new informative dimensions for mass spectrometric interrogation, and the development of new bioinformatic approaches for enhanced identification and quantitation of the individual molecular constituents that comprise each cell’s lipidome. Concurrently, advances in liquid chromatography-based platforms and novel strategies for quantitative matrix-assisted laser desorption/ionization mass spectrometry for lipidomic analyses have been developed. Through the synergistic use of this repertoire of new mass spectrometric approaches, the power and scope of lipidomics has been greatly expanded to accelerate progress toward the comprehensive understanding of the pleiotropic roles of lipids in biological systems. PMID:21755525

  14. Nano-Scale Au Supported on Carbon Materials for the Low Temperature Water Gas Shift (WGS) Reaction

    Directory of Open Access Journals (Sweden)

    Paula Sánchez

    2011-12-01

    Full Text Available Au-based catalysts supported on carbon materials with different structures such as graphite (G) and fishbone type carbon nanofibers (CNF-F) were prepared using two different methods (impregnation and gold-sol) to be tested in the water gas shift (WGS) reaction. Atomic absorption spectrometry, transmission electron microscopy (TEM), temperature-programmed oxidation (TPO), X-ray diffraction (XRD), Raman spectroscopy, elemental analyses (CNH), N2 adsorption-desorption analysis, temperature-programmed reduction (TPR) and temperature-programmed decomposition were employed to characterize both the supports and catalysts. Both the crystalline nature of the carbon supports and the method of gold incorporation had a strong influence on the way in which Au particles were deposited on the carbon surface. The higher the crystallinity of the support and the smaller and better dispersed the Au particles, the higher the activity of the catalysts in the WGS reaction. Finally, catalytic activity showed an important dependence on the reaction temperature and steam-to-CO molar ratio.

  15. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    Directory of Open Access Journals (Sweden)

    Chang Su

    2014-12-01

    Full Text Available DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single base pair resolution for Betula platyphylla (white birch) by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function of DNA methylation sequence composition and distribution in the genome of B. platyphylla. There are 34,460 genes in the whole genome of birch, and 31,297 genes are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for 48.86%, and is the largest proportion. Combined transcriptome and methylation analysis showed that the genes with moderate methylation levels had higher expression levels than genes with high and low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees.

  16. Microbial Community Profiling of Human Saliva Using Shotgun Metagenomic Sequencing

    OpenAIRE

    Hasan, Nur A.; Young, Brian A.; Minard-Smith, Angela T.; Saeed, Kelly; Li, Huai; Heizer, Esley M.; McMillan, Nancy J.; Isom, Richard; Abdullah, Abdul Shakur; Bornman, Daniel M.; Faith, Seth A.; Choi, Seon Young; Dickens, Michael L.; Cebula, Thomas A.; Colwell, Rita R.

    2014-01-01

    Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify ba...

  17. Tracembler – software for in-silico chromosome walking in unassembled genomes

    Directory of Open Access Journals (Sweden)

    Wilkerson Matthew D

    2007-05-01

    Full Text Available Abstract Background Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user. Results Tracembler takes one or multiple DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively such that BLAST matching sequences identified in previous rounds of searches are used as new queries in subsequent rounds of BLAST searches. The recursive BLAST search stops when either no more new matching sequences are found, a given maximal number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length Chrm2 gene as well as its upstream and downstream genomic regions from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus. Conclusion Tracembler streamlines the process of recursive database searches, sequence assembly, and gene
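
    The recursive search and its three stopping conditions, as described in the abstract above, can be sketched as a simple loop. This is an illustration only and not Tracembler's code: the `search` callable stands in for a BLAST query against a read archive, the function and parameter names are hypothetical, and the toy hit table is invented. The final CAP3 assembly step is not shown.

```python
from typing import Callable, Iterable, Set


def recursive_search(
    seeds: Iterable[str],
    search: Callable[[str], Iterable[str]],
    max_queries: int = 100,
    max_rounds: int = 5,
) -> Set[str]:
    """Collect reads reachable from the seeds by repeated similarity searches.

    Stops when a round yields no new reads, when the query budget is
    spent, or when the maximum number of rounds has been run.
    """
    found: Set[str] = set()
    queries = list(seeds)
    used = 0
    rounds = 0
    while queries and rounds < max_rounds and used < max_queries:
        rounds += 1
        new_hits: Set[str] = set()
        for query in queries:
            if used >= max_queries:
                break
            used += 1
            # Keep only reads that no earlier round has already returned.
            new_hits.update(h for h in search(query) if h not in found)
        found |= new_hits
        # Reads found in this round become the queries of the next round.
        queries = sorted(new_hits)
    return found


# Toy stand-in for a BLAST search: a fixed table of query -> matching reads.
toy_hits = {
    "seed": ["r1", "r2"],
    "r1": ["r2", "r3"],
    "r2": [],
    "r3": ["r4"],
    "r4": [],
}


def toy_search(query: str) -> list:
    return toy_hits.get(query, [])


all_reads = recursive_search(["seed"], toy_search)
one_round = recursive_search(["seed"], toy_search, max_rounds=1)
print(sorted(all_reads), sorted(one_round))
```

    Capping both the number of queries and the number of rounds keeps the walk bounded when a seed lands in a repetitive region, where each round could otherwise recruit a rapidly growing set of reads.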

  18. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    Science.gov (United States)

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6×10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  19. Comparative genome analysis of clostridium perfringens isolates from healthy and necrotic enteritis infected poultry and diseased pigs

    DEFF Research Database (Denmark)

    Ronco, Troels; Lyhs, Ulrike; Stegger, Marc

    2015-01-01

    to be important for the development of NE in chickens and piglets, respectively, while the role of these toxins is less well elucidated in diseased turkeys. Methods: We carried out comparative genomic analysis of 40 C. perfringens genomes from healthy and NE-suffering chickens and turkeys, and diseased pigs using......B, NELoc-1 and -3 seem to play an important role in the NE pathogenesis in chickens, whereas cpb2 is important in diseased pigs. • The VirSR two-component system is involved in regulating NE-associated virulence genes. • Conjugative plasmid genes are widely spread among C. perfringens. • WGS is a powerful...

  20. Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Tetzschner, Anna M. M.; Iguchi, Atsushi

    2015-01-01

    typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing...... tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin...

  1. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    Science.gov (United States)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different methods, available online, that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post-processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the
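
    The post-processing idea described above, collapsing short branches so that a sample placed as a leaf is moved onto its parent internal node, can be sketched as follows. The abstract does not give the authors' exact rule, so the rule used here (move the first leaf whose branch length is at or below a threshold onto an unlabeled parent) is an assumption, and the `Node` class, the function name and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # sample label; None for an unlabeled internal node
    length: float = 0.0         # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_short_leaves(node: Node, threshold: float) -> Node:
    """Relabel unlabeled internal nodes with a leaf child on a short branch.

    The tree is processed bottom-up. For each unlabeled internal node, the
    first leaf child whose branch length is <= threshold is removed and its
    label is moved onto the node, treating that sample as ancestral.
    """
    for child in node.children:
        collapse_short_leaves(child, threshold)
    if node.name is None:
        for child in node.children:
            if not child.children and child.length <= threshold:
                node.name = child.name
                node.children.remove(child)
                break  # at most one sample can sit on an internal node
    return node


# Toy tree (hypothetical): samples A and B sit on zero-length branches.
toy_tree = Node(children=[
    Node(name="A", length=0.0),
    Node(length=0.5, children=[
        Node(name="B", length=0.0),
        Node(name="C", length=0.3),
    ]),
])
collapse_short_leaves(toy_tree, threshold=0.01)
inner = toy_tree.children[0]
print(toy_tree.name, inner.name, [c.name for c in inner.children])
```

    After collapsing, internal nodes carry sample labels, which is what makes it possible to score a tree on the identity of internal as well as leaf nodes.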

  2. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

  3. Development and characterization of genomic SSR markers for Anneslea fragrans (Pentaphylacaceae).

    Science.gov (United States)

    Sun, Lijing; Meng, Kaikai; Liao, Boyong; Li, Chunmei; Zhang, Yue; Liao, Wenbo; Chen, Sufang

    2017-10-01

    The genus Anneslea (Pentaphylacaceae) contains four species and six varieties, most of which are locally endemic. Here, simple sequence repeat (SSR) markers were developed for the conservation of these species. The genome of A. fragrans was sequenced and de novo assembled into 445,162 contigs, of which 30,409 SSR loci were detected. Primers for 100 SSR loci were validated with PCR amplification in three populations of A. fragrans. Seventy-nine loci successfully amplified, and 30 were polymorphic. The mean number of alleles, observed heterozygosity, and expected heterozygosity were 7.01 ± 1.60, 0.817 ± 0.241, and 0.796 ± 0.145, respectively. Most primers could be amplified in Ternstroemia gymnanthera, T. kwangtungensis, and Cleyera pachyphylla. Our study demonstrated that shotgun genome sequencing is an efficient way to develop genomic SSR markers for nonmodel species. These genomic SSR loci will be valuable in population genetic studies in Anneslea and its relatives.
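
    For readers unfamiliar with the two diversity statistics quoted above, the following minimal sketch computes observed heterozygosity (the fraction of heterozygous individuals) and expected heterozygosity (one minus the sum of squared allele frequencies) for a single diploid locus. The genotypes are invented allele sizes, not data from the study, and the function name is hypothetical. Many population-genetics packages report a small-sample corrected estimate instead, so values from such tools may differ slightly.

```python
from collections import Counter
from typing import List, Tuple


def heterozygosity(genotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity for one diploid locus."""
    n = len(genotypes)
    # Observed: share of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected: 1 - sum of squared allele frequencies (no sample-size correction).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Toy data (hypothetical): SSR allele sizes in base pairs for four individuals.
toy_genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
ho, he = heterozygosity(toy_genotypes)
print(ho, he)
```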

  4. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    Full Text Available The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  5. Intraspecies comparative genomics of three strains of Orientia tsutsugamushi with different antibiotic sensitivity.

    Science.gov (United States)

    Liao, Hsiao-Mei; Chao, Chien-Chung; Lei, Haiyan; Li, Bingjie; Tsai, Shien; Hung, Guo-Chiuan; Ching, Wei-Mei; Lo, Shyh-Ching

    2017-06-01

    We recently reported the genome of Orientia tsutsugamushi (OT) strain Karp (GenBank Accession #: NZ_LYMA00000000.2, https://www.ncbi.nlm.nih.gov/nuccore/NZ_LYMA00000000.2), which is >2 Mb in size, through clone-based sequencing and high throughput genomic shotgun sequencing (HTS). The genomes of OT strains AFSC4 and AFSC7 were similarly sequenced by HTS. Since strains AFSC4 (GenBank Accession #: NZ_LYMT00000000.1, https://www.ncbi.nlm.nih.gov/nuccore/1035784408) and AFSC7 (GenBank Accession #: NZ_LYMB00000000.1, https://www.ncbi.nlm.nih.gov/nuccore/1035854767) were more resistant to antibiotics than strain Karp, we conducted a comparative analysis of the three draft genomes, annotated by the RAST server, to identify possible genetic bases of the difference in antibiotic sensitivity. Intraspecies comparative genomics analysis of the three OT strains revealed that two ORFs encoding hypothetical proteins in both strains AFSC4 and AFSC7 are absent in strain Karp.

  6. The Sorghum bicolor genome and the diversification of grasses

    Energy Technology Data Exchange (ETDEWEB)

    Paterson, Andrew H.; Bowers, John E.; Bruggmann, Remy; Dubchak, Inna; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hellsten, Uffe; Mitros, Therese; Poliakov, Alexander; Schmutz, Jeremy; Spannagl, Manuel; Tang, Haibo; Wang, Xiyin; Wicker, Thomas; Bharti, Arvind K.; Chapman, Jarrod; Feltus, F. Alex; Gowik, Udo; Grigoriev, Igor V.; Lyons, Eric; Maher, Christopher A.; Martis, Mihaela; Narechania, Apurva; Otillar, Robert P.; Penning, Bryan W.; Salamov, Asaf A.; Wang, Yu; Zhang, Lifang; Carpita, Nicholas C.; Freeling, Michael; Gingle, Alan R.; Hash, C. Thomas; Keller, Beat; Klein, Patricia; Kresovich, Stephen; McCann, Maureen C.; Ming, Ray; Peterson, Daniel G.; Mehboob-ur-Rahman; Ware, Doreen; Westhoff, Peter; Mayer, Klaus F. X.; Messing, Joachim; Rokhsar, Daniel S.

    2008-08-20

    Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the ~730-megabase Sorghum bicolor (L.) Moench genome, placing ~98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the ~75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization ~70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.

  7. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  8. Shotgun Proteomics and Biomarker Discovery

    Directory of Open Access Journals (Sweden)

    W. Hayes McDonald

    2002-01-01

    Full Text Available Coupling large-scale sequencing projects with the amino acid sequence information that can be gleaned from tandem mass spectrometry (MS/MS) has made it much easier to analyze complex mixtures of proteins. The limits of this “shotgun” approach, in which the protein mixture is proteolytically digested before separation, can be further expanded by separating the resulting mixture of peptides prior to MS/MS analysis. Both single dimensional high pressure liquid chromatography (LC) and multidimensional LC (LC/LC) can be directly interfaced with the mass spectrometer to allow for automated collection of tremendous quantities of data. While there is no single technique that addresses all proteomic challenges, the shotgun approaches, especially LC/LC-MS/MS-based techniques such as MudPIT (multidimensional protein identification technology), show advantages over gel-based techniques in speed, sensitivity, scope of analysis, and dynamic range. Advances in the ability to quantitate differences between samples and to detect an array of post-translational modifications allow for the discovery of classes of protein biomarkers that were previously unassailable.

  9. Global analysis of the yeast lipidome by quantitative shotgun mass spectrometry

    DEFF Research Database (Denmark)

    Ejsing, Christer S.; Sampaio, Julio L; Surendranath, Vineeth

    2009-01-01

    95% coverage of the yeast lipidome achieved with 125-fold improvement in sensitivity compared with previous approaches. Comparative lipidomics demonstrated that growth temperature and defects in lipid biosynthesis induce ripple effects throughout the molecular composition of the yeast lipidome....... This work serves as a resource for molecular characterization of eukaryotic lipidomes, and establishes shotgun lipidomics as a powerful platform for complementing biochemical studies and other systems-level approaches....

  10. Genome sequences of six Phytophthora species associated with forests in New Zealand

    Science.gov (United States)

    Studholme, D.J.; McDougal, R.L.; Sambles, C.; Hansen, E.; Hardy, G.; Grant, M.; Ganley, R.J.; Williams, N.M.

    2015-01-01

    In New Zealand there has been a long association of Phytophthora diseases in forests, nurseries, remnant plantings and horticultural crops. However, new Phytophthora diseases of trees have recently emerged. Genome sequencing has been performed for 12 Phytophthora isolates, from six species: Phytophthora pluvialis, Phytophthora kernoviae, Phytophthora cinnamomi, Phytophthora agathidicida, Phytophthora multivora and Phytophthora taxon Totara. These sequences will enable comparative analyses to identify potential virulence strategies and ultimately facilitate better control strategies. These Whole Genome Shotgun data have been deposited in DDBJ/ENA/GenBank under the accession numbers LGTT00000000, LGTU00000000, JPWV00000000, JPWU00000000, LGSK00000000, LGSJ00000000, LGTR00000000, LGTS00000000, LGSM00000000, LGSL00000000, LGSO00000000, and LGSN00000000. PMID:26981359

  11. Marked microevolution of a unique Mycobacterium tuberculosis strain in 17 years of ongoing transmission in a high risk population.

    Directory of Open Access Journals (Sweden)

    Carolina Mehaffy

    Full Text Available The transmission and persistence of Mycobacterium tuberculosis within high risk populations is a threat to tuberculosis (TB) control. In the current study, we used whole genome sequencing (WGS) to decipher the transmission dynamics and microevolution of M. tuberculosis ON-A, an endemic strain responsible for an ongoing outbreak of TB in an urban homeless/under-housed population. Sixty-one M. tuberculosis isolates representing 57 TB cases from 1997 to 2013 were subjected to WGS. Sequencing data was integrated with available epidemiological information and analyzed to determine how the M. tuberculosis ON-A strain has evolved during almost two decades of active transmission. WGS offers higher discriminatory power than traditional genotyping techniques, dividing the M. tuberculosis ON-A strain into 6 sub-clusters, each defined by unique single nucleotide polymorphism profiles. One sub-cluster, designated ON-ANM (Natural Mutant; 26 isolates from 24 cases), was also defined by a large, 15 kb genomic deletion. WGS analysis reveals the existence of multiple transmission chains within the same population/setting. Our results help validate the utility of WGS as a powerful tool for identifying genomic changes and adaptation of M. tuberculosis.
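
    The idea of separating isolates by their single nucleotide polymorphism profiles can be illustrated with a minimal sketch. It assumes each isolate has already been reduced to a string of base calls at the same variable positions; the isolate names, the example profiles and the handling of "N" as a missing call are hypothetical and are not taken from the study.

```python
from itertools import combinations


def snp_distance(profile_a: str, profile_b: str) -> int:
    """Count positions at which two aligned SNP profiles differ.

    Positions where either isolate has an uncalled base ("N") are skipped.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same variable positions")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != b and a != "N" and b != "N"
    )


def pairwise_distances(profiles: dict) -> dict:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (p, q): snp_distance(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical base calls at four variable genome positions.
example_profiles = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "TCGA"}
example_matrix = pairwise_distances(example_profiles)
```

    Isolates separated by few or no SNPs would be candidates for the same sub-cluster; the threshold used to draw that line is a study-specific choice that this sketch does not attempt to reproduce.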

  12. Neolithic and Medieval virus genomes reveal complex evolution of Hepatitis B.

    Science.gov (United States)

    Krause-Kyora, Ben; Susat, Julian; Key, Felix M; Kühnert, Denise; Bosse, Esther; Immel, Alexander; Rinne, Christoph; Kornell, Sabin-Christin; Yepes, Diego; Franzenburg, Sören; Heyne, Henrike O; Meier, Thomas; Lösch, Sandra; Meller, Harald; Friederich, Susanne; Nicklisch, Nicole; Alt, Kurt W; Schreiber, Stefan; Tholey, Andreas; Herbig, Alexander; Nebel, Almut; Krause, Johannes

    2018-05-10

    The hepatitis B virus (HBV) is one of the most widespread human pathogens known today, yet its origin and evolutionary history are still unclear and controversial. Here, we report the analysis of three ancient HBV genomes recovered from human skeletons found at three different archaeological sites in Germany. We reconstructed two Neolithic and one medieval HBV genomes by de novo assembly from shotgun DNA sequencing data. Additionally, we observed HBV-specific peptides using paleo-proteomics. Our results show that HBV has circulated in the European population for at least 7000 years. The Neolithic HBV genomes show a high genomic similarity to each other. In a phylogenetic network, they do not group with any human-associated HBV genome and are most closely related to those infecting African non-human primates. These ancient virus forms appear to represent distinct lineages that have no close relatives today and possibly went extinct. Our results reveal the great potential of ancient DNA from human skeletons for studying the long-term evolution of blood borne viruses. © 2018, Krause-Kyora et al.

  13. Shotgun lipidomic analysis of chemically sulfated sterols compromises analytical sensitivity

    DEFF Research Database (Denmark)

    Casanovas, Albert; Hannibal-Bach, Hans Kristian; Jensen, Ole Nørregaard

    2014-01-01

    Shotgun lipidomics affords comprehensive and quantitative analysis of lipid species in cells and tissues at high-throughput [1-5]. The methodology is based on direct infusion of lipid extracts by electrospray ionization (ESI) combined with tandem mass spectrometry (MS/MS) and/or high resolution F...... low ionization efficiency in ESI [7]. For this reason, chemical derivatization procedures including acetylation [8] or sulfation [9] are commonly implemented to facilitate ionization, detection and quantification of sterols for global lipidome analysis [1-3, 10]....

  14. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
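
    A rough sketch of how copy number might be read off a random sequence survey follows. It assumes copy number is approximated as the observed survey depth over a region divided by the depth expected for single-copy sequence. The 78 million base-pair survey size comes from the abstract; the genome size of 1.1 Gb, the function names and the example region are assumptions made only for illustration, and the authors' actual procedure may differ.

```python
def expected_single_copy_depth(survey_bases: float, genome_size: float) -> float:
    """Average depth a single-copy region receives from a random survey."""
    return survey_bases / genome_size


def estimate_copy_number(
    matched_bases: float,
    region_length: float,
    survey_bases: float,
    genome_size: float,
) -> float:
    """Observed depth over a region divided by the single-copy expectation."""
    observed_depth = matched_bases / region_length
    return observed_depth / expected_single_copy_depth(survey_bases, genome_size)


SURVEY_BASES = 78e6          # survey size reported in the abstract
ASSUMED_GENOME_SIZE = 1.1e9  # assumption, not stated in the abstract

# Hypothetical 1 kb region of a clone hit by 7,090 bases of survey reads.
copies = estimate_copy_number(7_090, 1_000, SURVEY_BASES, ASSUMED_GENOME_SIZE)
```

    Because the expected single-copy depth in such a survey is well below 1X, estimates of this kind are only informative for sequences present in many copies, which is consistent with the abstract's focus on abundant repeats.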

  15. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.

    Science.gov (United States)

    Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C

    2009-02-24

    Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
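
    To illustrate what a GO over-representation statistic can look like, here is a minimal sketch using a one-sided hypergeometric test, a common generic choice for this kind of analysis. The abstract does not state which test GOEx uses or how it combines the result with fold changes, so the function name and the example counts are assumptions for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: proteins identified in total (the background)
    K: background proteins annotated with the GO term
    n: proteins in the list of interest (e.g. differentially expressed)
    k: proteins in that list annotated with the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 2600 identified proteins, 130 carrying the term,
# 200 differentially expressed, 25 of which carry the term.
p_value = hypergeom_upper_tail(25, 200, 130, 2600)
```

    A small p-value suggests the term occurs in the list more often than chance would predict; with many GO terms tested, some multiple-testing correction would normally be applied as well.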

  16. Choosing the best plant for the job: a cost-effective assay to prescreen ancient plant remains destined for shotgun sequencing.

    Directory of Open Access Journals (Sweden)

    Nathan Wales

    Full Text Available DNA extracted from ancient plant remains almost always contains a mixture of endogenous (that is, derived from the plant) and exogenous (derived from other sources) DNA. The exogenous 'contaminant' DNA, chiefly derived from microorganisms, presents significant problems for shotgun sequencing. In some samples, more than 90% of the recovered sequences are exogenous, providing limited data relevant to the sample. However, other samples have far less contamination and subsequently yield much more useful data via shotgun sequencing. Given the investment required for high-throughput sequencing, whenever multiple samples are available, it is most economical to sequence the least contaminated sample. We present an assay based on quantitative real-time PCR which estimates the relative amounts of fungal and bacterial DNA in a sample in comparison to the endogenous plant DNA. Given a collection of contextually-similar ancient plant samples, this low cost assay aids in selecting the best sample for shotgun sequencing.
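
    The comparison of contaminant and endogenous DNA could be sketched as below. This assumes a simple delta-Ct calculation with perfect amplification efficiency (a doubling per cycle), so that a target's amount relative to the plant signal is 2 raised to the Ct difference. The abstract does not describe the quantification model, and the sample names and Ct values here are hypothetical.

```python
def relative_amount(ct_target: float, ct_reference: float,
                    efficiency: float = 2.0) -> float:
    """Amount of target relative to reference from qPCR threshold cycles.

    A lower Ct means more starting template, so the target is more
    abundant than the reference when ct_target < ct_reference.
    """
    return efficiency ** (ct_reference - ct_target)


def rank_by_contamination(samples: dict) -> list:
    """Order sample names from least to most contaminated.

    Each sample maps to a dict with "plant", "fungal" and "bacterial" Ct values.
    """
    scores = {
        name: relative_amount(ct["fungal"], ct["plant"])
        + relative_amount(ct["bacterial"], ct["plant"])
        for name, ct in samples.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical Ct values for two ancient plant samples.
example_samples = {
    "sample_A": {"plant": 25.0, "fungal": 20.0, "bacterial": 22.0},
    "sample_B": {"plant": 22.0, "fungal": 24.0, "bacterial": 23.0},
}
ranking = rank_by_contamination(example_samples)
```

    The first name in the ranking would be the preferred candidate for shotgun sequencing under these assumptions. Real assays usually calibrate primer efficiencies with standard curves rather than assuming a perfect doubling.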

  17. Genomic characterization of large heterochromatic gaps in the human genome assembly.

    Directory of Open Access Journals (Sweden)

    Nicolas Altemose

    2014-05-01

    Full Text Available The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

  18. Shotgun glycomics of pig lung identifies natural endogenous receptors for influenza viruses.

    Science.gov (United States)

    Byrd-Leotis, Lauren; Liu, Renpeng; Bradley, Konrad C; Lasanajak, Yi; Cummings, Sandra F; Song, Xuezheng; Heimburg-Molinaro, Jamie; Galloway, Summer E; Culhane, Marie R; Smith, David F; Steinhauer, David A; Cummings, Richard D

    2014-06-03

    Influenza viruses bind to host cell surface glycans containing terminal sialic acids, but as studies on influenza binding become more sophisticated, it is becoming evident that although sialic acid may be necessary, it is not sufficient for productive binding. To better define endogenous glycans that serve as viral receptors, we have explored glycan recognition in the pig lung, because influenza is broadly disseminated in swine, and swine have been postulated as an intermediary host for the emergence of pandemic strains. For these studies, we used the technology of "shotgun glycomics" to identify natural receptor glycans. The total released N- and O-glycans from pig lung glycoproteins and glycolipid-derived glycans were fluorescently tagged and separated by multidimensional HPLC, and individual glycans were covalently printed to generate pig lung shotgun glycan microarrays. All viruses tested interacted with one or more sialylated N-glycans but not O-glycans or glycolipid-derived glycans, and each virus demonstrated novel and unexpected differences in endogenous N-glycan recognition. The results illustrate the repertoire of specific, endogenous N-glycans of pig lung glycoproteins for virus recognition and offer a new direction for studying endogenous glycan functions in viral pathogenesis.

  19. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression, which is widely used for microarray measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  20. Population Genomics of Francisella tularensis subsp. holarctica and its Implication on the Eco-Epidemiology of Tularemia in Switzerland.

    Science.gov (United States)

    Wittwer, Matthias; Altpeter, Ekkehard; Pilo, Paola; Gygli, Sebastian M; Beuret, Christian; Foucault, Frederic; Ackermann-Gäumann, Rahel; Karrer, Urs; Jacob, Daniela; Grunow, Roland; Schürch, Nadia

    2018-01-01

    Whole genome sequencing (WGS) methods provide new possibilities in the field of molecular epidemiology. This is particularly true for monomorphic organisms where the discriminatory power of traditional methods (e.g., restriction enzyme length polymorphism typing, multi locus sequence typing etc.) is inadequate to elucidate complex disease transmission patterns or to resolve the phylogeny at high resolution on a micro-geographic scale. In this study, we present insights into the population structure of Francisella tularensis subsp. holarctica, the causative agent of tularemia in Switzerland. A total of 59 Fth isolates were obtained from castor bean ticks (Ixodes ricinus), animals and humans and a high resolution phylogeny was inferred using WGS methods. The majority of the Fth population in Switzerland belongs to the west European B.11 clade and shows an extraordinary genetic diversity underlining the old evolutionary history of the pathogen in the alpine region. Moreover, a new B.11 subclade was identified that had not been described so far. The combined analysis of the epidemiological data of human tularemia cases with the whole genome sequences of the 59 isolates provides evidence that ticks play a pivotal role in transmitting Fth to humans and other vertebrates in Switzerland. This is further underlined by the correlation of disease risk estimates with climatic and ecological factors influencing the survival of ticks.

  1. Population Genomics of Francisella tularensis subsp. holarctica and its Implication on the Eco-Epidemiology of Tularemia in Switzerland

    Science.gov (United States)

    Wittwer, Matthias; Altpeter, Ekkehard; Pilo, Paola; Gygli, Sebastian M.; Beuret, Christian; Foucault, Frederic; Ackermann-Gäumann, Rahel; Karrer, Urs; Jacob, Daniela; Grunow, Roland; Schürch, Nadia

    2018-01-01

    Whole genome sequencing (WGS) methods provide new possibilities in the field of molecular epidemiology. This is particularly true for monomorphic organisms where the discriminatory power of traditional methods (e.g., restriction enzyme length polymorphism typing, multi locus sequence typing etc.) is inadequate to elucidate complex disease transmission patterns or to resolve the phylogeny at high resolution on a micro-geographic scale. In this study, we present insights into the population structure of Francisella tularensis subsp. holarctica, the causative agent of tularemia in Switzerland. A total of 59 Fth isolates were obtained from castor bean ticks (Ixodes ricinus), animals and humans and a high resolution phylogeny was inferred using WGS methods. The majority of the Fth population in Switzerland belongs to the west European B.11 clade and shows an extraordinary genetic diversity underlining the old evolutionary history of the pathogen in the alpine region. Moreover, a new B.11 subclade was identified that had not been described so far. The combined analysis of the epidemiological data of human tularemia cases with the whole genome sequences of the 59 isolates provides evidence that ticks play a pivotal role in transmitting Fth to humans and other vertebrates in Switzerland. This is further underlined by the correlation of disease risk estimates with climatic and ecological factors influencing the survival of ticks. PMID:29623260

  2. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data

    Directory of Open Access Journals (Sweden)

    Domont Gilberto B

    2009-02-01

    Full Text Available Abstract Background Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.

  3. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

    Science.gov (United States)

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

    2016-01-01

    Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and

  4. Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes.

    Directory of Open Access Journals (Sweden)

    Stephen Nayfach

    2015-11-01

    Full Text Available Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

  5. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    Science.gov (United States)

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
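
    The relationship between sequencing depth and the fraction of a genome covered can be sketched with the idealized Lander-Waterman model, in which reads land uniformly at random and a depth of c leaves a fraction exp(-c) of bases untouched. This is a textbook approximation offered as an illustration; the abstract does not say how its coverage percentages were estimated, and the function names are hypothetical.

```python
import math


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit at least once at the given depth."""
    return 1.0 - math.exp(-depth)


def depth_for_fraction(fraction: float) -> float:
    """Depth needed for the given expected covered fraction (0 <= fraction < 1)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


# Under this model, covering 75-80% of a genome corresponds to a depth of
# roughly 1.4X to 1.6X, in line with the 'light shotgun' description.
light_shotgun_depths = (depth_for_fraction(0.75), depth_for_fraction(0.80))
```

    Real read sets deviate from the model because of cloning bias, repeats and uneven symbiont load across hosts, so the figures are best read as order-of-magnitude guides.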

  6. Evaluation of a Stenotrophomonas maltophilia bacteremia cluster in hematopoietic stem cell transplantation recipients using whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Stefanie Kampmeier

    2017-11-01

    Full Text Available Abstract Background Stenotrophomonas maltophilia ubiquitously occurs in the hospital environment. This opportunistic pathogen can cause severe infections in immunocompromised hosts such as hematopoietic stem cell transplantation (HSCT) recipients. Between February and July 2016, a cluster of four patients on the HSCT unit suffered from S. maltophilia bloodstream infections (BSI). Methods For epidemiological investigation we retrospectively identified the colonization status of patients admitted to the ward during this time period and performed environmental monitoring of shower heads, shower outlets, washbasins and toilets in patient rooms. We tested antibiotic susceptibility of detected S. maltophilia isolates. Environmental and blood culture samples were subjected to whole genome sequence (WGS)-based typing. Results Of four patients with S. maltophilia BSI, three were found to be colonized previously. In addition, retrospective investigations revealed two patients who were colonized in anal swab samples but not infected. Environmental monitoring revealed one shower outlet contaminated with S. maltophilia. Antibiotic susceptibility testing of seven S. maltophilia strains resulted in two trimethoprim/sulfamethoxazole resistant and five susceptible isolates, which did not exclude an outbreak scenario. WGS-based typing did not result in any close genotypic relationship among the patients' isolates. In contrast, one environmental isolate from a shower outlet was closely related to a single patient's isolate. Conclusion WGS-based typing successfully refuted an outbreak of S. maltophilia on an HSCT ward but uncovered that sanitary installations can be an actual source of S. maltophilia transmissions.

  7. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
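
    The two assembly statistics quoted above (scaffold N50 and the share of the estimated genome that was assembled) can be illustrated with a short sketch. The toy scaffold lengths and the function names `n50` and `assembled_fraction` are assumptions made for illustration only; they are not taken from the olive genome project or its software.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembled_fraction(assembly_bp, estimated_bp):
    """Fraction of the estimated genome size covered by the assembly."""
    return assembly_bp / estimated_bp


# Hypothetical scaffold lengths, not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))

# Figures quoted in the abstract: 1.31 Gb assembled, 1.38 Gb estimated.
print(round(100 * assembled_fraction(1.31e9, 1.38e9), 1))
```

    With the figures from the abstract, the ratio comes out at roughly 95 %, in line with the value reported there.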

  8. Large pore dermal microdialysis and liquid chromatography-tandem mass spectroscopy shotgun proteomic analysis: a feasibility study.

    Science.gov (United States)

    Petersen, Lars J; Sørensen, Mette A; Codrea, Marius C; Zacho, Helle D; Bendixen, Emøke

    2013-11-01

    The purpose of the present pilot study was to investigate the feasibility of combining large pore dermal microdialysis with shotgun proteomic analysis in human skin. Dialysate was recovered from human skin by 2000 kDa microdialysis membranes from one subject at three different phases of the study; trauma due to implantation of the dialysis device, a post implantation steady-state period, and after induction of vasodilatation and plasma extravasation. For shotgun proteomics, the proteins were extracted and digested with trypsin. Peptides were separated by capillary and nanoflow HPLC systems, followed by tandem mass spectrometry (MS/MS) on a Quadrupole-TOF hybrid instrument. The MS/MS spectra were merged and mapped to a human target protein database to achieve peptide identification and protein inference. Results showed variation in protein amounts and profiles for each of the different sampling phases. The total protein concentration was 1.7, 0.6, and 1.3 mg/mL during the three phases, respectively. A total of 158 different proteins were identified. Immunoglobulins and the major classes of plasma proteins, including proteases, coagulation factors, apolipoproteins, albumins, and complement factors, make up the major load of proteins in all three test conditions. Shotgun proteomics allowed the identification of more than 150 proteins in microdialysis samples from human skin. This highlights the opportunities of LC-MS/MS to study the complex molecular interactions in the skin. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Improvement of a sample preparation method assisted by sodium deoxycholate for mass-spectrometry-based shotgun membrane proteomics.

    Science.gov (United States)

    Lin, Yong; Lin, Haiyan; Liu, Zhonghua; Wang, Kunbo; Yan, Yujun

    2014-11-01

    In current shotgun-proteomics-based biological discovery, the identification of membrane proteins is a challenge. This is especially true for integral membrane proteins due to their highly hydrophobic nature and low abundance. Thus, much effort has been directed at sample preparation strategies such as use of detergents, chaotropes, and organic solvents. We previously described a sample preparation method for shotgun membrane proteomics, the sodium deoxycholate assisted method, which cleverly circumvents many of the challenges associated with traditional sample preparation methods. However, the method is associated with significant sample loss due to the slightly weaker extraction/solubilization ability of sodium deoxycholate when it is used at relatively low concentrations such as 1%. Hence, we present an enhanced sodium deoxycholate sample preparation strategy that first uses a high concentration of sodium deoxycholate (5%) to lyse membranes and extract/solubilize hydrophobic membrane proteins, and then dilutes the detergent to 1% for a more efficient digestion. We then applied the improved method to shotgun analysis of proteins from rat liver membrane enriched fraction. Compared with other representative sample preparation strategies including our previous sodium deoxycholate assisted method, the enhanced sodium deoxycholate method exhibited superior sensitivity, coverage, and reliability for the identification of membrane proteins particularly those with high hydrophobicity and/or multiple transmembrane domains. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Degenerate Quadtree Latitude/Longitude Grid Based on WGS-84 Ellipsoidal Facet

    Directory of Open Access Journals (Sweden)

    HU Bailin

    2016-12-01

    Full Text Available To meet the needs of digital earth development and to help solve global-scale problems, a new discrete global grid system, DQLLG (degenerate quadtree latitude/longitude grid), was put forward, based on the WGS-84 ellipsoidal facet. The hierarchical subdivision method, characteristics and grid column/row coordinate system are detailed. The latitude/longitude coordinates, areas and side lengths of multi-resolution meshes at different subdivision levels were calculated. The changes in mesh areas and side lengths were then analyzed and compared with those of the spherical DQLLG. The research indicates that the DQLLG has many excellent features: uniformity, hierarchy, consistency of direction, extensive data compatibility and so on. It has certain practicality for global GIS in the future.

  11. Outbreak of Invasive Wound Mucormycosis in a Burn Unit Due to Multiple Strains of Mucor circinelloides f. circinelloides Resolved by Whole-Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Dea Garcia-Hermoso

    2018-04-01

    Full Text Available Mucorales are ubiquitous environmental molds responsible for mucormycosis in diabetic, immunocompromised, and severely burned patients. Small outbreaks of invasive wound mucormycosis (IWM) have already been reported in burn units without extensive microbiological investigations. We faced an outbreak of IWM in our center and investigated the clinical isolates with whole-genome sequencing (WGS) analysis. We analyzed M. circinelloides isolates from patients in our burn unit (BU1, Hôpital Saint-Louis, Paris, France) together with nonoutbreak isolates from Burn Unit 2 (BU2, Paris area) and from France over a 2-year period (2013 to 2015). A total of 21 isolates, including 14 isolates from six BU1 patients, were analyzed by whole-genome sequencing (WGS). Phylogenetic classification based on de novo assembly and assembly-free approaches showed that the clinical isolates clustered in four highly divergent clades. Clade 1 contained at least one of the strains from the six epidemiologically linked BU1 patients. The clinical isolates were specific to each patient. Two patients were infected with more than two strains from different clades, suggesting that an environmental reservoir of clonally unrelated isolates was the source of contamination. Only two patients from BU1 shared one strain, which could correspond to direct transmission or contamination with the same environmental source. In conclusion, WGS of several isolates per patient coupled with precise epidemiological data revealed a complex situation combining potential cross-transmission between patients and multiple contaminations with a heterogeneous pool of strains from a cryptic environmental reservoir.

  12. A universal protocol to generate consensus level genome sequences for foot-and-mouth disease virus and other positive-sense polyadenylated RNA viruses using the Illumina MiSeq.

    Science.gov (United States)

    Logan, Grace; Freimanis, Graham L; King, David J; Valdazo-González, Begoña; Bachanek-Bankowska, Katarzyna; Sanderson, Nicholas D; Knowles, Nick J; King, Donald P; Cottam, Eleanor M

    2014-09-30

    Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template. The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel with the exception of the 5' genomic termini and area immediately flanking the poly(C) region. We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.
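
    The role of the minimum coverage depth reported above can be sketched as follows. The abstract does not say how the consensus was called, so the majority-base rule, the masking of low-depth positions with `N`, and the names `consensus` and `MIN_DEPTH` are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

# Depth threshold quoted in the abstract for FMDV O; everything else
# in this sketch is an illustrative assumption.
MIN_DEPTH = 22


def consensus(pileup, min_depth=MIN_DEPTH):
    """Call a consensus sequence from per-position base observations.

    pileup: list of strings, one per genome position, each string
            holding the bases observed in the reads at that position.
    Positions covered by fewer than min_depth reads are masked as 'N';
    otherwise the most frequent base is reported (ties resolve to the
    base seen first).
    """
    called = []
    for bases in pileup:
        if len(bases) < min_depth:
            called.append("N")
        else:
            called.append(Counter(bases).most_common(1)[0][0])
    return "".join(called)


# Hypothetical pileup: depths 30, 25, 10 and 22.
toy_pileup = ["A" * 30, "A" * 20 + "G" * 5, "C" * 10, "T" * 12 + "C" * 10]
print(consensus(toy_pileup))
```

    In this toy input the third position has only 10 reads and is masked, which mirrors how under-covered regions such as the genomic termini would drop out of a consensus built this way.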

  13. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data

    Directory of Open Access Journals (Sweden)

    Martens-Uzunova Elena S

    2010-10-01

    Full Text Available Abstract Background The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depends on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. Results A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. Conclusions We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions.

  14. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data.

    Science.gov (United States)

    Braaksma, Machtelt; Martens-Uzunova, Elena S; Punt, Peter J; Schaap, Peter J

    2010-10-19

    The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depends on its secretome. Protein secretion usually requires the presence of a N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of a SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions.
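
    The majority-rule idea described above can be sketched in a few lines. The abstract does not give the exact voting scheme, so this sketch assumes one vote from the A. niger gene model plus one vote per available best homolog, with ties counted as "no signal peptide"; the function name `majority_signal_peptide` is hypothetical.

```python
def majority_signal_peptide(own_prediction, homolog_predictions):
    """Combine signal-peptide calls by simple majority.

    own_prediction: True/False call for the protein's own gene model.
    homolog_predictions: calls for the best homologs in related species;
        None marks a species in which no homolog was found.
    Returns True only if strictly more than half of the available votes
    predict a signal peptide (a tie therefore returns False, which is
    an assumption of this sketch).
    """
    votes = [own_prediction]
    votes.extend(p for p in homolog_predictions if p is not None)
    return 2 * sum(votes) > len(votes)


# A negative call on the query protein can be overruled by its homologs,
# for example when the query's gene model has a mis-annotated start.
print(majority_signal_peptide(False, [True, True, True]))

# A lone positive call against two negative homologs is rejected.
print(majority_signal_peptide(True, [False, False, None]))
```

    Letting homolog calls outvote the query is one way such a classifier could sidestep the gene-model errors the abstract mentions; with no homologs at all it falls back to the protein's own prediction.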

  15. Responses of intestinal virome to silver nanoparticles: safety assessment by classical virology, whole-genome sequencing and bioinformatics approaches

    Directory of Open Access Journals (Sweden)

    Gokulan K

    2018-05-01

    Full Text Available Kuppan Gokulan,1,* Aschalew Z Bekele,1,* Kenneth L Drake,2 Sangeeta Khare1 1Division of Microbiology, US Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, USA; 2Seralogix, Inc., Austin, TX, USA *These authors contributed equally to this work Background: Effects of silver nanoparticles (AgNP) on the intestinal virome/phage community are mostly unknown. The working hypothesis of this study was that the exposure of pharmaceutical/nanomedicine and other consumer-use material containing silver ions and nanoparticles to the gastrointestinal tract may result in disturbance of the beneficial gut viruses/phages. Methods: This study assesses the impact of AgNP on the survival of individual bacteriophages using classical virology cultivation and electron microscopic techniques. Moreover, how the ingested AgNP may affect the intestinal virus/phages was investigated by conducting whole-genome sequencing (WGS). Results: The viral cultivation methods showed minimal effect on selected viruses during short-term exposure (24 h) to 10 nm AgNP. However, long-term exposure (7 days) resulted in significant reduction in the viral/phage population. Data obtained from WGS were filtered and compared with a nonredundant viral database composed of the complete viral genomes from NCBI using KRAKEN (confidence scoring threshold of 0.5). To compare the relative differential changes, the sequence counts in each treatment group were normalized to account for differences in DNA sequencing library sizes. Bioinformatics techniques were developed to visualize the virome comparative changes in a phylogenetic tree graph. The computed data revealed that AgNP had an impact on several intestinal bacteriophages that prey on the bacterial genera Enterobacteria, Yersinia and Staphylococcus as host species. Moreover, there was an independent effect of nanoparticles and released ions. Conclusion: Overall, this study reveals that the small-size AgNP could lead to

  16. The Nostoc punctiforme Genome

    Energy Technology Data Exchange (ETDEWEB)

    John C. Meeks

    2001-12-31

    Nostoc punctiforme is a filamentous cyanobacterium with extensive phenotypic characteristics and a relatively large genome, approaching 10 Mb. The phenotypic characteristics include a photoautotrophic, diazotrophic mode of growth, but N. punctiforme is also facultatively heterotrophic; its vegetative cells have multiple development alternatives, including terminal differentiation into nitrogen-fixing heterocysts and transient differentiation into spore-like akinetes or motile filaments called hormogonia; and N. punctiforme has broad symbiotic competence with fungi and terrestrial plants, including bryophytes, gymnosperms and an angiosperm. The shotgun-sequencing phase of the N. punctiforme strain ATCC 29133 genome has been completed by the Joint Genome Institute. Annotation of an 8.9 Mb database yielded 7432 open reading frames, 45% of which encode proteins with known or probable known function and 29% of which are unique to N. punctiforme. Comparative analysis of the sequence indicates a genome that is highly plastic and in a state of flux, with numerous insertion sequences and multilocus repeats, as well as genes encoding transposases and DNA modification enzymes. The sequence also reveals the presence of genes encoding putative proteins that collectively define almost all characteristics of cyanobacteria as a group. N. punctiforme has an extensive potential to sense and respond to environmental signals as reflected by the presence of more than 400 genes encoding sensor protein kinases, response regulators and other transcriptional factors. The signal transduction systems and any of the large number of unique genes may play essential roles in the cell differentiation and symbiotic interaction properties of N. punctiforme.

  17. Carbon dioxide (hydrogen sulfide) membrane separations and WGS membrane reactor modeling for fuel cells

    Science.gov (United States)

    Huang, Jin

    Acid-gas removal is of great importance in many environmental or energy-related processes. Compared to current commercial technologies, membrane-based CO2 and H2S capture has the advantages of low energy consumption, low weight and space requirement, simplicity of installation/operation, and high process flexibility. However, the large-scale application of the membrane separation technology is limited by the relatively low transport properties. In this study, CO2 (H2S)-selective polymeric membranes with high permeability and high selectivity have been studied based on the facilitated transport mechanism. The membrane showed a facilitated transport effect for both CO2 and H2S. A CO2 permeability of above 2000 Barrers, a CO2/H2 selectivity of greater than 40, and a CO2/N2 selectivity of greater than 200 at 100-150°C were observed. As a result of the higher reaction rate and smaller diffusing compound, the H2S permeability and H2S/H2 selectivity were about three times higher than those properties for CO2. The novel CO2-selective membrane has been applied to capture CO2 from flue gas and natural gas. In the CO2 capture experiments from a gas mixture with N2 and H2, a permeate CO2 dry concentration of greater than 98% was obtained by using steam as the sweep gas. In CO2/CH4 separation, decent CO2 transport properties were obtained with a feed pressure up to 500 psia. With the thin-film composite membrane structure, a significant increase in the CO2 flux was achieved as the selective layer thickness decreased. With the continuous removal of CO2, the CO2-selective water-gas-shift (WGS) membrane reactor is a promising approach to enhance CO conversion and increase the purity of H2 at process pressure under relatively low temperature. The simultaneous reaction and transport process in the countercurrent WGS membrane reactor was simulated by using a one-dimensional non-isothermal model. The modeling results show that a CO concentration of less than 10 ppm and a H2 recovery of greater

  18. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    Directory of Open Access Journals (Sweden)

    Ségolène Caboche

    2014-04-01

    Full Text Available Recent progress in high-throughput sequencing (HTS) technologies enables easy and lower-cost access to whole genome sequencing (WGS) or re-sequencing. HTS, combined with adapted, automatic and fast bioinformatics solutions for sequencing applications, promises accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis allow genome-based diagnosis that is consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose.

  19. The Ocotepeque geodetic datum and the satellite datum of the WGS84 system.

    Directory of Open Access Journals (Sweden)

    Esteban Dörries

    2016-03-01

    Full Text Available Two coordinate systems are currently in use in Costa Rica: the Lambert coordinate system of the Instituto Geográfico Nacional (IGN), defined from the Ocotepeque datum, and the CRTM system of the Catastro Nacional, defined from the datum of the GPS satellite system. To establish a sound scientific and technical basis for recommendations leading to standards that unify the reference systems and eliminate ambiguities, a research project was carried out at the Escuela de Topografía, Catastro y Geodesia (ETCG) of the Universidad Nacional, entitled "Estudio comparativo del datum Geodésico de Ocotepeque y el datum satelitario del sistema WGS84" (Comparative study of the Ocotepeque geodetic datum and the satellite datum of the WGS84 system). The general objective of this project was a comparative study of two networks: the National Geodetic Network used by the IGN, which is formed by triangulation chains, underlies the official cartography and is oriented with respect to the Ocotepeque geodetic datum; and a network obtained from GPS measurements, made up of the Catastro Nacional and Universidad Nacional networks as partial networks, formed by figures and oriented in the WGS84 satellite datum.

  20. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

    Directory of Open Access Journals (Sweden)

    Messeguer Xavier

    2006-10-01

    Full Text Available Abstract Background Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in depth studies of evolution in closely related species through multiple whole genome comparisons. Results To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to globally and locally examine and inspect the conserved regions and gene annotations. Conclusion M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at: http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html.

  1. Genome analysis of E. coli isolated from Crohn's disease patients.

    Science.gov (United States)

    Rakitina, Daria V; Manolov, Alexander I; Kanygina, Alexandra V; Garushyants, Sofya K; Baikova, Julia P; Alexeev, Dmitry G; Ladygina, Valentina G; Kostryukova, Elena S; Larin, Andrei K; Semashko, Tatiana A; Karpova, Irina Y; Babenko, Vladislav V; Ismagilova, Ruzilya K; Malanin, Sergei Y; Gelfand, Mikhail S; Ilina, Elena N; Gorodnichev, Roman B; Lisitsyna, Eugenia S; Aleshkin, Gennady I; Scherbakov, Petr L; Khalif, Igor L; Shapina, Marina V; Maev, Igor V; Andreev, Dmitry N; Govorun, Vadim M

    2017-07-19

    Escherichia coli (E. coli) has been increasingly implicated in the pathogenesis of Crohn's disease (CD). The phylogeny of E. coli isolated from Crohn's disease patients (CDEC) was controversial, and while genotyping results suggested heterogeneity, the sequenced strains of E. coli from CD patients were closely related. We performed the shotgun genome sequencing of 28 E. coli isolates from ten CD patients and compared genomes from these isolates with already published genomes of CD strains and other pathogenic and non-pathogenic strains. CDEC was shown to belong to A, B1, B2 and D phylogenetic groups. The plasmid and several operons from the reference CD-associated E. coli strain LF82 were demonstrated to be more often present in CDEC genomes belonging to different phylogenetic groups than in genomes of commensal strains. The operons include carbon-source induced invasion GimA island, prophage I, iron uptake operons I and II, capsular assembly pathogenetic island IV and propanediol and galactitol utilization operons. Our findings suggest that CDEC are phylogenetically diverse. However, some strains isolated from independent sources possess highly similar chromosome or plasmids. Though no CD-specific genes or functional domains were present in all CD-associated strains, some genes and operons are more often found in the genomes of CDEC than in commensal E. coli. They are principally linked to gut colonization and utilization of propanediol and other sugar alcohols.

  2. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Ye, Jia; Li, Songgang

    2005-01-01

    in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences. Publication date: 2005-Sep...

  3. Large pore dermal microdialysis and liquid chromatography-tandem mass spectroscopy shotgun proteomic analysis: a feasibility study

    DEFF Research Database (Denmark)

    Petersen, Lars J.; Sorensen, Mette A.; Codrea, Marius C.

    2013-01-01

    Background/Aims: The purpose of the present pilot study was to investigate the feasibility of combining large pore dermal microdialysis with shotgun proteomic analysis in human skin. Methods: Dialysate was recovered from human skin by 2000 kDa microdialysis membranes from one subject at three different...

  4. Observing copepods through a genomic lens

    Directory of Open Access Journals (Sweden)

    Johnson Stewart C

    2011-09-01

    Full Text Available Abstract Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to

  5. Observing copepods through a genomic lens

    Science.gov (United States)

    2011-01-01

    Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to provide genomics tools for

  6. Data on xylem sap proteins from Mn- and Fe-deficient tomato plants obtained using shotgun proteomics.

    Science.gov (United States)

    Ceballos-Laita, Laura; Gutierrez-Carbonell, Elain; Takahashi, Daisuke; Abadía, Anunciación; Uemura, Matsuo; Abadía, Javier; López-Millán, Ana Flor

    2018-04-01

    This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown in Fe- and Mn-sufficient control, as well as Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing changes statistically significant (ANOVA; p ≤ 0.05) and above a biologically relevant selected threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in the research article entitled "Effects of Fe and Mn deficiencies on the protein profiles of tomato ( Solanum lycopersicum) xylem sap as revealed by shotgun analyses" [1]. This dataset is made available to support the cited study as well to extend analyses at a later stage.

  7. metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

    DEFF Research Database (Denmark)

    Louvel, Guillaume; Der Sarkissian, Clio; Hanghøj, Kristian Ebbesen

    2016-01-01

    -throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automatizing routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa......, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT within a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), but also from archaeological specimens. We present......-friendly profiling of the microbial DNA present in HTS shotgun data sets. The applications of metaBIT are vast, from monitoring of laboratory errors and contaminations, to the reconstruction of past and present microbiota, and the detection of candidate species, including pathogens....

  8. A framework for assessing the concordance of molecular typing methods and the true strain phylogeny of Campylobacter jejuni and C. coli using draft genome sequence data

    Directory of Open Access Journals (Sweden)

    Catherine Dianna Carrillo

    2012-05-01

    Full Text Available Tracking of sources of sporadic cases of campylobacteriosis remains challenging, as commonly used molecular typing methods have limited ability to unambiguously link genetically related strains. Genomics has become increasingly prominent in the public health response to enteric pathogens as methods enable characterization of pathogens at an unprecedented level of resolution. However, the cost of sequencing and expertise required for bioinformatic analyses remains prohibitive, and these comprehensive analyses are limited to a few priority strains. Although several molecular typing methods are currently widely used for epidemiological analysis of campylobacters, it is not clear how accurately these methods reflect true strain relationships. To address this, we analyzed 104 publicly available whole genome sequences (WGS) of C. jejuni and C. coli. In addition to in silico determination of multi-locus sequence (MLST), fla and porA type, as well as comparative genomic fingerprint (CGF), we inferred a reference phylogeny based on conserved core genome elements. Molecular typing data were compared to the reference phylogeny for concordance using the Adjusted Wallace Coefficient (AWC) with confidence intervals. Although MLST targets the sequence variability in core genes and CGF targets insertions/deletions of accessory genes, both methods are based on multilocus analysis and provided better estimates of true phylogeny than methods based on single loci (porA, fla). A more comprehensive WGS dataset including additional genetically related strains, both epidemiologically linked and unlinked, will be necessary to assess performance of methods for outbreak investigations and surveillance activities. Analyses of the strengths and weaknesses of widely used typing methodologies in inferring true strain relationships will provide guidance in the interpretation of this data for epidemiological purposes.
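    The concordance measure named in this abstract can be illustrated with a minimal sketch. It assumes the usual pair-counting definition: the Wallace coefficient W(A→B) is the fraction of isolate pairs grouped together by method A that method B also groups together, and the adjusted form subtracts the agreement expected by chance, taken here as the fraction of all pairs that B groups together. The function name and the toy labels are illustrative only, and the confidence intervals mentioned in the abstract are not computed.

```python
from itertools import combinations


def adjusted_wallace(labels_a, labels_b):
    """Adjusted Wallace coefficient AW(A->B) for two typings of the same isolates."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both typing methods must label the same isolates")
    n = len(labels_a)
    total_pairs = n * (n - 1) // 2
    same_a = same_b = same_both = 0
    for i, j in combinations(range(n), 2):
        in_a = labels_a[i] == labels_a[j]  # pair shares a type under method A
        in_b = labels_b[i] == labels_b[j]  # pair shares a type under method B
        same_a += in_a
        same_b += in_b
        same_both += in_a and in_b
    if same_a == 0 or same_b == total_pairs:
        raise ValueError("coefficient is undefined for these partitions")
    wallace = same_both / same_a        # W(A->B)
    expected = same_b / total_pairs     # chance agreement under independence
    return (wallace - expected) / (1 - expected)
```

    A value of 1 means that every pair sharing a type under method A also shares a type under method B, and values near 0 mean method A predicts method B no better than chance.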

  9. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Directory of Open Access Journals (Sweden)

    Graner Andreas

    2008-10-01

    Full Text Available Abstract Background Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation nevertheless identified dozens of novel repeat sequences that were not recognised by hand-annotation. The MDR data were also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. Gene sequences could indeed be identified for half of the candidate gene islands. MDR data were of only limited use when mapped onto genomic sequences from the closely related species Triticum monococcum, as only a fraction of the repetitive sequences was recognised. Conclusion An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. 
The restriction that a particular

  10. Phenotypic H-Antigen Typing by Mass Spectrometry Combined with Genetic Typing of H Antigens, O Antigens, and Toxins by Whole-Genome Sequencing Enhances Identification of Escherichia coli Isolates.

    Science.gov (United States)

    Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua

    2016-08-01

    Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene cluster based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt. The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.

  11. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    Science.gov (United States)

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. For further understanding ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.
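    The contig and scaffold N50 values quoted in this abstract follow a standard definition that is easy to state in code. The sketch below is a generic illustration, not the procedure used by the authors or by the Platanus assembler; it assumes the common convention that N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")
```

    For example, lengths of 2, 3, 4, 5 and 6 kb sum to 20 kb; the two longest sequences already cover 11 kb, so the N50 is 5 kb.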

  12. Comprehensive genomic analysis of a plant growth-promoting rhizobacterium Pantoea agglomerans strain P5.

    Science.gov (United States)

    Shariati J, Vahid; Malboobi, Mohammad Ali; Tabrizi, Zeinab; Tavakol, Elahe; Owilia, Parviz; Safari, Maryam

    2017-11-15

    In this study, we provide a comparative genomic analysis of Pantoea agglomerans strain P5 and 10 closely related strains based on phylogenetic analyses. A next-generation shotgun strategy was implemented using the Illumina HiSeq 2500 technology followed by core- and pan-genome analysis. The genome of P. agglomerans strain P5 contains an assembly size of 5082485 bp with 55.4% G + C content. P. agglomerans consists of 2981 core and 3159 accessory genes for Coding DNA Sequences (CDSs) based on the pan-genome analysis. Strain P5 can be grouped closely with strains PG734 and 299 R using pan and core genes, respectively. All the predicted and annotated gene sequences were allocated to KEGG pathways. Accordingly, genes involved in plant growth-promoting (PGP) ability, including phosphate solubilization, IAA and siderophore production, acetoin and 2,3-butanediol synthesis and bacterial secretion, were assigned. This study provides an in-depth view of the PGP characteristics of strain P5, highlighting its potential use in agriculture as a biofertilizer.

  13. Genomic diversity of drug-resistant Mycobacterium tuberculosis isolates in Lisbon Portugal: Towards tuberculosis genomic epidemiology

    Directory of Open Access Journals (Sweden)

    João Perdigão

    2015-01-01

    Full Text Available Multidrug- (MDR) and extensively drug-resistant (XDR) tuberculosis (TB) present a challenge to disease control and elimination goals. Lisbon, Portugal, has a high TB incidence rate and unusual and successful XDR-TB strains that have been found in circulation for almost two decades. For the last 20 years, a continued circulation of two phylogenetic clades, Lisboa3 and Q1, which are highly associated with MDR and XDR, have been observed. In recent years, these strains have been well characterized regarding the molecular basis of drug resistance and have been inclusively subjected to whole genome sequencing (WGS). Researchers have been studying the genomic diversity of strains circulating in Lisbon and its genomic determinants through cutting-edge next generation sequencing. An enormous amount of whole genome sequence data are now available for the most prevalent and clinically relevant strains circulating in Lisbon. It is the persistence, prevalence and rapid evolution towards drug resistance that has prompted researchers to investigate the properties of these strains at the genomic level and in the future at a global transcriptomic level. Seventy Mycobacterium tuberculosis (MTB) isolates, mostly recovered in Lisbon, were genotyped by 24-loci Mycobacterial Interspersed Repetitive Unit – Variable Number of Tandem Repeats (MIRU-VNTR) and the genomes sequenced using a next generation sequencing platform – Illumina HiSeq 2000. The genotyping data revealed three major clusters associated with MDR-TB (Lisboa3-A, Lisboa3-B and Q1), two of which are associated with XDR-TB (Lisboa3-B and Q1), whilst the genomic data contributed to elucidating the phylogenetic positioning of circulating MDR-TB strains, showing a high predominance of a single SNP cluster group 5. Furthermore, a genome-wide phylogeny analysis from these strains, together with 19 publicly available genomes of MTB clinical isolates, revealed two major clades responsible for MDR/XDR-TB in the region

  14. Genomic diversity of drug-resistant Mycobacterium tuberculosis isolates in Lisbon Portugal: Towards tuberculosis genomic epidemiology

    KAUST Repository

    Perdigão, João; Silva, Hugo; Machado, Diana; Macedo, Rita; Maltez, Fernando; Silva, Carla; Jordao, Luisa; Couto, Isabel; Mallard, Kim; Coll, Francesc; Hill-Cawthorne, Grant A.; McNerney, Ruth; Pain, Arnab; Clark, Taane G.; Viveiros, Miguel; Portugal, Isabel

    2015-01-01

    Multidrug- (MDR) and extensively drug-resistant (XDR) tuberculosis (TB) present a challenge to disease control and elimination goals. Lisbon, Portugal, has a high TB incidence rate and unusual and successful XDR-TB strains that have been found in circulation for almost two decades. For the last 20 years, a continued circulation of two phylogenetic clades, Lisboa3 and Q1, which are highly associated with MDR and XDR, have been observed. In recent years, these strains have been well characterized regarding the molecular basis of drug resistance and have been inclusively subjected to whole genome sequencing (WGS). Researchers have been studying the genomic diversity of strains circulating in Lisbon and its genomic determinants through cutting-edge next generation sequencing. An enormous amount of whole genome sequence data are now available for the most prevalent and clinically relevant strains circulating in Lisbon. It is the persistence, prevalence and rapid evolution towards drug resistance that has prompted researchers to investigate the properties of these strains at the genomic level and in the future at a global transcriptomic level. Seventy Mycobacterium tuberculosis (MTB) isolates, mostly recovered in Lisbon, were genotyped by 24-loci Mycobacterial Interspersed Repetitive Unit - Variable Number of Tandem Repeats (MIRU-VNTR) and the genomes sequenced using a next generation sequencing platform - Illumina HiSeq 2000. The genotyping data revealed three major clusters associated with MDR-TB (Lisboa3-A, Lisboa3-B and Q1), two of which are associated with XDR-TB (Lisboa3-B and Q1), whilst the genomic data contributed to elucidating the phylogenetic positioning of circulating MDR-TB strains, showing a high predominance of a single SNP cluster group 5. Furthermore, a genome-wide phylogeny analysis from these strains, together with 19 publicly available genomes of MTB clinical isolates, revealed two major clades responsible for MDR/XDR-TB in the region: Lisboa3 and Q

  15. Genomic diversity of drug-resistant Mycobacterium tuberculosis isolates in Lisbon Portugal: Towards tuberculosis genomic epidemiology

    KAUST Repository

    Perdigão, João

    2015-03-01

    Multidrug- (MDR) and extensively drug-resistant (XDR) tuberculosis (TB) present a challenge to disease control and elimination goals. Lisbon, Portugal, has a high TB incidence rate and unusual and successful XDR-TB strains that have been found in circulation for almost two decades. For the last 20 years, a continued circulation of two phylogenetic clades, Lisboa3 and Q1, which are highly associated with MDR and XDR, have been observed. In recent years, these strains have been well characterized regarding the molecular basis of drug resistance and have been inclusively subjected to whole genome sequencing (WGS). Researchers have been studying the genomic diversity of strains circulating in Lisbon and its genomic determinants through cutting-edge next generation sequencing. An enormous amount of whole genome sequence data are now available for the most prevalent and clinically relevant strains circulating in Lisbon. It is the persistence, prevalence and rapid evolution towards drug resistance that has prompted researchers to investigate the properties of these strains at the genomic level and in the future at a global transcriptomic level. Seventy Mycobacterium tuberculosis (MTB) isolates, mostly recovered in Lisbon, were genotyped by 24-loci Mycobacterial Interspersed Repetitive Unit - Variable Number of Tandem Repeats (MIRU-VNTR) and the genomes sequenced using a next generation sequencing platform - Illumina HiSeq 2000. The genotyping data revealed three major clusters associated with MDR-TB (Lisboa3-A, Lisboa3-B and Q1), two of which are associated with XDR-TB (Lisboa3-B and Q1), whilst the genomic data contributed to elucidating the phylogenetic positioning of circulating MDR-TB strains, showing a high predominance of a single SNP cluster group 5. Furthermore, a genome-wide phylogeny analysis from these strains, together with 19 publicly available genomes of MTB clinical isolates, revealed two major clades responsible for MDR/XDR-TB in the region: Lisboa3 and Q

  16. The Medicago Genome Provides Insight into the Evolution of Rhizobial Symbioses

    Science.gov (United States)

    Young, Nevin D.; Debellé, Frédéric; Oldroyd, Giles E. D.; Geurts, Rene; Cannon, Steven B.; Udvardi, Michael K.; Benedito, Vagner A.; Mayer, Klaus F. X.; Gouzy, Jérôme; Schoof, Heiko; Van de Peer, Yves; Proost, Sebastian; Cook, Douglas R.; Meyers, Blake C.; Spannagl, Manuel; Cheung, Foo; De Mita, Stéphane; Krishnakumar, Vivek; Gundlach, Heidrun; Zhou, Shiguo; Mudge, Joann; Bharti, Arvind K.; Murray, Jeremy D.; Naoumkina, Marina A.; Rosen, Benjamin; Silverstein, Kevin A. T.; Tang, Haibao; Rombauts, Stephane; Zhao, Patrick X.; Zhou, Peng; Barbe, Valérie; Bardou, Philippe; Bechner, Michael; Bellec, Arnaud; Berger, Anne; Bergès, Hélène; Bidwell, Shelby; Bisseling, Ton; Choisne, Nathalie; Couloux, Arnaud; Denny, Roxanne; Deshpande, Shweta; Dai, Xinbin; Doyle, Jeff; Dudez, Anne-Marie; Farmer, Andrew D.; Fouteau, Stéphanie; Franken, Carolien; Gibelin, Chrystel; Gish, John; Goldstein, Steven; González, Alvaro J.; Green, Pamela J.; Hallab, Asis; Hartog, Marijke; Hua, Axin; Humphray, Sean; Jeong, Dong-Hoon; Jing, Yi; Jöcker, Anika; Kenton, Steve M.; Kim, Dong-Jin; Klee, Kathrin; Lai, Hongshing; Lang, Chunting; Lin, Shaoping; Macmil, Simone L; Magdelenat, Ghislaine; Matthews, Lucy; McCorrison, Jamison; Monaghan, Erin L.; Mun, Jeong-Hwan; Najar, Fares Z.; Nicholson, Christine; Noirot, Céline; O’Bleness, Majesta; Paule, Charles R.; Poulain, Julie; Prion, Florent; Qin, Baifang; Qu, Chunmei; Retzel, Ernest F.; Riddle, Claire; Sallet, Erika; Samain, Sylvie; Samson, Nicolas; Sanders, Iryna; Saurat, Olivier; Scarpelli, Claude; Schiex, Thomas; Segurens, Béatrice; Severin, Andrew J.; Sherrier, D. 
Janine; Shi, Ruihua; Sims, Sarah; Singer, Susan R.; Sinharoy, Senjuti; Sterck, Lieven; Viollet, Agnès; Wang, Bing-Bing; Wang, Keqin; Wang, Mingyi; Wang, Xiaohong; Warfsmann, Jens; Weissenbach, Jean; White, Doug D.; White, Jim D.; Wiley, Graham B.; Wincker, Patrick; Xing, Yanbo; Yang, Limei; Yao, Ziyun; Ying, Fu; Zhai, Jixian; Zhou, Liping; Zuber, Antoine; Dénarié, Jean; Dixon, Richard A.; May, Gregory D.; Schwartz, David C.; Rogers, Jane; Quétier, Francis; Town, Christopher D.; Roe, Bruce A.

    2011-01-01

    Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation 1. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species 2. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox. PMID:22089132

  17. Reconstruction of a Bacterial Genome from DNA Cassettes

    Energy Technology Data Exchange (ETDEWEB)

    Christopher Dupont; John Glass; Laura Sheahan; Shibu Yooseph; Lisa Zeigler Allen; Mathangi Thiagarajan; Andrew Allen; Robert Friedman; J. Craig Venter

    2011-12-31

    This basic research program comprised two major areas: (1) acquisition and analysis of marine microbial metagenomic data and development of genomic analysis tools for broad, external community use; (2) development of a minimal bacterial genome. Our Marine Metagenomic Diversity effort generated and analyzed shotgun sequencing data from microbial communities sampled from over 250 sites around the world. About 40% of the 26 Gbp of sequence data has been made publicly available to date with a complete release anticipated in six months. Our results and those mining the deposited data have revealed a vast diversity of genes coding for critical metabolic processes whose phylogenetic and geographic distributions will enable a deeper understanding of carbon and nutrient cycling, microbial ecology, and rapid rate evolutionary processes such as horizontal gene transfer by viruses and plasmids. A global assembly of the generated dataset resulted in a massive set (5Gbp) of genome fragments that provide context to the majority of the generated data that originated from uncultivated organisms. Our Synthetic Biology team has made significant progress towards the goal of synthesizing a minimal mycoplasma genome that will have all of the machinery for independent life. This project, once completed, will provide fundamentally new knowledge about requirements for microbial life and help to lay a basic research foundation for developing microbiological approaches to bioenergy.

  18. A Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-07-15

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a patient with hypogammaglobulinemia. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  19. Application for coordinate transformation between Gauss-Kruger projection: Bessel ellipsoid and UTM projection: WGS84 ellipsoid

    Directory of Open Access Journals (Sweden)

    Zoran Gojković

    2017-01-01

    Full Text Available The physical surface of the Earth has an irregular shape that is not mathematically defined, so it is approximated by mathematically defined surfaces such as the ellipsoid and the sphere. The development of global positioning systems, and with them modern navigation systems, produces large amounts of data that suffer from a lack of homogeneity. This problem can be overcome if all the data are stored in the same coordinate system, hence the need to transform data from local coordinate systems to global ones. At the global level this implies the WGS84 ellipsoid and the UTM projection, while the national coordinate system of the Republic of Serbia is Gauss-Kruger on the Bessel ellipsoid, which has a local character from a global point of view. By applying appropriate mathematical models and functions, coordinates can be transformed from one system to the other and back. The paper describes coordinate transformations between the Gauss-Kruger coordinate system on the Bessel ellipsoid and the UTM projection on the WGS84 ellipsoid, in both directions, and presents an application that performs this transformation and was built in an open source environment. The application is named TRANS7_GK_UTM_GK and is available, with a user guide, on the web page of the Faculty of Mining and Geology at http://gk2utm.rgf.bg.ac.rs.
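    The datum-shift step at the core of such a transformation can be sketched as follows. This is an illustration under stated assumptions, not the TRANS7_GK_UTM_GK implementation: the abstract does not say which mathematical model the application uses, and a seven-parameter Helmert transformation is assumed here only because it is a common choice for moving between the Bessel and WGS84 ellipsoids. The ellipsoid constants are the standard published values, the function names are hypothetical, and the seven parameters must come from an authoritative source because none are given in the abstract. The inverse Gauss-Kruger and forward UTM projection steps are omitted.

```python
import math

# Ellipsoids as (semi-major axis in metres, inverse flattening).
BESSEL_1841 = (6377397.155, 299.1528128)
WGS84 = (6378137.0, 298.257223563)


def geodetic_to_ecef(lat_deg, lon_deg, height_m, ellipsoid):
    """Convert latitude, longitude and ellipsoidal height to Cartesian X, Y, Z."""
    a, inv_f = ellipsoid
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z


def helmert7(xyz, tx, ty, tz, rx, ry, rz, scale_ppm):
    """Seven-parameter shift: translations in metres, rotations in radians.

    Uses the small-angle, position-vector rotation convention (an assumption;
    parameter sets published for the other convention need their signs flipped).
    """
    x, y, z = xyz
    m = 1.0 + scale_ppm * 1e-6
    return (
        tx + m * (x - rz * y + ry * z),
        ty + m * (rz * x + y - rx * z),
        tz + m * (-ry * x + rx * y + z),
    )
```

    In a full pipeline the Gauss-Kruger grid coordinates would first be converted to latitude and longitude on the Bessel ellipsoid, passed through these two functions, converted back to geodetic coordinates on WGS84, and finally projected to UTM.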

  20. Metagenomic analysis of the microbial community in fermented grape marc reveals that Lactobacillus fabifermentans is one of the dominant species: insights into its genome structure

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Vendramin, Veronica

    2014-01-01

    species after 30 days of incubation and made it possible to identify those species that are able to grow in that extreme environment. The genome sequence of Lactobacillus fabifermentans, one of the dominant species identified, was then analyzed using shotgun sequencing and comparative genomics....... The results revealed that it is one of the largest genomes among the Lactobacillus sequenced and is characterized by a large number of genes involved in carbohydrate utilization and in the regulation of gene expression. The genome was shaped through a large number of gene duplication events, while lateral...... gene transfer contributed to a lesser extent with respect to other Lactobacillus species. According to genomic analysis, its carbohydrate utilization pattern and ability to form biofilm are the main genetic traits linked to the adaptation the species underwent permitting it to grow in fermenting grape...

  1. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    Science.gov (United States)

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  2. Prediction of Phenotypic Antimicrobial Resistance Profiles From Whole Genome Sequences of Non-typhoidal Salmonella enterica.

    Science.gov (United States)

    Neuert, Saskia; Nair, Satheesh; Day, Martin R; Doumith, Michel; Ashton, Philip M; Mellor, Kate C; Jenkins, Claire; Hopkins, Katie L; Woodford, Neil; de Pinna, Elizabeth; Godbole, Gauri; Dallman, Timothy J

    2018-01-01

    Surveillance of antimicrobial resistance (AMR) in non-typhoidal Salmonella enterica (NTS) is essential for monitoring transmission of resistance from the food chain to humans, and for establishing effective treatment protocols. We evaluated the prediction of phenotypic resistance in NTS from genotypic profiles derived from whole genome sequencing (WGS). Genes and chromosomal mutations responsible for phenotypic resistance were sought in WGS data from 3,491 NTS isolates received by Public Health England's Gastrointestinal Bacteria Reference Unit between April 2014 and March 2015. Inferred genotypic AMR profiles were compared with phenotypic susceptibilities determined for fifteen antimicrobials using EUCAST guidelines. Discrepancies between phenotypic and genotypic profiles for one or more antimicrobials were detected for 76 isolates (2.18%) although only 88/52,365 (0.17%) isolate/antimicrobial combinations were discordant. Of the discrepant results, the largest number were associated with streptomycin (67.05%, n = 59). Pan-susceptibility was observed in 2,190 isolates (62.73%). Overall, resistance to tetracyclines was most common (26.27% of isolates, n = 917) followed by sulphonamides (23.72%, n = 828) and ampicillin (21.43%, n = 748). Multidrug resistance (MDR), i.e., resistance to three or more antimicrobial classes, was detected in 848 isolates (24.29%) with resistance to ampicillin, streptomycin, sulphonamides and tetracyclines being the most common MDR profile (n = 231; 27.24%). For isolates with this profile, all but one were S. Typhimurium and 94.81% (n = 219) had the resistance determinants blaTEM-1, strA-strB, sul2 and tet(A). Extended-spectrum β-lactamase genes were identified in 41 isolates (1.17%) and multiple mutations in chromosomal genes associated with ciprofloxacin resistance in 82 isolates (2.35%). This study showed that WGS is suitable as a rapid means of determining AMR patterns of NTS for public health surveillance.
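    The two discordance rates in this abstract use different denominators, which the short calculation below makes explicit. It only re-derives figures stated in the abstract (3,491 isolates, fifteen antimicrobials, 76 discrepant isolates, 88 discordant combinations); the variable and function names are illustrative.

```python
ISOLATES = 3491
ANTIMICROBIALS = 15


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Every isolate was tested against every antimicrobial.
combinations_tested = ISOLATES * ANTIMICROBIALS

rates = {
    "isolates with any discrepancy": pct(76, ISOLATES),
    "discordant isolate/antimicrobial combinations": pct(88, combinations_tested),
    "pan-susceptible isolates": pct(2190, ISOLATES),
    "multidrug-resistant isolates": pct(848, ISOLATES),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

    The per-isolate rate (2.18%) is much higher than the per-combination rate (0.17%) because a single discordant result among fifteen tests is enough to mark an isolate as discrepant.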

  3. High-content screening of yeast mutant libraries by shotgun lipidomics

    DEFF Research Database (Denmark)

    Tarasov, Kirill; Stefanko, Adam; Casanovas, Albert

    2014-01-01

    To identify proteins with a functional role in lipid metabolism and homeostasis we designed a high-throughput platform for high-content lipidomic screening of yeast mutant libraries. To this end, we combined culturing and lipid extraction in 96-well format, automated direct infusion nanoelectrospray ionization, high-resolution Orbitrap mass spectrometry, and a dedicated data processing framework to support lipid phenotyping across hundreds of Saccharomyces cerevisiae mutants. Our novel approach revealed that the absence of genes with unknown function YBR141C and YJR015W, and the transcription factor KAR4 precipitated distinct lipid metabolic phenotypes. These results demonstrate that the high-throughput shotgun lipidomics platform is a valid and complementary proxy for high-content screening of yeast mutant libraries.

  4. Limitations of variable number of tandem repeat typing identified through whole genome sequencing of Mycobacterium avium subsp. paratuberculosis on a national and herd level.

    Science.gov (United States)

    Ahlstrom, Christina; Barkema, Herman W; Stevenson, Karen; Zadoks, Ruth N; Biek, Roman; Kao, Rowland; Trewby, Hannah; Haupstein, Deb; Kelton, David F; Fecteau, Gilles; Labrecque, Olivia; Keefe, Greg P; McKenna, Shawn L B; De Buck, Jeroen

    2015-03-08

    Mycobacterium avium subsp. paratuberculosis (MAP), the causative bacterium of Johne's disease in dairy cattle, is widespread in the Canadian dairy industry and has significant economic and animal welfare implications. An understanding of the population dynamics of MAP can be used to identify introduction events, improve control efforts and target transmission pathways, although this requires an adequate understanding of MAP diversity and distribution between herds and across the country. Whole genome sequencing (WGS) offers a detailed assessment of the SNP-level diversity and genetic relationship of isolates, whereas several molecular typing techniques used to investigate the molecular epidemiology of MAP, such as variable number of tandem repeat (VNTR) typing, target relatively unstable repetitive elements in the genome that may be too unpredictable to draw accurate conclusions. The objective of this study was to evaluate the diversity of bovine MAP isolates in Canadian dairy herds using WGS and then determine if VNTR typing can distinguish truly related and unrelated isolates. Phylogenetic analysis based on 3,039 SNPs identified through WGS of 124 MAP isolates identified eight genetically distinct subtypes in dairy herds from seven Canadian provinces, with the dominant type including over 80% of MAP isolates. VNTR typing of 527 MAP isolates identified 12 types, including "bison type" isolates, from seven different herds. At a national level, MAP isolates differed from each other by 1-2 to 239-240 SNPs, regardless of whether they belonged to the same or different VNTR types. A herd-level analysis of MAP isolates demonstrated that VNTR typing may both over-estimate and under-estimate the relatedness of MAP isolates found within a single herd. The presence of multiple MAP subtypes in Canada suggests multiple introductions into the country including what has now become one dominant type, an important finding for Johne's disease control. VNTR typing often failed to

  5. Use of whole-genome sequencing and evaluation of the apparent sensitivity and specificity of antemortem tuberculosis tests in the investigation of an unusual outbreak of Mycobacterium bovis infection in a Michigan dairy herd.

    Science.gov (United States)

    Bruning-Fann, Colleen S; Robbe-Austerman, Suelee; Kaneene, John B; Thomsen, Bruce V; Tilden, John D; Ray, Jean S; Smith, Richard W; Fitzgerald, Scott D; Bolin, Steven R; O'Brien, Daniel J; Mullaney, Thomas P; Stuber, Tod P; Averill, James J; Marks, David

    2017-07-15

    OBJECTIVE To describe use of whole-genome sequencing (WGS) and evaluate the apparent sensitivity and specificity of antemortem tuberculosis tests during investigation of an unusual outbreak of Mycobacterium bovis infection in a Michigan dairy herd. DESIGN Bovine tuberculosis (bTB) outbreak investigation. ANIMALS Cattle, cats, dog, and wildlife. PROCEDURES All cattle in the index dairy herd were screened for bTB with the caudal fold test (CFT), and cattle ≥ 6 months old were also screened with a γ-interferon (γIFN) assay. The index herd was depopulated along with all barn cats and a dog that were fed unpasteurized milk from the herd. Select isolates from M bovis-infected animals from the index herd and other bTB-affected herds underwent WGS. Wildlife around all affected premises was examined for bTB. RESULTS No evidence of bTB was found in any wildlife examined. Within the index herd, 53 of 451 (11.8%) cattle and 12 of 21 (57%) cats were confirmed to be infected with M bovis. Prevalence of M bovis-infected cattle was greatest among 4- to 7-month-old calves (16/49 [33%]) followed by adult cows (36/203 [18%]). The apparent sensitivity and specificity were 86.8% and 92.7% for the CFT and 80.4% and 96.5% for the γIFN assay when results for those tests were interpreted separately and 96.1% and 91.7% when results were interpreted in parallel. Results of WGS revealed that M bovis-infected barn cats and cattle from the index herd and 6 beef operations were infected with the same strain of M bovis. Of the 6 bTB-affected beef operations identified during the investigation, 3 were linked to the index herd only by WGS results; there was no record of movement of livestock or waste milk from the index herd to those operations. CONCLUSIONS AND CLINICAL RELEVANCE Whole-genome sequencing enhanced the epidemiological investigation and should be used in all disease investigations. Performing the CFT and γIFN assay in parallel improved the antemortem ability to detect M bovis
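
    As a rough illustration of why parallel interpretation raises sensitivity and lowers specificity, the sketch below combines the two single-test figures from this abstract under the textbook assumption that the tests err independently. That independence assumption is ours, not the authors'; the study reports observed parallel values of 96.1% and 91.7%, and the function names are hypothetical.

```python
def parallel_sensitivity(se_a, se_b):
    """Parallel reading: an animal is positive if either test is positive.

    Assumes the two tests miss infected animals independently.
    """
    return 1 - (1 - se_a) * (1 - se_b)


def parallel_specificity(sp_a, sp_b):
    """Parallel reading: an animal is negative only if both tests are negative.

    Assumes false positives of the two tests occur independently.
    """
    return sp_a * sp_b


# Single-test values from the abstract (CFT, then gamma-interferon assay).
se = parallel_sensitivity(0.868, 0.804)
sp = parallel_specificity(0.927, 0.965)

print(f"expected parallel sensitivity: {se:.3f}")
print(f"expected parallel specificity: {sp:.3f}")
```

    The independence estimate does not reproduce the observed values exactly, which is what one would expect if the two tests tend to err on the same animals.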

  6. An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes

    Directory of Open Access Journals (Sweden)

    Clémentine Henri

    2017-11-01

    Full Text Available Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The value of genome-scale analysis for national surveillance, outbreak detection or source tracking has been largely documented. The genomic data however can be exploited with many different bioinformatics methods like single nucleotide polymorphism (SNP), core-genome multi locus sequence typing (cgMLST), whole-genome multi locus sequence typing (wgMLST) or multi locus predicted protein sequence typing (MLPPST) on either core-genome (cgMLPPST) or pan-genome (wgMLPPST). Currently, there are few studies comparing these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L. monocytogenes. Methods: The clustering methods were evaluated on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses. Results: The backward comparability between conventional typing methods and genomic methods revealed a near-perfect concordance. The importance of selecting a proper reference when calling SNPs was highlighted, although distances between strains remained identical. The analysis also revealed that the topologies of the phylogenetic trees from wgMLST and cgMLST were remarkably similar. The comparison between SNP and cgMLST or SNP and wgMLST approaches showed that the topologies of phylogenetic trees were statistically similar with an almost equivalent clustering. Conclusion: Our study revealed high concordance between wgMLST, cgMLST, and SNP approaches which are all suitable for typing of L. monocytogenes. The comparable clustering is an important observation considering that the two approaches have been variously implemented among reference laboratories.

  7. Whole genome sequence of the emerging oomycete pathogen Pythium insidiosum strain CDC-B5653 isolated from an infected human in the USA

    Directory of Open Access Journals (Sweden)

    Marina S. Ascunce

    2016-03-01

    Full Text Available Pythium insidiosum ATCC 200269 strain CDC-B5653, an isolate from necrotizing lesions on the mouth and eye of a 2-year-old boy in Memphis, Tennessee, USA, was sequenced using a combination of Illumina MiSeq (300 bp paired-end, 14 million reads) and PacBio (10 Kb fragment library, 356,001 reads). The sequencing data were assembled using SPAdes version 3.1.0, yielding a total genome size of 45.6 Mb contained in 8992 contigs, N50 of 13 Kb, 57% G + C content, and 17,867 putative protein-coding genes. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JRHR00000000. Keywords: Oomycete, Pythium insidiosum, Pythiosis, Human emerging pathogen, Genome sequencing
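
    The N50 statistic quoted here is the contig length at which, walking from the longest contig downward, the running total first reaches half of the assembly size. The sketch below shows one common way to compute it; the function name and the example lengths are made up for illustration and are not taken from the P. insidiosum assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length reaches at least half of the
    total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (in kb), for illustration only.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))
```

    Because N50 is weighted by length, a handful of long contigs can dominate it even when, as in this assembly, the genome is spread over thousands of contigs.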

  8. Comparative genomics of multidrug resistance in Acinetobacter baumannii.

    Directory of Open Access Journals (Sweden)

    Pierre-Edouard Fournier

    2006-01-01

    Full Text Available Acinetobacter baumannii is a species of nonfermentative gram-negative bacteria commonly found in water and soil. This organism was susceptible to most antibiotics in the 1970s. It has now become a major cause of hospital-acquired infections worldwide due to its remarkable propensity to rapidly acquire resistance determinants to a wide range of antibacterial agents. Here we use a comparative genomic approach to identify the complete repertoire of resistance genes exhibited by the multidrug-resistant A. baumannii strain AYE, which is epidemic in France, as well as to investigate the mechanisms of their acquisition by comparison with the fully susceptible A. baumannii strain SDF, which is associated with human body lice. The assembly of the whole shotgun genome sequences of the strains AYE and SDF gave an estimated size of 3.9 and 3.2 Mb, respectively. A. baumannii strain AYE exhibits an 86-kb genomic region termed a resistance island (the largest identified to date) in which 45 resistance genes are clustered. At the homologous location, the SDF strain exhibits a 20-kb genomic island flanked by transposases but devoid of resistance markers. Such a switching genomic structure might be a hotspot that could explain the rapid acquisition of resistance markers under antimicrobial pressure. Sequence similarity and phylogenetic analyses confirm that most of the resistance genes found in the A. baumannii strain AYE have been recently acquired from bacteria of the genera Pseudomonas, Salmonella, or Escherichia. This study also resulted in the discovery of 19 new putative resistance genes. Whole-genome sequencing appears to be a fast and efficient approach to the exhaustive identification of resistance genes in epidemic infectious agents of clinical significance.

  9. Comparative Genomics of Multidrug Resistance in Acinetobacter baumannii.

    Directory of Open Access Journals (Sweden)

    2006-01-01

    Full Text Available Acinetobacter baumannii is a species of nonfermentative gram-negative bacteria commonly found in water and soil. This organism was susceptible to most antibiotics in the 1970s. It has now become a major cause of hospital-acquired infections worldwide due to its remarkable propensity to rapidly acquire resistance determinants to a wide range of antibacterial agents. Here we use a comparative genomic approach to identify the complete repertoire of resistance genes exhibited by the multidrug-resistant A. baumannii strain AYE, which is epidemic in France, as well as to investigate the mechanisms of their acquisition by comparison with the fully susceptible A. baumannii strain SDF, which is associated with human body lice. The assembly of the whole shotgun genome sequences of the strains AYE and SDF gave an estimated size of 3.9 and 3.2 Mb, respectively. A. baumannii strain AYE exhibits an 86-kb genomic region termed a resistance island (the largest identified to date) in which 45 resistance genes are clustered. At the homologous location, the SDF strain exhibits a 20-kb genomic island flanked by transposases but devoid of resistance markers. Such a switching genomic structure might be a hotspot that could explain the rapid acquisition of resistance markers under antimicrobial pressure. Sequence similarity and phylogenetic analyses confirm that most of the resistance genes found in the A. baumannii strain AYE have been recently acquired from bacteria of the genera Pseudomonas, Salmonella, or Escherichia. This study also resulted in the discovery of 19 new putative resistance genes. Whole-genome sequencing appears to be a fast and efficient approach to the exhaustive identification of resistance genes in epidemic infectious agents of clinical significance.

  10. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Science.gov (United States)

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154

  11. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Directory of Open Access Journals (Sweden)

    Farmerie William G

    2006-08-01

    observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.

  12. The draft genome sequence of Mangrovibacter sp. strain MP23, an endophyte isolated from the roots of Phragmites karka

    Directory of Open Access Journals (Sweden)

    Pratiksha Behera

    2016-09-01

    Full Text Available To date, only one draft genome has been reported within the genus Mangrovibacter. Here, we report the second draft genome shotgun sequence of a Mangrovibacter sp. strain MP23 that was isolated from the roots of Phragmites karka (P. karka), an invasive weed growing in the Chilika Lagoon, Odisha, India. Strain MP23 is a facultative anaerobic, nitrogen-fixing endophytic bacterium that grows optimally at 37 °C, pH 7.0, and 1% NaCl concentration. The draft genome sequence of strain MP23 contains 4,947,475 bp with an estimated G + C content of 49.9% and a total of 4392 protein-coding genes. The genome sequence has provided information on putative genes that code for proteins involved in oxidative stress, uptake of nutrients, and nitrogen fixation that might offer niche-specific ecological fitness and explain the invasive success of P. karka in Chilika Lagoon. The draft genome sequence and annotation have been deposited at DDBJ/EMBL/GenBank under the accession number LYRP00000000.

  13. Improving Phrap-based assembly of the rat using "reliable" overlaps.

    Directory of Open Access Journals (Sweden)

    Michael Roberts

    2008-03-01

    Full Text Available The assembly methods used for whole-genome shotgun (WGS data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
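
    The "seven times better" figure can be reproduced from the numbers in the abstract. The sketch below applies the stated definition (quality proportional to the inverse of error rate times sequence missed), taking "sequence missed" to be 100% minus the reported coverage of finished sequence; that reading, and the function name, are our assumptions.

```python
def assembly_quality(errors_per_10kb, coverage_pct):
    """Quality score as described in the abstract.

    Defined as 1 / (error rate * sequence missed), with sequence missed
    taken as 100 minus the percentage of finished sequence covered.
    """
    missed_pct = 100.0 - coverage_pct
    return 1.0 / (errors_per_10kb * missed_pct)


atlas = assembly_quality(errors_per_10kb=4.5, coverage_pct=93.4)
phrap_umd = assembly_quality(errors_per_10kb=1.1, coverage_pct=96.3)

ratio = phrap_umd / atlas
print(f"quality ratio: {ratio:.2f}")
```

    Under this reading the ratio is (4.5 x 6.6) / (1.1 x 3.7), or roughly 7.3, consistent with the abstract's claim.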

  14. Plastic litter from shotgun ammunition on Danish coastlines - Amounts and provenance.

    Science.gov (United States)

    Kanstrup, Niels; Balsby, Thorsten J S

    2018-06-01

    Plastic litter in the marine environment is a major global issue. Discarded plastic shotgun ammunition shells and discharged wads are an unwelcome addition and feature among the top ten litter items found on reference beaches in Denmark. To understand this problem, its scale and origins, collections were made by volunteers along Danish coastal shorelines. In all 3669 plastic ammunition items were collected at 68 sites along 44.6 km of shoreline. The collected items were scored for characteristic variables such as gauge and length, shot type, and the legibility of text, the erosion, and the presence of metallic components. Scores for characteristics were related to the site, area, and season and possible influences discussed. The prevalence of collected plastic shotgun litter ranges from zero to 41 items per 100 m with an average of 3.7 items per 100 m. Most ammunition litter on Danish coasts originates from hunting on Danish coastal waterbodies, but a small amount may come from further afield. North Sea coasts are the most distinctive suggesting the possible contribution of long distance drift as well as the likelihood that such litter can persist in marine habitats for decades. The pathway from initial discard to eventual wash-up and collection depends on the physical properties of plastic components, marine tides and currents, coastal topography and shoreline vegetation. Judging from the disintegration of the cartridge and the wear and decomposition of components, we conclude that there is a substantial supply of polluting plastic ammunition materials that has and will accumulate. These plastic items pose a hazard to marine ecosystems and wash up on coasts for many years to come. We recommend that responsible managers, hunters and ammunition manufacturers will take action now to reduce the problem and, thereby, protect ecosystems, wildlife and the sustainability of hunting. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia

    International Nuclear Information System (INIS)

    Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G

    2012-01-01

    Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are supposed to be involved in its development and progression. Therefore, in an attempt to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in such disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads was generated from de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which resulted in acquired relapse variants. In all, 6390 and 4671 putative AS events were also detected, as well as expression levels for 18 315 and 18 795 genes, 28% of which were differentially expressed in the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL

  16. Young, intact and nested retrotransposons are abundant in the onion and asparagus genomes.

    Science.gov (United States)

    Vitte, C; Estep, M C; Leebens-Mack, J; Bennetzen, J L

    2013-09-01

    Although monocotyledonous plants comprise one of the two major groups of angiosperms and include >65 000 species, comprehensive genome analysis has been focused mainly on the Poaceae (grass) family. Due to this bias, most of the conclusions that have been drawn for monocot genome evolution are based on grasses. It is not known whether these conclusions apply to many other monocots. To extend our understanding of genome evolution in the monocots, Asparagales genomic sequence data were acquired and the structural properties of asparagus and onion genomes were analysed. Specifically, several available onion and asparagus bacterial artificial chromosomes (BACs) with contig sizes >35 kb were annotated and analysed, with a particular focus on the characterization of long terminal repeat (LTR) retrotransposons. The results reveal that LTR retrotransposons are the major components of the onion and garden asparagus genomes. These elements are mostly intact (i.e. with two LTRs), have mainly inserted within the past 6 million years and are piled up into nested structures. Analysis of shotgun genomic sequence data and the observation of two copies for some transposable elements (TEs) in annotated BACs indicates that some families have become particularly abundant, as high as 4-5 % (asparagus) or 3-4 % (onion) of the genome for the most abundant families, as also seen in large grass genomes such as wheat and maize. Although previous annotations of contiguous genomic sequences have suggested that LTR retrotransposons were highly fragmented in these two Asparagales genomes, the results presented here show that this was largely due to the methodology used. In contrast, this current work indicates an ensemble of genomic features similar to those observed in the Poaceae.

  17. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    Science.gov (United States)

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive agricultural and ornamental crop pest. It damages host plants through both phloem feeding and vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rates, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole genome shotgun sequencing, and 13 BAC pooling libraries were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb, and a total length of 658 Mb. Annotation of repetitive elements and coding regions resulted in 265.0 Mb TEs (40.3%) and 20 786 protein-coding genes with putative gene family expansions, respectively. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q is clustered into a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated using the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in return, will open up new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.

  18. The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae)

    Energy Technology Data Exchange (ETDEWEB)

    Wolf, Paul G.; Karol, Kenneth G.; Mandoli, Dina F.; Kuehl, Jennifer V.; Arumuganathan, K.; Ellis, Mark W.; Mishler, Brent D.; Kelch, Dean G.; Olmstead, Richard G.; Boore, Jeffrey L.

    2005-02-01

    We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8x depth coverage to obtain the complete chloroplast genome sequence. The genome is 154,373 bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,671 bp. Gene order is more similar to those of mosses, liverworts, and hornworts than to gene order for other vascular plants. For example, the Huperzia chloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophyte chloroplast genome data also enable a better reconstruction of the basal tracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as movement of the gene ndhF from the small single copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants by using nucleotide data, amino acid sequences, and by comparing gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves, and contribute to a growing comparative database of genomic and morphological data across the green plants.

  19. The temporal analysis of yeast exponential phase using shotgun proteomics as a fermentation monitoring technique.

    Science.gov (United States)

    Huang, Eric L; Orsat, Valérie; Shah, Manesh B; Hettich, Robert L; VerBerkmoes, Nathan C; Lefsrud, Mark G

    2012-09-18

    Systems biology and bioprocess technology can be better understood using shotgun proteomics as a monitoring system during fermentation. We demonstrated a shotgun proteomic method to monitor the temporal yeast proteome in early, middle and late exponential phases. Our study identified a total of 1389 proteins combining all 2D-LC-MS/MS runs. The temporal Saccharomyces cerevisiae proteome was enriched with proteolysis, radical detoxification, translation, one-carbon metabolism, glycolysis and TCA cycle. Heat shock proteins and proteins associated with oxidative stress response were found throughout the exponential phase. The most abundant proteins observed were translation elongation factors, ribosomal proteins, chaperones and glycolytic enzymes. The high abundance of the H-protein of the glycine decarboxylase complex (Gcv3p) indicated the availability of glycine in the environment. We observed differentially expressed proteins; the proteins induced at mid-exponential phase were involved in ribosome biogenesis, mitochondrial DNA binding/replication and transcriptional activation. Induction of tryptophan synthase (Trp5p) indicated the abundance of tryptophan during the fermentation. As fermentation progressed toward late exponential phase, a decrease in cell proliferation was implied from the repression of ribosomal proteins, transcription coactivators, methionine aminopeptidase and translation-associated proteins. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes

    DEFF Research Database (Denmark)

    Treu, Laura; Kougias, Panagiotis; Campanaro, Stefano

    2016-01-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common...

  1. Development of Low Cost Membranes (Ta, Nb & Cellulose Acetate) for H2/CO2 Separation in WGS Reactors

    Energy Technology Data Exchange (ETDEWEB)

    Seetala, Naidu [Grambling State Univ., LA (United States); Siriwardane, Upali [Louisiana Tech Univ., Ruston, LA (United States)

    2011-12-15

    The main aim of this work is to synthesize low temperature bimetallic nanocatalysts for the Water Gas Shift (WGS) reaction for hydrogen production from a CO and steam mixture, and to develop low-cost metal (Nb/Ta)/ceramic membranes for H2 separation and Cellulose Acetate membranes for CO2 separation.

  2. Comparative genomic data of the Avian Phylogenomics Project.

    Science.gov (United States)

    Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

    2014-01-01

    The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein-coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for the 7 of the remaining 10 species, and provide a guideline of

  3. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes [v2; ref status: indexed, http://f1000r.es/2x3]

    Directory of Open Access Journals (Sweden)

    Ted Kalbfleisch

    2014-02-01

    Full Text Available Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species’ reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site; 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping could be used for comparative genomics, disease association studies, and ultimately to understand

  4. Whole-Genome Sequence Analysis of Antimicrobial Resistance Genes in Streptococcus uberis and Streptococcus dysgalactiae Isolates from Canadian Dairy Herds

    Directory of Open Access Journals (Sweden)

    Julián Reyes Vélez

    2017-05-01

    Full Text Available The objectives of this study are to determine the occurrence of antimicrobial resistance (AMR) genes using whole-genome sequence (WGS) of Streptococcus uberis (S. uberis) and Streptococcus dysgalactiae (S. dysgalactiae) isolates, recovered from dairy cows in the Canadian Maritime Provinces. A secondary objective included the exploration of the association between phenotypic AMR and the genomic characteristics (genome size, guanine–cytosine content, and occurrence of unique gene sequences). Initially, 91 isolates were sequenced, and of these isolates, 89 were assembled. Furthermore, 16 isolates were excluded due to larger than expected genomic sizes (>2.3 bp × 1,000 bp). In the final analysis, 73 were used with complete WGS and minimum inhibitory concentration records, which were part of the previous phenotypic AMR study, representing 18 dairy herds from the Maritime region of Canada (1). A total of 23 unique AMR gene sequences were found in the bacterial genomes, with a mean number of 8.1 (minimum: 5; maximum: 13) per genome. Overall, there were 10 AMR genes [ANT(6), TEM-127, TEM-163, TEM-89, TEM-95, Linb, Lnub, Ermb, Ermc, and TetS] present only in S. uberis genomes and 2 genes (EF-TU and TEM-71) unique to the S. dysgalactiae genomes; 11 AMR genes [APH(3′), TEM-1, TEM-136, TEM-157, TEM-47, TetM, bl2b, gyrA, parE, phoP, and rpoB] were found in both bacterial species. Two-way tabulations showed association between the phenotypic susceptibility to lincosamides and the presence of linB (P = 0.002) and lnuB (P < 0.001) genes and between the presence of tetM (P = 0.015) and tetS (P = 0.064) genes and phenotypic resistance to tetracyclines only for the S. uberis isolates. The logistic model showed that the odds of resistance (to any of the phenotypically tested antimicrobials) was 4.35 times higher when there were >11 AMR genes present in the genome, compared with <7 AMR genes (P < 0.001). The odds of resistance was lower for S

  5. Identification of low-confidence regions in the pig reference genome (Sscrofa10.2)

    Directory of Open Access Journals (Sweden)

    Amanda Warr

    2015-11-01

    Full Text Available Many applications of high throughput sequencing rely on the availability of an accurate reference genome. Variant calling often produces large data sets that cannot be realistically validated and which may contain large numbers of false-positives. Errors in the reference assembly increase the number of false-positives. While resources are available to aid in the filtering of variants from human data, for other species these do not yet exist and strict filtering techniques must be employed which are more likely to exclude true-positives. This work assesses the accuracy of the pig reference genome (Sscrofa10.2) using whole genome sequencing reads from the Duroc sow whose genome the assembly was based on. Indicators of structural variation including high regional coverage, unexpected insert sizes, improper pairing and homozygous variants were used to identify low quality (LQ) regions of the assembly. Low coverage (LC) regions were also identified and analyzed separately. The LQ regions covered 13.85% of the genome, the LC regions covered 26.6% of the genome and combined (LQLC) they covered 33.07% of the genome. Over half of dbSNP variants were located in the LQLC regions. Of CNVRs identified in a previous study, 86.3% were located in the LQLC regions. The regions were also enriched for gene predictions from RNA-seq data with 42.98% falling in the LQLC regions. Excluding variants in the LQ, LC or LQLC regions from future analyses will help reduce the number of false-positive variant calls. Researchers using WGS data should be aware that the current pig reference genome does not give an accurate representation of the copy number of alleles in the original Duroc sow’s genome.
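
    The three coverage figures in this abstract can be cross-checked with a short calculation. The sketch below assumes that the combined LQLC figure is the union of the LQ and LC regions, which the abstract does not state explicitly; under that assumption, inclusion-exclusion gives the share of the genome flagged as both low quality and low coverage. The function name is hypothetical.

```python
def overlap_pct(a_pct: float, b_pct: float, union_pct: float) -> float:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|, in percent of genome."""
    return round(a_pct + b_pct - union_pct, 2)

# Figures quoted in the abstract (percent of the Sscrofa10.2 assembly).
lq, lc, lqlc = 13.85, 26.6, 33.07

# Assumption: LQLC is the union of LQ and LC, so this is their intersection.
both = overlap_pct(lq, lc, lqlc)
print(f"LQ and LC overlap: {both}% of the genome")
```

    Under the stated assumption, roughly 7.4% of the assembly would be both low quality and low coverage.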

  6. An Efficient Genome Fragment Assembling Using GA with Neighborhood Aware Fitness Function

    Directory of Open Access Journals (Sweden)

    Satoko Kikuchi

    2012-01-01

    Full Text Available To decode a long genome sequence, shotgun sequencing is the state-of-the-art technique. It needs to properly sequence a very large number, sometimes as large as millions, of short partially readable strings (fragments). Arranging those fragments in the correct sequence is known as fragment assembling, which is an NP-hard problem. Presently used methods incur enormous computational cost. In this work, we have shown how our modified genetic algorithm (GA) could solve this problem efficiently. In the proposed GA, the length of the chromosome, which represents the volume of the search space, is reduced with advancing generations, thereby improving search efficiency. We also introduced a greedy mutation, by swapping nearby fragments using some heuristics, to improve the fitness of chromosomes. We compared results with Parsons’ algorithm, which is also based on GA. We used fragments with partial reads on both sides, mimicking fragments in the real genome assembling process. In Parsons’ work the base-pair array of the whole fragment is known. Even then, we could obtain much better results, and we succeeded in restructuring contigs covering 100% of the genome sequences.
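
    The idea of scoring a fragment ordering by how well neighbouring fragments overlap, and of improving it by swapping nearby fragments, can be sketched as follows. This is not the authors' algorithm: it is a single-individual, mutation-only search (no population, crossover, or shrinking chromosome), the fitness is a plain sum of suffix-prefix overlaps, and all function names and parameters are hypothetical.

```python
import random

def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def fitness(order, frags):
    """Sum of overlaps between neighbouring fragments in the ordering."""
    return sum(overlap(frags[order[i]], frags[order[i + 1]])
               for i in range(len(order) - 1))

def greedy_swap(order, frags, rng):
    """Swap two adjacent fragments; keep the swap only if fitness does not drop."""
    i = rng.randrange(len(order) - 1)
    candidate = order[:]
    candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
    return candidate if fitness(candidate, frags) >= fitness(order, frags) else order

def assemble(frags, generations=2000, seed=0):
    """Start from a random ordering and apply the greedy swap repeatedly."""
    rng = random.Random(seed)
    order = list(range(len(frags)))
    rng.shuffle(order)
    for _ in range(generations):
        order = greedy_swap(order, frags, rng)
    return order

frags = ["ACGTAC", "TACGGA", "GGATTC"]
print(fitness([0, 1, 2], frags))  # ordering 0,1,2 chains the fragments end to end
```

    Accepting only non-worsening swaps makes each step cheap but can stall in a local optimum, which is one reason a population-based GA is used in practice.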

  7. Simultaneous Emergence of Multidrug-Resistant Candida auris on 3 Continents Confirmed by Whole-Genome Sequencing and Epidemiological Analyses.

    Science.gov (United States)

    Lockhart, Shawn R; Etienne, Kizee A; Vallabhaneni, Snigdha; Farooqi, Joveria; Chowdhary, Anuradha; Govender, Nelesh P; Colombo, Arnaldo Lopes; Calvo, Belinda; Cuomo, Christina A; Desjardins, Christopher A; Berkow, Elizabeth L; Castanheira, Mariana; Magobo, Rindidzani E; Jabeen, Kauser; Asghar, Rana J; Meis, Jacques F; Jackson, Brendan; Chiller, Tom; Litvintseva, Anastasia P

    2017-01-15

    Candida auris, a multidrug-resistant yeast that causes invasive infections, was first described in 2009 in Japan and has since been reported from several countries. To understand the global emergence and epidemiology of C. auris, we obtained isolates from 54 patients with C. auris infection from Pakistan, India, South Africa, and Venezuela during 2012-2015 and the type specimen from Japan. Patient information was available for 41 of the isolates. We conducted antifungal susceptibility testing and whole-genome sequencing (WGS). Available clinical information revealed that 41% of patients had diabetes mellitus, 51% had undergone recent surgery, 73% had a central venous catheter, and 41% were receiving systemic antifungal therapy when C. auris was isolated. The median time from admission to infection was 19 days (interquartile range, 9-36 days), 61% of patients had bloodstream infection, and 59% died. Using stringent break points, 93% of isolates were resistant to fluconazole, 35% to amphotericin B, and 7% to echinocandins; 41% were resistant to 2 antifungal classes and 4% were resistant to 3 classes. WGS demonstrated that isolates were grouped into unique clades by geographic region. Clades were separated by thousands of single-nucleotide polymorphisms, but within each clade isolates were clonal. Different mutations in ERG11 were associated with azole resistance in each geographic clade. C. auris is an emerging healthcare-associated pathogen associated with high mortality. Treatment options are limited, due to antifungal resistance. WGS analysis suggests nearly simultaneous, and recent, independent emergence of different clonal populations on 3 continents. Risk factors and transmission mechanisms need to be elucidated to guide control measures. Published by Oxford University Press for the Infectious Diseases Society of America 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  8. A comparison of statistical methods for genomic selection in a mice population

    Directory of Open Access Journals (Sweden)

    Neves Haroldo HR

    2012-11-01

    Full Text Available Abstract Background The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection strategies, such that genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, marker effect estimation is not a trivial task. The objective of this research was to compare the predictive performance of ten different statistical methods employed in genomic selection, by analyzing data from a heterogeneous stock mice population. Results For the five traits analyzed (W6W: weight at six weeks, WGS: growth slope, BL: body length, %CD8+: percentage of CD8+ cells, CD4+/CD8+: ratio between CD4+ and CD8+ cells), within-family predictions were more accurate than across-family predictions, although this superiority in accuracy varied markedly across traits. For within-family prediction, two kernel methods, Reproducing Kernel Hilbert Spaces Regression (RKHS) and Support Vector Regression (SVR), were the most accurate for W6W, while a polygenic model also had comparable performance. A form of ridge regression assuming that all markers contribute to the additive variance (RR_GBLUP) figured among the most accurate for WGS and BL, while two variable selection methods (LASSO and Random Forest, RF) had the greatest predictive abilities for %CD8+ and CD4+/CD8+. RF, RKHS, SVR and RR_GBLUP outperformed the remaining methods in terms of bias and inflation of predictions. Conclusions Methods with large conceptual differences reached very similar predictive abilities and a clear re-ranking of methods was observed depending on the trait analyzed. Variable selection methods were more accurate than the others in the case of %CD8+ and CD4+/CD8+ and these traits are likely to be influenced by a smaller number of QTL than the others. Judged by their overall performance across traits and computational requirements, RR

  9. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, Carl B [Diversa Corporation; Wyborski, Denise L. [Diversa Corporation; Garcia, Joseph A. [Diversa Corporation; Podar, Mircea [ORNL; Chen, Wenqiong [Diversa Corporation; Chang, Sherman H. [Diversa Corporation; Chang, Hwai W. [Diversa Corporation; Watson, David B [ORNL; Brodie, Eoin L. [Lawrence Berkeley National Laboratory (LBNL); Hazen, Terry [Lawrence Berkeley National Laboratory (LBNL); Keller, Martin [ORNL

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  10. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie,E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  11. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    NARCIS (Netherlands)

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I.E. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. 
Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. 
Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  12. WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes.

    Science.gov (United States)

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo

    2018-03-16

    Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts which facilitates in simplifying the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSR, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, 267 from core genes of fugu, whereas 105 641 SSR and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.
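
    The SSR mining step described above can be illustrated with a small regular-expression scan over a DNA string. This is a minimal sketch and not WGSSAT's implementation: the function name and the minimum repeat counts per motif length are hypothetical choices, only perfect repeats of 1 to 6 bp motifs are reported, and gene prediction, primer design and cross-species mapping are out of scope.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of tandem copies.
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for size, n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if any(motif == motif[:d] * (size // d)
                   for d in range(1, size) if size % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)

example = "GG" + "AT" * 7 + "CC" + "CAG" * 5 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

    Start positions are 0-based offsets into the input string; a production tool would also handle multi-record FASTA input and imperfect repeats.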

  13. RePS: a sequence assembler that masks exact repeats identified from the shotgun data

    DEFF Research Database (Denmark)

    Wang, Jun; Wong, Gane Ka-Shu; Ni, Peixiang

    2002-01-01

    We describe a sequence assembler, RePS (repeat-masked Phrap with scaffolding), that explicitly identifies exact 20mer repeats from the shotgun data and removes them prior to the assembly. The established software is used to compute meaningful error probabilities for each base. Clone-end-pairing information is used to construct scaffolds that order and orient the contigs. We show with real data for human and rice that reasonable assemblies are possible even at coverages of only 4x to 6x, despite having up to 42.2% in exact repeats. Publication date: 2002-May
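
    Identifying exact k-mer repeats directly from the reads and masking them before assembly can be sketched as below. This illustrates the general technique only and is not the RePS code: the function names and the count threshold are hypothetical, reverse complements are ignored, and a real assembler would set the threshold relative to the sequencing depth.

```python
from collections import Counter

def count_kmers(reads, k=20):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def mask_repeats(read, counts, k=20, threshold=3):
    """Replace every base covered by an over-represented k-mer with 'N'."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= threshold:
            masked[i:i + k] = "N" * k
    return "".join(masked)

# Toy example with k=4 so the repeat is visible by eye.
reads = ["AAAACCCCGT", "TTAAAACCGG", "GGAAAACTTT"]
counts = count_kmers(reads, k=4)
print(mask_repeats("GGAAAACTTT", counts, k=4, threshold=3))
```

    In the toy data the 4-mers AAAA and AAAC occur in all three reads, so the bases they cover are masked while the unique flanks are kept for overlap detection.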

  14. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  15. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  16. Protein identification and quantification from riverbank grape, Vitis riparia: Comparing SDS-PAGE and FASP-GPF techniques for shotgun proteomic analysis.

    Science.gov (United States)

    George, Iniga S; Fennell, Anne Y; Haynes, Paul A

    2015-09-01

    Protein sample preparation optimisation is critical for establishing reproducible high throughput proteomic analysis. In this study, two different fractionation sample preparation techniques (in-gel digestion and in-solution digestion) for shotgun proteomics were used to quantitatively compare proteins identified in Vitis riparia leaf samples. The total number of proteins and peptides identified were compared between filter aided sample preparation (FASP) coupled with gas phase fractionation (GPF) and SDS-PAGE methods. There was a 24% increase in the total number of reproducibly identified proteins when FASP-GPF was used. FASP-GPF is more reproducible, less expensive and a better method than SDS-PAGE for shotgun proteomics of grapevine samples as it significantly increases protein identification across biological replicates. Total peptide and protein information from the two fractionation techniques is available in PRIDE with the identifier PXD001399 (http://proteomecentral.proteomexchange.org/dataset/PXD001399). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    DEFF Research Database (Denmark)

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low de...

  18. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Full Text Available Abstract Background The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds which are essential for unambiguous annotation of the genome. Now, both the datasets have been merged and assembled through a joint collaboration between the two groups. Description Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size) among the sequenced insect genomes and provided a high degree of nucleotide coverage (88%) of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  19. Tracking of Listeria monocytogenes in meat establishment using Whole Genome Sequencing as a food safety management tool: A proof of concept.

    Science.gov (United States)

    Nastasijevic, Ivan; Milanov, Dubravka; Velebit, Branko; Djordjevic, Vesna; Swift, Craig; Painset, Anais; Lakicevic, Brankica

    2017-09-18

    Repeated Listeria outbreaks particularly associated with Ready-To-Eat (RTE) delicatessen meat products have been reported annually at global level. The most frequent scenario that led to foodborne outbreaks was the post-thermal treatment cross-contamination of deli meat products during slicing and modified atmosphere packaging (MAP). The precondition for such cross contamination is the previous introduction of Listeria into meat processing facilities and subsequent colonization of the production environment, associated with formation of biofilms resilient to common sanitation procedures regularly applied in meat establishments. The use of Whole Genome Sequencing (WGS) can facilitate the understanding of contamination and colonization routes of pathogens within the food production environment and enable efficient pathogen tracking among different departments. This study aimed to: a) provide a proof of concept on practical use of WGS in a meat establishment to define the entry routes and spread pattern of L. monocytogenes, and b) to consider the regular use of WGS in meat processing establishments as a strong support of food safety management system. The results revealed that Listeria spp. was present in slaughter line, chilling chambers, deboning, slicing, MAP, as well as in corridors and dispatch (53 positive samples, out of 240). Eight L. monocytogenes isolates (out of 53) were identified from the slaughterhouse, chilling chambers, deboning, MAP and dispatch. L. monocytogenes isolates were of three different serotypes (1/2a, 1/2c, 4b) and correspondingly of three MLST sequence types. Overall, two pairs of L. monocytogenes isolates were genetically identical, i.e. two serotype 4b isolates (ST1), isolated from water drain at dispatch unit and two isolates obtained from slaughterhouse (floor-wall junction at the carcass wash point) and MAP (water drain). These findings indicated that L. monocytogenes isolates identified in meat processing units (MAP, chilling chamber

  20. Elucidation of Taste- and Odor-Producing Bacteria and Toxigenic Cyanobacteria in a Midwestern Drinking Water Supply Reservoir by Shotgun Metagenomic Analysis.

    Science.gov (United States)

    Otten, Timothy G; Graham, Jennifer L; Harris, Theodore D; Dreher, Theo W

    2016-09-01

    While commonplace in clinical settings, DNA-based assays for identification or enumeration of drinking water pathogens and other biological contaminants remain widely unadopted by the monitoring community. In this study, shotgun metagenomics was used to identify taste-and-odor producers and toxin-producing cyanobacteria over a 2-year period in a drinking water reservoir. The sequencing data implicated several cyanobacteria, including Anabaena spp., Microcystis spp., and an unresolved member of the order Oscillatoriales as the likely principal producers of geosmin, microcystin, and 2-methylisoborneol (MIB), respectively. To further demonstrate this, quantitative PCR (qPCR) assays targeting geosmin-producing Anabaena and microcystin-producing Microcystis were utilized, and these data were fitted using generalized linear models and compared with routine monitoring data, including microscopic cell counts, sonde-based physicochemical analyses, and assays of all inorganic and organic nitrogen and phosphorus forms and fractions. The qPCR assays explained the greatest variation in observed geosmin (adjusted R² = 0.71) and microcystin (adjusted R² = 0.84) concentrations over the study period, highlighting their potential for routine monitoring applications. The origin of the monoterpene cyclase required for MIB biosynthesis was putatively linked to a periphytic cyanobacterial mat attached to the concrete drinking water inflow structure. We conclude that shotgun metagenomics can be used to identify microbial agents involved in water quality deterioration and to guide PCR assay selection or design for routine monitoring purposes. Finally, we offer estimates of microbial diversity and metagenomic coverage of our data sets for reference to others wishing to apply shotgun metagenomics to other lacustrine systems. Cyanobacterial toxins and microbial taste-and-odor compounds are a growing concern for drinking water utilities reliant upon surface water resources. Specific

  1. PatternLab for proteomics: a tool for differential shotgun proteomics

    Directory of Open Access Journals (Sweden)

    Yates John R

    2008-07-01

    Full Text Available Abstract Background A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired. Results To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or having assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and confront them with existing strategies

  2. Real-time genomic investigation underlying the public health response to a Shiga toxin-producing Escherichia coli O26:H11 outbreak in a nursery.

    Science.gov (United States)

    Moran-Gilad, J; Rokney, A; Danino, D; Ferdous, M; Alsana, F; Baum, M; Dukhan, L; Agmon, V; Anuka, E; Valinsky, L; Yishay, R; Grotto, I; Rossen, J W A; Gdalevich, M

    2017-10-01

    Shiga toxin-producing Escherichia coli (STEC) is a significant cause of gastrointestinal infection and the haemolytic-uremic syndrome (HUS). STEC outbreaks are commonly associated with food but animal contact is increasingly being implicated in its transmission. We report an outbreak of STEC affecting young infants at a nursery in a rural community (three HUS cases, one definite case, one probable case, three possible cases and five carriers, based on the combination of clinical, epidemiological and laboratory data) identified using culture-based and molecular techniques. The investigation identified repeated animal contact (animal farming and petting) as a likely source of STEC introduction followed by horizontal transmission. Whole genome sequencing (WGS) was used for real-time investigation of the incident and revealed a unique strain of STEC O26:H11 carrying stx2a and intimin. Following a public health intervention, no additional cases have occurred. This is the first STEC outbreak reported from Israel. WGS proved to be a useful tool for rapid laboratory characterization and typing of the outbreak strain and informed the public health response at an early stage of this unusual outbreak.

  3. Shotgun metaproteomics of the human distal gut microbiota

    Energy Technology Data Exchange (ETDEWEB)

    VerBerkmoes, N.C.; Russell, A.L.; Shah, M.; Godzik, A.; Rosenquist, M.; Halfvarsson, J.; Lefsrud, M.G.; Apajalahti, J.; Tysk, C.; Hettich, R.L.; Jansson, Janet K.

    2008-10-15

    The human gut contains a dense, complex and diverse microbial community, comprising the gut microbiome. Metagenomics has recently revealed the composition of genes in the gut microbiome, but provides no direct information about which genes are expressed or functioning. Therefore, our goal was to develop a novel approach to directly identify microbial proteins in fecal samples to gain information about the genes expressed and about key microbial functions in the human gut. We used a non-targeted, shotgun mass spectrometry-based whole community proteomics, or metaproteomics, approach for the first deep proteome measurements of thousands of proteins in human fecal samples, thus demonstrating this approach on the most complex sample type to date. The resulting metaproteomes had a skewed distribution relative to the metagenome, with more proteins for translation, energy production and carbohydrate metabolism when compared to what was earlier predicted from metagenomics. Human proteins, including antimicrobial peptides, were also identified, providing a non-targeted glimpse of the host response to the microbiota. Several unknown proteins represented previously undescribed microbial pathways or host immune responses, revealing a novel complex interplay between the human host and its associated microbes.

  4. HOMICIDE BY CERVICAL SPINAL CORD GUNSHOT INJURY WITH SHOTGUN FIRE PELLETS: CASE REPORT

    Directory of Open Access Journals (Sweden)

    Dana Turliuc, Serban Turliuc, Iustin Mihailov, Andrei Cucu, Gabriel Dumitrescu, Claudia Costea

    2015-10-01

    Full Text Available This report presents a rare forensic case of a cervical spinal gunshot injury inflicted with shotgun pellets on a woman by her husband, a professional hunter, during a family fight. The shot completely destroyed the cervical spinal cord without injuring the neck vessels and organs, and the patient survived for seven days. We discuss notions of judicial ballistics, assessment of the patient with a spinal cord gunshot injury, and therapeutic strategies. Although cervical spine gunshot injuries are lethal for the majority of patients, surviving patients need the coordination of a multidisciplinary surgical team to ensure the optimal functional prognosis.

  5. Typing of vancomycin-resistant enterococci with MALDI-TOF mass spectrometry in a nosocomial outbreak setting.

    Science.gov (United States)

    Holzknecht, B J; Dargis, R; Pedersen, M; Pinholt, M; Christensen, J J

    2018-03-23

    To investigate the usefulness of matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS) typing as a first-line epidemiological tool in a nosocomial outbreak of vancomycin-resistant Enterococcus faecium (VREfm). Fifty-five VREfm isolates, previously characterized by whole-genome sequencing (WGS), were included and analysed by MALDI-TOF MS. To take peak reproducibility into account, ethanol/formic acid extraction and other steps of the protocol were conducted in triplicate. Twenty-seven spectra were generated per isolate, and spectra were visually inspected to determine discriminatory peaks. The presence or absence of these was recorded in a peak scheme. Nine discriminatory peaks were identified. A characteristic pattern of these could distinguish between the three major WGS groups: WGS I, WGS II and WGS III. Only one of 38 isolates belonging to WGS I, WGS II or WGS III was misclassified. However, ten of the 17 isolates not belonging to WGS I, II or III displayed peak patterns indistinguishable from those of the outbreak strain. Using visual inspection of spectra, MALDI-TOF MS typing proved to be useful in differentiating three VREfm outbreak clones from each other. However, as non-outbreak isolates could not be reliably differentiated from outbreak clones, the practical value of this typing method for VREfm outbreak management was limited in our setting. Copyright © 2018 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  6. Subtle genetic changes enhance virulence of methicillin resistant and sensitive Staphylococcus aureus

    Directory of Open Access Journals (Sweden)

    Hawes Alicia C

    2007-11-01

    Full Text Available Abstract Background Community acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) increasingly causes disease worldwide. USA300 has emerged as the predominant clone causing superficial and invasive infections in children and adults in the USA. Epidemiological studies suggest that USA300 is more virulent than other CA-MRSA. The genetic determinants that render virulence and dominance to USA300 remain unclear. Results We sequenced the genomes of two pediatric USA300 isolates: one CA-MRSA and one CA-methicillin susceptible (MSSA), isolated at Texas Children's Hospital in Houston. DNA sequencing was performed by Sanger dideoxy whole genome shotgun (WGS) and 454 Life Sciences pyrosequencing strategies. The sequence of the USA300 MRSA strain was rigorously annotated. In USA300-MRSA, 2658 chromosomal open reading frames were predicted and 3.1 and 27 kilobase (kb) plasmids were identified. USA300-MSSA contained a 20 kb plasmid with some homology to the 27 kb plasmid found in USA300-MRSA. Two regions found in USA300-MRSA were absent in USA300-MSSA. One of these carried the arginine deiminase operon that appears to have been acquired from S. epidermidis. The USA300 sequence was aligned with other sequenced S. aureus genomes and regions unique to USA300 MRSA were identified. Conclusion USA300-MRSA is highly similar to other MRSA strains based on whole genome alignments and gene content, indicating that the differences in pathogenesis are due to subtle changes rather than to large-scale acquisition of virulence factor genes. The USA300 Houston isolate differs from another sequenced USA300 strain isolate, derived from a patient in San Francisco, in plasmid content and a number of sequence polymorphisms. Such differences will provide new insights into the evolution of pathogens.

  7. Genome-centric evaluation of Burkholderia sp. strain SRS-W-2-2016 resistant to high concentrations of uranium and nickel isolated from the Savannah River Site (SRS), USA

    Directory of Open Access Journals (Sweden)

    Ashish Pathak

    2017-06-01

    Full Text Available Savannah River Site (SRS), an approximately 800-km² former nuclear weapons production facility located near Aiken, SC, remains co-contaminated by heavy metals and radionuclides. To gain a better understanding of microbially-mediated bioremediation mechanisms, several bacterial strains resistant to high concentrations of Uranium (U) and Nickel (Ni) were isolated from the Steeds Pond soils located within the SRS site. One of the isolated strains, designated as strain SRS-W-2-2016, grew robustly on both U and Ni. To fully understand the arsenal of metabolic functions possessed by this strain, a draft whole genome sequence (WGS) was obtained, assembled, annotated and analyzed. Genome-centric evaluation revealed the isolate to belong to the Burkholderia genus with close affiliation to B. xenovorans LB400, an aggressive polychlorinated biphenyl-degrader. At a coverage of 90×, the genome of strain SRS-W-2-2016 consisted of 8,035,584 bases with a total number of 7071 putative genes assembling into 191 contigs with an N50 contig length of 134,675 bases. Several gene homologues coding for resistance to heavy metals/radionuclides were identified in strain SRS-W-2-2016, such as a suite of outer membrane efflux pump proteins similar to nickel/cobalt transporter regulators, peptide/nickel transport substrate and ATP-binding proteins, permease proteins, and a high-affinity nickel-transport protein. Also noteworthy were two separate gene fragments in strain SRS-W-2-2016 homologous to the spoT gene, recently correlated with bacterial tolerance to U. Additionally, a plethora of oxygenase genes were also identified in the isolate, potentially involved in the breakdown of organic compounds facilitating the strain's successful colonization and survival in the SRS co-contaminated soils. The WGS project of Burkholderia sp. strain SRS-W-2-2016 is available at DDBJ/ENA/GenBank under the accession #MSDV00000000.
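    The N50 contig length quoted in the record above is a standard assembly summary statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows, assuming only a plain list of contig lengths; the function name `n50` and the toy lengths are illustrative and are not taken from the SRS-W-2-2016 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (not real data): total length 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

    Because the statistic depends only on the sorted lengths, the same function can be applied to scaffold lengths to obtain a scaffold N50.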

  8. Whole-Genome Shotgun Sequence of the Keratinolytic Bacterium Lysobacter sp. A03, Isolated from the Antarctic Environment

    OpenAIRE

    Pereira, Jamile Queiroz; Ambrosini, Adriana; Sant'Anna, Fernando Hayashi; Tadra-Sfeir, Michele; Faoro, Helisson; Pedrosa, Fábio Oliveira; Souza, Emanuel Maltempi; Brandelli, Adriano; Passaglia, Luciane M. P.

    2015-01-01

    Lysobacter sp. strain A03 is a protease-producing bacterium isolated from decomposing-penguin feathers collected in the Antarctic environment. This strain has the ability to degrade keratin at low temperatures. The A03 genome sequence provides the possibility of finding new genes with biotechnological potential to better understand its cold-adaptation mechanism and survival in cold environments.

  9. Comparison of phasing strategies for whole human genomes.

    Science.gov (United States)

    Choi, Yongwook; Chan, Agnes P; Kirkness, Ewen; Telenti, Amalio; Schork, Nicholas J

    2018-04-01

    Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not 'phase' the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available 'Genome-In-A-Bottle' (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a
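    The record above lists switch error rate among its evaluation metrics without defining it. A minimal sketch of one common definition follows, assuming both the reference and the predicted haplotype are given as strings of '0'/'1' alleles on haplotype 1 at the same ordered heterozygous sites. The function name and the toy haplotypes are hypothetical, and the study itself may have computed the metric differently.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    differs between the reference and the predicted haplotype."""
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where predicted haplotype 1 carries the same allele as the
    # reference haplotype 1 at that site.
    same = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(same, same[1:]) if a != b)
    return switches / (len(same) - 1)


# Hypothetical toy haplotypes: one switch between sites 3 and 4.
print(switch_error_rate("000000", "000111"))
```

    A prediction that is the exact complement of the reference scores zero under this definition, since haplotype labels are arbitrary and only the relative phase between neighbouring sites is counted.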

  10. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    Science.gov (United States)

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Immune-modulatory genomic properties differentiate gut microbiota of infants with and without eczema

    KAUST Repository

    Oh, Seungdae; Yap, Gaik Chin; Hong, Pei-Ying; Huang, Chiung-Hui; Aw, Marion M.; Shek, Lynette Pei-Chi; Liu, Wen-Tso; Lee, Bee Wah

    2017-01-01

    Gut microbiota play an important role in human immunological processes, potentially affecting allergic diseases such as eczema. The diversity and structure of gut microbiota in infants with eczema have been previously documented. This study aims to evaluate by comparative metagenomics differences in genetic content in gut microbiota of infants with eczema and their matched controls. Stools were collected at the age of one month old from twelve infants from an at risk birth cohort in a case control manner. Clinical follow up for atopic outcomes were carried out at the age of 12 and 24 months. Microbial genomic DNA were extracted from stool samples and used for shotgun sequencing. Comparative metagenomic analysis showed that immune-regulatory TCAAGCTTGA motifs were significantly enriched in the six healthy controls (C) communities compared to the six eczema subjects (E), with many encoded by Bifidobacterium (38% of the total motifs in the C communities). Draft genomes of five Bifidobacterium species populations (B. longum, B. bifidum, B. breve, B. dentium, and B. pseudocatenulatum) were recovered from metagenomic datasets. The B. longum BFN-121-2 genome encoded more TCAAGCTTGA motifs (4.2 copies per one million genome sequence) than other Bifidobacterium genomes. Additionally, the communities in the stool of controls (C) were also significantly enriched in functions associated with tetrapyrrole biosynthesis compared to those of eczema (E). Our results show distinct immune-modulatory genomic properties of gut microbiota in infants associated with eczema and provide new insights into potential role of gut microbiota in affecting human immune homeostasis.
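    The motif frequencies in the record above are expressed as copies per one million bases of genome sequence. A minimal sketch of that normalisation follows, assuming a plain nucleotide string as input; the function name and the toy sequence are illustrative, and the authors' actual counting pipeline is not described in the abstract. TCAAGCTTGA is its own reverse complement, so scanning one strand finds every occurrence.

```python
def motif_density_per_mb(sequence, motif="TCAAGCTTGA"):
    """Count (possibly overlapping) motif occurrences per million bases."""
    seq = sequence.upper()
    motif = motif.upper()
    if not seq:
        raise ValueError("sequence is empty")
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping occurrences are also counted.
        start = seq.find(motif, start + 1)
    return count * 1_000_000 / len(seq)


# Hypothetical 1,000-base toy sequence with a single motif copy.
toy = "TCAAGCTTGA" + "A" * 990
print(motif_density_per_mb(toy))
```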

  12. Immune-modulatory genomic properties differentiate gut microbiota of infants with and without eczema

    KAUST Repository

    Oh, Seungdae

    2017-10-19

    Gut microbiota play an important role in human immunological processes, potentially affecting allergic diseases such as eczema. The diversity and structure of gut microbiota in infants with eczema have been previously documented. This study aims to evaluate by comparative metagenomics differences in genetic content in gut microbiota of infants with eczema and their matched controls. Stools were collected at the age of one month old from twelve infants from an at risk birth cohort in a case control manner. Clinical follow up for atopic outcomes were carried out at the age of 12 and 24 months. Microbial genomic DNA were extracted from stool samples and used for shotgun sequencing. Comparative metagenomic analysis showed that immune-regulatory TCAAGCTTGA motifs were significantly enriched in the six healthy controls (C) communities compared to the six eczema subjects (E), with many encoded by Bifidobacterium (38% of the total motifs in the C communities). Draft genomes of five Bifidobacterium species populations (B. longum, B. bifidum, B. breve, B. dentium, and B. pseudocatenulatum) were recovered from metagenomic datasets. The B. longum BFN-121-2 genome encoded more TCAAGCTTGA motifs (4.2 copies per one million genome sequence) than other Bifidobacterium genomes. Additionally, the communities in the stool of controls (C) were also significantly enriched in functions associated with tetrapyrrole biosynthesis compared to those of eczema (E). Our results show distinct immune-modulatory genomic properties of gut microbiota in infants associated with eczema and provide new insights into potential role of gut microbiota in affecting human immune homeostasis.

  13. Immune-modulatory genomic properties differentiate gut microbiota of infants with and without eczema.

    Directory of Open Access Journals (Sweden)

    Seungdae Oh

    Full Text Available Gut microbiota play an important role in human immunological processes, potentially affecting allergic diseases such as eczema. The diversity and structure of gut microbiota in infants with eczema have been previously documented. This study aims to evaluate by comparative metagenomics differences in genetic content in gut microbiota of infants with eczema and their matched controls. Stools were collected at the age of one month old from twelve infants from an at risk birth cohort in a case control manner. Clinical follow up for atopic outcomes were carried out at the age of 12 and 24 months. Microbial genomic DNA were extracted from stool samples and used for shotgun sequencing. Comparative metagenomic analysis showed that immune-regulatory TCAAGCTTGA motifs were significantly enriched in the six healthy controls (C) communities compared to the six eczema subjects (E), with many encoded by Bifidobacterium (38% of the total motifs in the C communities). Draft genomes of five Bifidobacterium species populations (B. longum, B. bifidum, B. breve, B. dentium, and B. pseudocatenulatum) were recovered from metagenomic datasets. The B. longum BFN-121-2 genome encoded more TCAAGCTTGA motifs (4.2 copies per one million genome sequence) than other Bifidobacterium genomes. Additionally, the communities in the stool of controls (C) were also significantly enriched in functions associated with tetrapyrrole biosynthesis compared to those of eczema (E). Our results show distinct immune-modulatory genomic properties of gut microbiota in infants associated with eczema and provide new insights into potential role of gut microbiota in affecting human immune homeostasis.

  14. A korarchaeal genome reveals insights into the evolution of the Archaea

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J; Elkins, James G.; Podar, Mircea; Graham, David E.; Makarova, Kira S.; Wolf, Yuri; Randau, Lennart; Hedlund, Brian P.; Brochier-Armanet, Celine; Kunin, Victor; Anderson, Iain; Lapidus, Alla; Goltsman, Eugene; Barry, Kerrie; Koonin, Eugene V.; Hugenholtz, Phil; Kyrpides, Nikos; Wanner, Gerhard; Richardson, Paul; Keller, Martin; Stetter, Karl O.

    2008-06-05

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name, "Candidatus Korarchaeum cryptofilum," which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49%. Of the 1,617 predicted protein-coding genes, 1,382 (85%) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.

  15. A Korarchael Genome Reveals Insights into the Evolution of the Archaea

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla; Elkins, James G.; Podar, Mircea; Graham, David E.; Makarova, Kira S.; Wolf, Yuri; Randau, Lennart; Hedlund, Brian P.; Brochier-Armanet, Celine; Kunin, Victor; Anderson, Iain; Lapidus, Alla; Goltsman, Eugene; Barry, Kerrie; Koonin, Eugene V.; Hugenholtz, Phil; Kyrpides, Nikos; Wanner, Gerhard; Richardson, Paul; Keller, Martin; Stetter, Karl O.

    2008-01-07

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name, "Candidatus Korarchaeum cryptofilum," which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49%. Of the 1,617 predicted protein-coding genes, 1,382 (85%) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.

  16. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
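    The space saving reported for bit-block encoded genotypes in the record above comes from storing each genotype in a few bits instead of as text. A minimal sketch of the general idea follows, assuming unphased biallelic genotypes coded as 0, 1 or 2 alternate alleles, with 3 for missing, packed four to a byte. This is a deliberately simplified illustration with hypothetical function names; it is not KGGSeq's actual format, which the abstract says also handles phased and multi-allelic genotypes.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 = alt-allele count, 3 = missing)
    into bytes, four 2-bit codes per byte, lowest bits first."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"unsupported genotype code: {g!r}")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


# Hypothetical genotypes for five subjects at one variant.
codes = [0, 1, 2, 3, 2]
blob = pack_genotypes(codes)
print(len(blob), unpack_genotypes(blob, len(codes)))
```

    Five genotypes fit in two bytes here, whereas a text representation such as "0/1" costs at least three characters per genotype before separators.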

  17. First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.

    Science.gov (United States)

    Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini

    2018-03-01

    Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  18. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score, related to gene expression? We used whole transcriptome shotgun sequencing of the bacterial pathogen Bacillus anthracis to assess the correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessing the accuracy of the current annotation of protein-coding genes in the B. anthracis genome, we show that the transcriptome data indicate the existence of more than a hundred genes missing from the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  19. Elucidation of taste- and odor-producing bacteria and toxigenic cyanobacteria in a Midwestern drinking water supply reservoir by shotgun metagenomics analysis

    Science.gov (United States)

    Otten, Timothy; Graham, Jennifer L.; Harris, Theodore D.; Dreher, Theo

    2016-01-01

    While commonplace in clinical settings, DNA-based assays for identification or enumeration of drinking water pathogens and other biological contaminants remain widely unadopted by the monitoring community. In this study, shotgun metagenomics was used to identify taste-and-odor producers and toxin-producing cyanobacteria over a 2-year period in a drinking water reservoir. The sequencing data implicated several cyanobacteria, including Anabaena spp., Microcystis spp., and an unresolved member of the order Oscillatoriales as the likely principal producers of geosmin, microcystin, and 2-methylisoborneol (MIB), respectively. To further demonstrate this, quantitative PCR (qPCR) assays targeting geosmin-producing Anabaena and microcystin-producing Microcystis were utilized, and these data were fitted using generalized linear models and compared with routine monitoring data, including microscopic cell counts, sonde-based physicochemical analyses, and assays of all inorganic and organic nitrogen and phosphorus forms and fractions. The qPCR assays explained the greatest variation in observed geosmin (adjusted R² = 0.71) and microcystin (adjusted R² = 0.84) concentrations over the study period, highlighting their potential for routine monitoring applications. The origin of the monoterpene cyclase required for MIB biosynthesis was putatively linked to a periphytic cyanobacterial mat attached to the concrete drinking water inflow structure. We conclude that shotgun metagenomics can be used to identify microbial agents involved in water quality deterioration and to guide PCR assay selection or design for routine monitoring purposes. Finally, we offer estimates of microbial diversity and metagenomic coverage of our data sets for reference to others wishing to apply shotgun metagenomics to other lacustrine systems.

  20. Whole-Genome Shotgun Sequence of the Keratinolytic Bacterium Lysobacter sp. A03, Isolated from the Antarctic Environment.

    Science.gov (United States)

    Pereira, Jamile Queiroz; Ambrosini, Adriana; Sant'Anna, Fernando Hayashi; Tadra-Sfeir, Michele; Faoro, Helisson; Pedrosa, Fábio Oliveira; Souza, Emanuel Maltempi; Brandelli, Adriano; Passaglia, Luciane M P

    2015-04-02

    Lysobacter sp. strain A03 is a protease-producing bacterium isolated from decomposing-penguin feathers collected in the Antarctic environment. This strain has the ability to degrade keratin at low temperatures. The A03 genome sequence provides the possibility of finding new genes with biotechnological potential to better understand its cold-adaptation mechanism and survival in cold environments. Copyright © 2015 Pereira et al.

  1. Genome mapping and characterization of the Anopheles gambiae heterochromatin

    Directory of Open Access Journals (Sweden)

    Sharakhova Maria V

    2010-08-01

    and whole-genome shotgun assembly render the mapping and characterization of a significant part of heterochromatic scaffolds a possibility. These results reveal the strong association between characteristics of the genome features and morphological types of chromatin. Initial analysis of the An. gambiae heterochromatin provides a framework for its functional characterization and comparative genomic analyses with other organisms.

  2. Modular assembly of transposable element arrays by microsatellite targeting in the guayule and rice genomes.

    Science.gov (United States)

    Valdes Franco, José A; Wang, Yi; Huo, Naxin; Ponciano, Grisel; Colvin, Howard A; McMahan, Colleen M; Gu, Yong Q; Belknap, William R

    2018-04-19

    Guayule (Parthenium argentatum A. Gray) is a rubber-producing desert shrub native to Mexico and the United States. Guayule represents an alternative to Hevea brasiliensis as a source for commercial natural rubber. The efficient application of modern molecular/genetic tools to guayule improvement requires characterization of its genome. The 1.6 Gb guayule genome was sequenced, assembled and annotated. The final 1.5 Gb assembly, while fragmented (N50 = 22 kb), maps >95% of the shotgun reads and is essentially complete. Approximately 40,000 transcribed, protein encoding genes were annotated on the assembly. Further characterization of this genome revealed 15 families of small, microsatellite-associated, transposable elements (TEs) with unexpected chromosomal distribution profiles. These SaTar (Satellite Targeted) elements, which are non-autonomous Mu-like elements (MULEs), were frequently observed in multimeric linear arrays of unrelated individual elements within which no individual element is interrupted by another. This uniformly non-nested TE multimer architecture has not been previously described in either eukaryotic or prokaryotic genomes. Five families of similarly distributed non-autonomous MULEs (microsatellite associated, modularly assembled) were characterized in the rice genome. Families of TEs with similar structures and distribution profiles were identified in sorghum and citrus. The sequencing and assembly of the guayule genome provides a foundation for application of current crop improvement technologies to this plant. In addition, characterization of this genome revealed SaTar elements with distribution profiles unique among TEs. SaTar targeting appears based on an alternative MULE recombination mechanism with the potential to impact gene evolution.
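    The N50 statistic quoted in the abstract above summarizes assembly contiguity: it is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The sketch below illustrates that standard definition only. The function name and the toy contig lengths are hypothetical and have nothing to do with the guayule assembly itself.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths in kb, chosen for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

    A low N50 relative to genome size, as reported above, indicates many short contigs even when nearly all reads map to the assembly.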

  3. Binning metagenomic contigs by coverage and composition

    NARCIS (Netherlands)

    Alneberg, J.; Bjarnason, B.S.; Bruijn, de I.; Schirmer, M.; Quick, J.; Ijaz, U.Z.; Lahti, L.M.; Loman, N.J.; Andersson, A.F.; Quince, C.

    2014-01-01

    Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple

  4. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors

    Directory of Open Access Journals (Sweden)

    Antoine Persoons

    2014-09-01

    Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout the French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP) positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e. PN/PS), we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.

  5. Whole-Genome Sequence of the Metastatic PC3 and LNCaP Human Prostate Cancer Cell Lines

    Directory of Open Access Journals (Sweden)

    Inge Seim

    2017-06-01

    The bone metastasis-derived PC3 and the lymph node metastasis-derived LNCaP prostate cancer cell lines are widely studied, having been described in thousands of publications over the last four decades. Here, we report short-read whole-genome sequencing (WGS) and de novo assembly of PC3 (ATCC CRL-1435) and LNCaP (clone FGC; ATCC CRL-1740) at ∼70× coverage. A known homozygous mutation in TP53 and homozygous loss of PTEN were robustly identified in the PC3 cell line, whereas the LNCaP cell line exhibited a larger number of putative inactivating somatic point and indel mutations (and in particular a loss of stop codon events). This study also provides preliminary evidence that loss of one or both copies of the tumor suppressor Capicua (CIC) contributes to primary tumor relapse and metastatic progression, potentially offering a treatment target for castration-resistant prostate cancer (CRPC). Our work provides a resource for genetic, genomic, and biological studies employing two commonly-used prostate cancer cell lines.

  6. Whole-Genome Sequence of the Metastatic PC3 and LNCaP Human Prostate Cancer Cell Lines.

    Science.gov (United States)

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Nelson, Colleen C; Chopin, Lisa K

    2017-06-07

    The bone metastasis-derived PC3 and the lymph node metastasis-derived LNCaP prostate cancer cell lines are widely studied, having been described in thousands of publications over the last four decades. Here, we report short-read whole-genome sequencing (WGS) and de novo assembly of PC3 (ATCC CRL-1435) and LNCaP (clone FGC; ATCC CRL-1740) at ∼70× coverage. A known homozygous mutation in TP53 and homozygous loss of PTEN were robustly identified in the PC3 cell line, whereas the LNCaP cell line exhibited a larger number of putative inactivating somatic point and indel mutations (and in particular a loss of stop codon events). This study also provides preliminary evidence that loss of one or both copies of the tumor suppressor Capicua (CIC) contributes to primary tumor relapse and metastatic progression, potentially offering a treatment target for castration-resistant prostate cancer (CRPC). Our work provides a resource for genetic, genomic, and biological studies employing two commonly-used prostate cancer cell lines. Copyright © 2017 Seim et al.

  7. Use of Metagenomic Shotgun Sequencing Technology To Detect Foodborne Pathogens within the Microbiome of the Beef Production Chain

    OpenAIRE

    Yang, Xiang; Noyes, Noelle R.; Doster, Enrique; Martin, Jennifer N.; Linke, Lyndsey M.; Magnuson, Roberta J.; Yang, Hua; Geornaras, Ifigenia; Woerner, Dale R.; Jones, Kenneth L.; Ruiz, Jaime; Boucher, Christina; Morley, Paul S.; Belk, Keith E.

    2016-01-01

    Foodborne illnesses associated with pathogenic bacteria are a global public health and economic challenge. The diversity of microorganisms (pathogenic and nonpathogenic) that exists within the food and meat industries complicates efforts to understand pathogen ecology. Further, little is known about the interaction of pathogens within the microbiome throughout the meat production chain. Here, a metagenomic approach and shotgun sequencing technology were used as tools to detect pathogenic bact...

  8. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  9. Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses

    Science.gov (United States)

    The aim of this work was to study the effects of Fe and Mn deficiencies on the xylem sap proteome of tomato using a shotgun proteomic approach, with the final goal of elucidating plant response mechanisms to these stresses. This approach yielded 643 proteins reliably identified and quantified with 7...

  10. Targeted genome engineering in human induced pluripotent stem cells from patients with hemophilia B using the CRISPR-Cas9 system.

    Science.gov (United States)

    Lyu, Cuicui; Shen, Jun; Wang, Rui; Gu, Haihui; Zhang, Jianping; Xue, Feng; Liu, Xiaofan; Liu, Wei; Fu, Rongfeng; Zhang, Liyan; Li, Huiyuan; Zhang, Xiaobing; Cheng, Tao; Yang, Renchi; Zhang, Lei

    2018-04-06

    Replacement therapy for hemophilia remains a lifelong treatment. Only gene therapy can cure hemophilia at a fundamental level. The clustered regularly interspaced short palindromic repeats-CRISPR associated nuclease 9 (CRISPR-Cas9) system is a versatile and convenient genome editing tool which can be applied to gene therapy for hemophilia. A patient's induced pluripotent stem cells (iPSCs) were generated from their peripheral blood mononuclear cells (PBMNCs) using episomal vectors. The AAVS1-Cas9-sgRNA plasmid which targets the AAVS1 locus and the AAVS1-EF1α-F9 cDNA-puromycin donor plasmid were constructed, and they were electroporated into the iPSCs. When insertion of F9 cDNA into the AAVS1 locus was confirmed, whole genome sequencing (WGS) was carried out to detect the off-target issue. The iPSCs were then differentiated into hepatocytes, and human factor IX (hFIX) antigen and activity were measured in the culture supernatant. Finally, the hepatocytes were transplanted into non-obese diabetic/severe combined immunodeficiency disease (NOD/SCID) mice through splenic injection. The patient's iPSCs were generated from PBMNCs. Human full-length F9 cDNA was inserted into the AAVS1 locus of iPSCs of a hemophilia B patient using the CRISPR-Cas9 system. No off-target mutations were detected by WGS. The hepatocytes differentiated from the inserted iPSCs could secrete hFIX stably and had the ability to be transplanted into the NOD/SCID mice in the short term. PBMNCs are good somatic cell choices for generating iPSCs from hemophilia patients. The iPSC technique is a good tool for genetic therapy for human hereditary diseases. CRISPR-Cas9 is versatile, convenient, and safe to be used in iPSCs with low off-target effects. Our research offers new approaches for clinical gene therapy for hemophilia.

  11. Genomic Evolution Of The Mdr Serotype O12 Pseudomonas Aeruginosa Clone

    DEFF Research Database (Denmark)

    Thrane, Sandra Wingaard; Taylor, Véronique L.; Freschi, Luca

    2015-01-01

    that serotype switching in combination with an antibiotic resistance determinant contributed to the dissemination of the O12 serotype in the clinic. This selective advantage coincides with the introduction of fluoroquinolones in the clinic. With the PAst program isolates can be serotyped using WGS data......Introduction: Since the 1980’s the serotype O12 of Pseudomonas aeruginosa has emerged as the predominant serotype in clinical settings and in epidemic outbreaks. These serotype O12 isolates exhibit high levels of resistance to various classes of antibiotics.Methods: In this study, we explore how......).Results: While most serotypes were closely linked to the core genome phylogeny we observed horizontal exchange of LPS genes among distinct P. aeruginosa strains. Specifically, we identified a ‘serotype island’ containing the P. aeruginosa O12 LPS gene cluster and an antibiotic resistance determinant (gyrAC248T...

  12. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of the microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm whether the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that, using our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  13. The spread of KPC-producing Enterobacteriaceae in Spain: WGS analysis of the emerging high-risk clones of Klebsiella pneumoniae ST11/KPC-2, ST101/KPC-2 and ST512/KPC-3.

    Science.gov (United States)

    Oteo, Jesús; Pérez-Vázquez, María; Bautista, Verónica; Ortega, Adriana; Zamarrón, Pilar; Saez, David; Fernández-Romero, Sara; Lara, Noelia; Ramiro, Raquel; Aracil, Belén; Campos, José

    2016-12-01

    We analysed the microbiological traits and population structure of KPC-producing Enterobacteriaceae isolates collected in Spain between 2012 and 2014. We also performed a comparative WGS analysis of the three major KPC-producing Klebsiella pneumoniae clones detected. Carbapenemase and ESBL genes were sequenced. The Institut Pasteur MLST scheme was used. WGS data were used to construct phylogenetic trees, to identify the determinants of resistance and to de novo assemble the genome of one representative isolate of each of the three major K. pneumoniae clones. Of the 2443 carbapenemase-producing Enterobacteriaceae isolates identified during the study period, 111 (4.5%) produced KPC. Of these, 81 (73.0%) were K. pneumoniae and 13 (11.7%) were Enterobacter cloacae. Three major epidemic clones of K. pneumoniae were identified: ST11/KPC-2, ST101/KPC-2 and ST512/KPC-3. ST11/KPC-2 differed from ST101/KPC-2 and ST512/KPC-3 by 27 819 and 6924 SNPs, respectively. ST101/KPC-2 differed from ST512/KPC-3 by 28 345 SNPs. Nine acquired resistance genes were found in ST11/KPC-2, 11 in ST512/KPC-3 and 13 in ST101/KPC-2. ST101/KPC-2 had the highest number of virulence genes (20). An 11 bp deletion at the end of the mgrB sequence was the cause of colistin resistance in ST512/KPC-3. KPC-producing Enterobacteriaceae are increasing in Spain. Most KPC-producing K. pneumoniae isolates belonged to only five clones: ST11 and ST512 caused interregional spread, ST101 caused regional spread and ST1961 and ST678 produced independent hospital outbreaks. ST101/KPC-2 had the highest number of resistance and virulence genes. ST101/KPC-2 and ST512/KPC-3 were recently implicated in the spread of KPC in Italy. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. Genomic Analysis of Multidrug-Resistant Escherichia coli from North Carolina Community Hospitals: Ongoing Circulation of CTX-M-Producing ST131-H30Rx and ST131-H30R1 Strains.

    Science.gov (United States)

    Kanamori, Hajime; Parobek, Christian M; Juliano, Jonathan J; Johnson, James R; Johnston, Brian D; Johnson, Timothy J; Weber, David J; Rutala, William A; Anderson, Deverick J

    2017-08-01

    Escherichia coli sequence type 131 (ST131) predominates globally among multidrug-resistant (MDR) E. coli strains. We used whole-genome sequencing (WGS) to investigate 63 MDR E. coli isolates from 7 North Carolina community hospitals (2010 to 2015). Of these, 39 (62%) represented ST131, including 37 (95%) from the ST131-H30R subclone: 10 (27%) from its H30R1 subset and 27 (69%) from its H30Rx subset. ST131 core genomes differed by a median of 15 (range, 0 to 490) single-nucleotide variants (SNVs) overall versus only 7 within H30R1 (range, 3 to 12 SNVs) and 11 within H30Rx (range, 0 to 21). The four isolates with identical core genomes were all H30Rx. Epidemiological and clinical characteristics did not vary significantly by strain type, but many patients with MDR E. coli or H30Rx infection were critically ill and had poor outcomes. H30Rx isolates characteristically exhibited fluoroquinolone resistance and CTX-M-15 production, had a high prevalence of trimethoprim-sulfamethoxazole resistance (89%), sul1 (89%), and dfrA17 (85%), and were enriched for specific virulence traits, and all qualified as extraintestinal pathogenic E. coli. The high overall prevalence of CTX-M-15 appeared to be possibly attributable to its association with the ST131-H30Rx subclone and IncF[F2:A1:B-] plasmids. Some phylogenetically clustered non-ST131 MDR E. coli isolates also had distinctive serotypes/fimH types, fluoroquinolone mutations, CTX-M variants, and IncF types. Thus, WGS analysis of our community hospital source MDR E. coli isolates suggested ongoing circulation and differentiation of E. coli ST131 subclones, with clonal segregation of CTX-M variants, other resistance genes, Inc-type plasmids, and virulence genes. Copyright © 2017 American Society for Microbiology.
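    The abstract above reports medians and ranges of pairwise single-nucleotide variant (SNV) differences between core genomes. The sketch below shows one simple way such a summary could be computed from an already aligned set of core-genome sequences. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the choice to ignore positions with gaps or ambiguous bases, and the toy alignment are all hypothetical.

```python
from itertools import combinations
from statistics import median

VALID_BASES = set("ACGT")


def snv_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snv_summary(alignment):
    """Return (median, minimum, maximum) SNV distance over all isolate pairs."""
    distances = [
        snv_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    ]
    if not distances:
        raise ValueError("at least two isolates are required")
    return median(distances), min(distances), max(distances)


if __name__ == "__main__":
    # Toy alignment with hypothetical isolate names.
    toy = {
        "iso1": "ACGTACGT",
        "iso2": "ACGTACGA",
        "iso3": "ACCTACGA",
        "iso4": "ACGTACGT",
    }
    print(pairwise_snv_summary(toy))
```

    Restricting the input to isolates from a single subclone would give the within-subclone figures in the same way.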

  15. Ascertainment bias in studies of human genome-wide polymorphism

    DEFF Research Database (Denmark)

    Clark, Andrew G.; Hubisz, Melissa J.; Bustamente, Carlos D.

    2005-01-01

    of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs-including BAC end sequences, shotgun reads, and differences between public and private sequences-and even made use of chimpanzee...... was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and FST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude...... of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across...

  16. Upgrading of Gasification Gases by means of a Catalytic Membrane Reactor: WGS Catalysts and Inorganic Palladium Membranes HENRECA Project (ENE2004-07758-CO2-01). Final Report; Estudios de Enriquecimiento en H2 de Gases de Gasificacion mediante el Uso Reactor Catalitico de Membranas: Catalizadores WGS y Membranas Inorganicas de Paladio. Informe Final Proyecto HENRECA (ENE2004-07758-C02-01)

    Energy Technology Data Exchange (ETDEWEB)

    Maranon Bujan, M.; Sanchez Hervas, J. M.; Barreiro Carou, M. del

    2008-07-01

    The combination of a CO catalytic converter with a highly hydrogen-selective membrane stands out as a very promising technology for the upgrading of biomass gasification gases. The advantages of this combined system over the traditional two-stage WGS technology have been investigated within the HENRECA project, financed under the Spanish PN 2004-2007 of the Ministry of Science and Technology. This project started in September 2004 and lasted three years. The Division of Combustion and Gasification of CIEMAT participated in this project in three main activities: the study of the catalytic activity of WGS catalysts synthesised by the other partner of the project (University Rey Juan Carlos, URJC), the design of the reaction-separation system, and the design and construction of a bench-scale pilot plant where the performance of the membranes prepared by URJC and the catalytic membrane system were investigated. This report describes the activities carried out within the project and the main results obtained. (Author) 14 ref.

  17. Nonoperative Management of Multiple Penetrating Cardiac and Colon Wounds from a Shotgun: A Case Report and Literature Review

    OpenAIRE

    Jaramillo, Paula M.; Montoya, Jaime A.; Mejia, David A.; Pereira Warr, Salin

    2018-01-01

    Introduction. Cardiac trauma is generally considered fatal, and surgery is normally considered for wounds of the colon because of the associated sepsis; however, conservative management of many traumatic lesions of different injured organs has progressed over the years. Presentation of the Case. A 65-year-old male patient presented with multiple shotgun wounds on the left upper limb, thorax, and abdomen. On evaluation, he was hemodynamically stable with normal sinus rhythm and normal blood pressure, no dyspnea,...

  18. Development and validation of cross-transferable and polymorphic DNA markers for detecting alien genome introgression in Oryza sativa from Oryza brachyantha.

    Science.gov (United States)

    Ray, Soham; Bose, Lotan K; Ray, Joshitha; Ngangkham, Umakanta; Katara, Jawahar L; Samantaray, Sanghamitra; Behera, Lambodar; Anumalla, Mahender; Singh, Onkar N; Chen, Meingsheng; Wing, Rod A; Mohapatra, Trilochan

    2016-08-01

    African wild rice Oryza brachyantha (FF), a distant relative of cultivated rice Oryza sativa (AA), carries genes for pest and disease resistance. Molecular marker assisted alien gene introgression from this wild species to its domesticated counterpart is largely impeded due to the scarce availability of cross-transferable and polymorphic molecular markers that can clearly distinguish these two species. Availability of the whole genome sequence (WGS) of both species provides a unique opportunity to develop markers that are cross-transferable. We observed poor cross-transferability (~0.75%) of O. sativa specific sequence tagged microsatellite (STMS) markers to O. brachyantha. By utilizing the genome sequence information, we developed a set of 45 low cost PCR based co-dominant polymorphic markers (STS and CAPS). These markers were found to be cross-transferable (84.78%) between the two species and could distinguish them from each other and thus allowed tracing alien genome introgression. Finally, we validated a Monosomic Alien Addition Line (MAAL) carrying chromosome 1 of O. brachyantha in O. sativa background using these markers, as a proof of concept. Hence, in this study, we have identified a set of molecular markers (comprising STMS, STS and CAPS) that are capable of detecting alien genome introgression from O. brachyantha to O. sativa.

  19. The Evolution of Strain Typing in the Mycobacterium tuberculosis Complex.

    Science.gov (United States)

    Merker, Matthias; Kohl, Thomas A; Niemann, Stefan; Supply, Philip

    2017-01-01

    Tuberculosis (TB) is a contagious disease with a complex epidemiology. Therefore, molecular typing (genotyping) of Mycobacterium tuberculosis complex (MTBC) strains is of primary importance to effectively guide outbreak investigations, define transmission dynamics and assist global epidemiological surveillance of the disease. Large-scale genotyping is also needed to get better insights into the biological diversity and the evolution of the pathogen. Thanks to its shorter turnaround and simple numerical nomenclature system, mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) typing, based on 24 standardized plus 4 hypervariable loci, optionally combined with spoligotyping, has replaced IS6110 DNA fingerprinting over the last decade as a gold standard among classical strain typing methods for many applications. With the continuous progress and decreasing costs of next-generation sequencing (NGS) technologies, typing based on whole genome sequencing (WGS) is now increasingly performed for near complete exploitation of the available genetic information. However, some important challenges remain such as the lack of standardization of WGS analysis pipelines, the need of databases for sharing WGS data at a global level, and a better understanding of the relevant genomic distances for defining clusters of recent TB transmission in different epidemiological contexts. This chapter provides an overview of the evolution of genotyping methods over the last three decades, which culminated with the development of WGS-based methods. It addresses the relative advantages and limitations of these techniques, indicates current challenges and potential directions for facilitating standardization of WGS-based typing, and provides suggestions on what method to use depending on the specific research question.

  20. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra; Ali, Asho; McNerney, Ruth; Mallard, Kim; Hill-Cawthorne, Grant A.; Coll, Francesc; Nair, Mridul; Pain, Arnab; Clark, Taane G.; Hasan, Rumina

    2015-01-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to the H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  1. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra

    2015-03-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to the H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  2. PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens

    DEFF Research Database (Denmark)

    Zankari, Ea; Allesøe, Rosa Lundbye; Joensen, Katrine Grimstrup

    2017-01-01

    enterica, Escherichia coli and Campylobacter jejuni. The web-server ResFinder-2.1 was used to identify acquired antimicrobial resistance genes and two methods, the novel PointFinder (using BLAST) and an in-house method (mapping of raw WGS reads), were used to identify chromosomal point mutations. Results...... or when mapping the reads. Conclusions PointFinder proved, with high concordance between phenotypic and predicted antimicrobial susceptibility, to be a user-friendly web tool for detection of chromosomal point mutations associated with antimicrobial resistance....

  3. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

    Directory of Open Access Journals (Sweden)

    Jonathan A Eisen

    2006-09-01

    Full Text Available The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from

  4. Molecular epidemiology of Staphylococcus aureus bacteremia in a single large Minnesota medical center in 2015 as assessed using MLST, core genome MLST and spa typing.

    Directory of Open Access Journals (Sweden)

    Kyung-Hwa Park

    Full Text Available Staphylococcus aureus is a leading cause of bacteremia in hospitalized patients. Whether or not S. aureus bacteremia (SAB) is associated with clonality, implicating potential nosocomial transmission, has not, however, been investigated. Herein, we examined the epidemiology of SAB using whole genome sequencing (WGS). 152 SAB isolates collected over the course of 2015 at a single large Minnesota medical center were studied. Staphylococcus protein A (spa) typing was performed by PCR/Sanger sequencing; multilocus sequence typing (MLST) and core genome MLST (cgMLST) were determined by WGS. Forty-eight isolates (32%) were methicillin-resistant S. aureus (MRSA). The isolates encompassed 66 spa types, clustered into 11 spa clonal complexes (CCs) and 10 singleton types. 88% of 48 MRSA isolates belonged to spa CC-002 or -008. Methicillin-susceptible S. aureus (MSSA) isolates were more genotypically diverse, with 61% distributed across four spa CCs (CC-002, CC-012, CC-008 and CC-084). By MLST, there were 31 sequence types (STs), including 18 divided into 6 CCs and 13 singleton STs. Amongst MSSA isolates, the common MLST clones were CC5 (23%), CC30 (19%), CC8 (15%) and CC15 (11%). Common MRSA clones were CC5 (67%) and CC8 (25%); there were no MRSA isolates in CC45 or CC30. By cgMLST analysis, there were 9 allelic differences between two isolates, with the remaining 150 isolates differing from each other by over 40 alleles. The two isolates were retroactively epidemiologically linked by medical record review. Overall, cgMLST analysis resulted in higher resolution epidemiological typing than did multilocus sequence or spa typing.

  5. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    Science.gov (United States)

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.

  6. Comparative Genomic Analysis of Clinical and Environmental Vibrio Vulnificus Isolates Revealed Biotype 3 Evolutionary Relationships

    Directory of Open Access Journals (Sweden)

    Yael eKotton

    2015-01-01

    Full Text Available In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59% and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 kbp to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C) and environmental (E); all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins) were present in all human pathogenic strains (both biotype 3 and non-biotype 3) and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS) proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and

  7. The complete chloroplast genome of banana (Musa acuminata, Zingiberales): insight into plastid monocotyledon evolution.

    Science.gov (United States)

    Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D'Hont, Angélique

    2013-01-01

    Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas.

  8. The complete chloroplast genome of banana (Musa acuminata, Zingiberales): insight into plastid monocotyledon evolution.

    Directory of Open Access Journals (Sweden)

    Guillaume Martin

    Full Text Available Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas.

  9. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    Science.gov (United States)

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.

  10. Comprehensive Evaluation of Toxoplasma gondii VEG and Neospora caninum LIV Genomes with Tachyzoite Stage Transcriptome and Proteome Defines Novel Transcript Features

    KAUST Repository

    Ramaprasad, Abhinay

    2015-04-13

    Toxoplasma gondii is an important protozoan parasite that infects all warm-blooded animals and causes opportunistic infections in immuno-compromised humans. Its closest relative, Neospora caninum, is an important veterinary pathogen that causes spontaneous abortion in livestock. Comparative genomics of these two closely related coccidians has been of particular interest to identify genes that contribute to varied host cell specificity and disease. Here, we describe a manual evaluation of these genomes based on strand-specific RNA sequencing and shotgun proteomics from the invasive tachyzoite stages of these two parasites. We have corrected predicted structures of over one third of the previously annotated gene models and have annotated untranslated regions (UTRs) in over half of the predicted protein-coding genes. We observe distinctly long UTRs in both the organisms, almost four times longer than other model eukaryotes. We have also identified a putative set of cis-natural antisense transcripts (cis-NATs) and long intergenic non-coding RNAs (lincRNAs). We have significantly improved the annotation quality in these genomes that would serve as a manually curated dataset for Toxoplasma and Neospora research communities.

  11. Characterization of Clostridium Baratii Type F Strains Responsible for an Outbreak of Botulism Linked to Beef Meat Consumption in France.

    Science.gov (United States)

    Mazuet, Christelle; Legeay, Christine; Sautereau, Jean; Bouchier, Christiane; Criscuolo, Alexis; Bouvet, Philippe; Trehard, Hélène; Jourdan Da Silva, Nathalie; Popoff, Michel

    2017-02-01

    A second botulism outbreak due to Clostridium baratii occurred in France in August 2015 and included three patients who had their meal in a restaurant the same day. We report the characterization of C. baratii isolates including whole genome sequencing (WGS). Four C. baratii isolates collected in August 2015 from outbreak 2 were analysed for toxin production and typing as well as for genetic characterization. WGS libraries were prepared using the NEBNext Ultra DNA Library Prep kit for Illumina (New England Biolabs) and sequenced on a MiSeq machine (Illumina) in paired-end reads of 250 bases. The phylogenetic tree was generated based on the UPGMA method with genetic distances computed by using the Kimura two-parameter model. Evolutionary analyses were conducted in Bionumerics (V.6.6 Applied Maths). Three C. baratii isolates from patients' stools and one isolate from meat produced botulinum neurotoxin (BoNT) type F and retained a bont/F7 gene in OrfX cluster. All isolates were identical according to the WGS. However, phylogeny of the core genome showed that the four C. baratii strains were distantly related to that of the previous C. baratii outbreak in France in 2014 and to the other C. baratii strains reported in databanks. The fact that the strains isolated from the patients and meat samples were genetically identical supports that the meat used for the Bolognese sauce was responsible for this second botulism outbreak in France. These isolates were unrelated to that from the first C. baratii outbreak in France in 2014, indicating a distinct source of contamination. WGS provided robust determination of genetic relatedness and information regarding BoNT typing and toxin gene locus genomic localization.

  12. Source-pathway-receptor investigation of the fate of trace elements derived from shotgun pellets discharged in terrestrial ecosystems managed for game shooting

    International Nuclear Information System (INIS)

    Sneddon, Jennifer; Clemente, Rafael; Riby, Philip; Lepp, Nicholas W.

    2009-01-01

    Spent shotgun pellets may contaminate terrestrial ecosystems. We examined the fate of elements originating from shotgun pellets in pasture and woodland ecosystems. Two source-receptor pathways: i) soil-soil pore water-plant and ii) whole earthworm/worm gut contents - washed and unwashed small mammal hair were investigated. Concentrations of Pb and associated contaminants were higher in soils from shot areas than controls. Arsenic and lead concentrations were positively correlated in soils, soil pore water and associated biota. Element concentrations in biota were below statutory levels in all locations. Bioavailability of lead to small mammals, based on concentrations in washed body hair was low. Lead movement from soil water to higher trophic levels was minor compared to lead adsorbed onto body surfaces. Lead was concentrated in earthworm gut and some plants. Results indicate that managed game shooting presents minimal risk in terms of element transfer to soils and their associated biota. - Source-receptor pathway analysis of a managed game shooting site showed no environmental risk of trace element transfer.

  13. Source-pathway-receptor investigation of the fate of trace elements derived from shotgun pellets discharged in terrestrial ecosystems managed for game shooting

    Energy Technology Data Exchange (ETDEWEB)

    Sneddon, Jennifer [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom); Clemente, Rafael, E-mail: rclemente@cebas.csic.e [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom); Riby, Philip [School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool L3 3AF (United Kingdom); Lepp, Nicholas W., E-mail: n.w.lepp@ljmu.ac.u [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom)

    2009-10-15

    Spent shotgun pellets may contaminate terrestrial ecosystems. We examined the fate of elements originating from shotgun pellets in pasture and woodland ecosystems. Two source-receptor pathways: i) soil-soil pore water-plant and ii) whole earthworm/worm gut contents - washed and unwashed small mammal hair were investigated. Concentrations of Pb and associated contaminants were higher in soils from shot areas than controls. Arsenic and lead concentrations were positively correlated in soils, soil pore water and associated biota. Element concentrations in biota were below statutory levels in all locations. Bioavailability of lead to small mammals, based on concentrations in washed body hair was low. Lead movement from soil water to higher trophic levels was minor compared to lead adsorbed onto body surfaces. Lead was concentrated in earthworm gut and some plants. Results indicate that managed game shooting presents minimal risk in terms of element transfer to soils and their associated biota. - Source-receptor pathway analysis of a managed game shooting site showed no environmental risk of trace element transfer.

  14. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Full Text Available Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16-24 × coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR) associated-proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S

  15. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee

    DEFF Research Database (Denmark)

    Ellington, M J; Ekelund, O; Aarestrup, Frank Møller

    2017-01-01

    clinical decision making. Internationally agreed principles and quality control (QC) metrics will facilitate early harmonization of analytical approaches and interpretive criteria for WGS-based predictive AST. Only data sets that pass agreed QC metrics should be used in AST predictions. Minimum performance...... of resistance loci. For most bacterial species the major limitations to widespread adoption for WGS-based AST in clinical laboratories remain the current high-cost and limited speed of inferring antimicrobial susceptibility from WGS data as well as the dependency on previous culture because analysis directly...... should be changed to epidemiological cut-off values in order to improve differentiation of wild-type from non-wild-type isolates (harbouring an acquired resistance). Clinical breakpoints should be a secondary comparator. This assessment will reveal whether genetic predictions could also be used to guide...

  16. Analysis of secretome of breast cancer cell line with an optimized semi-shotgun method

    International Nuclear Information System (INIS)

    Tang Xiaorong; Yao Ling; Chen Keying; Hu Xiaofang; Xu Lisa; Fan Chunhai

    2009-01-01

    Secretome, the totality of secreted proteins, is viewed as a promising pool of candidate cancer biomarkers. Simple and reliable methods for identifying secreted proteins are highly desired. We used an optimized semi-shotgun liquid chromatography followed by tandem mass spectrometry (LC-MS/MS) method to analyze the secretome of breast cancer cell line MDA-MB-231. A total of 464 proteins were identified. About 63% of the proteins were classified as secreted proteins, including many promising breast cancer biomarkers, which were thought to be correlated with tumorigenesis, tumor development and metastasis. These results suggest that the optimized method may be a powerful strategy for cell line secretome profiling, and can be used to find potential cancer biomarkers with great clinical significance. (authors)

  17. Genome Sequence, Assembly and Characterization of Two Metschnikowia fructicola Strains Used as Biocontrol Agents of Postharvest Diseases

    Directory of Open Access Journals (Sweden)

    Edoardo Piombo

    2018-04-01

    The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested, including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO) term and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.

  18. Draft genome sequence of Streptomyces coelicoflavus ZG0656 reveals the putative biosynthetic gene cluster of acarviostatin family α-amylase inhibitors.

    Science.gov (United States)

    Guo, X; Geng, P; Bai, F; Bai, G; Sun, T; Li, X; Shi, L; Zhong, Q

    2012-08-01

    The aims of this study are to obtain the draft genome sequence of Streptomyces coelicoflavus ZG0656, which produces novel acarviostatin family α-amylase inhibitors, and then to reveal the putative acarviostatin-related gene cluster and the biosynthetic pathway. The draft genome sequence of S. coelicoflavus ZG0656 was generated using a shotgun approach employing a combination of 454 and Solexa sequencing technologies. Genome analysis revealed a putative gene cluster for acarviostatin biosynthesis, termed sct-cluster. The cluster contains 13 acarviostatin synthetic genes, six transporter genes, four starch degrading or transglycosylation enzyme genes and two regulator genes. On the basis of bioinformatic analysis, we proposed a putative biosynthetic pathway of acarviostatins. The intracellular steps produce a structural core, acarviostatin I00-7-P, and the extracellular assemblies lead to diverse acarviostatin end products. The draft genome sequence of S. coelicoflavus ZG0656 revealed the putative biosynthetic gene cluster of acarviostatins and a putative pathway of acarviostatin production. To our knowledge, S. coelicoflavus ZG0656 is the first strain in this species for which a genome sequence has been reported. The analysis of sct-cluster provided important insights into the biosynthesis of acarviostatins. This work will be a platform for producing novel variants and yield improvement. © 2012 The Authors. Letters in Applied Microbiology © 2012 The Society for Applied Microbiology.

  19. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing.

    Directory of Open Access Journals (Sweden)

    Alexander William Eastman

    2015-01-01

    Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, unanimously located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provided a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing

  20. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing

    DEFF Research Database (Denmark)

    Skovgaard, Ole; Bak, Mads; Løbner-Olesen, Anders

    2011-01-01

    a combination of WGS and genome copy number analysis, for the identification of mutations that suppress the growth deficiency imposed by excessive initiations from the Escherichia coli origin of replication, oriC. The E. coli chromosome, like the majority of bacterial chromosomes, is circular, and DNA...... replication is initiated by assembling two replication complexes at the origin, oriC. These complexes then replicate the chromosome bidirectionally toward the terminus, ter. In a population of growing cells, this results in a copy number gradient, so that origin-proximal sequences are more frequent than...... origin-distal sequences. Major rearrangements in the chromosome are, therefore, readily identified by changes in copy number, i.e., certain sequences become over- or under-represented. Of the eight mutations analyzed in detail here, six were found to affect a single gene only, one was a large chromosomal...
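
    The copy-number gradient described in this abstract can be illustrated with a minimal sketch. It assumes the standard steady-state model for exponentially growing cultures, in which the origin-to-terminus ratio equals 2^(C/tau), where C is the time needed to replicate the chromosome and tau is the doubling time. The function names, the tolerance, and the example values of C and tau are illustrative assumptions and are not taken from the study.

```python
def relative_copy_number(frac_to_ter, c_min, tau_min):
    """Expected copy number of a locus relative to the terminus.

    frac_to_ter: fractional distance from oriC (0.0) to ter (1.0)
                 along either replichore.
    c_min:       chromosome replication time (C period), in minutes.
    tau_min:     culture doubling time, in minutes.
    """
    if not 0.0 <= frac_to_ter <= 1.0:
        raise ValueError("frac_to_ter must be between 0 and 1")
    return 2.0 ** (c_min * (1.0 - frac_to_ter) / tau_min)


def flag_deviations(observed, expected, tolerance=0.25):
    """Indices of loci whose observed/expected ratio deviates by more than
    the (assumed) tolerance, i.e. candidate over- or under-represented regions."""
    return [
        i
        for i, (obs, exp) in enumerate(zip(observed, expected))
        if abs(obs / exp - 1.0) > tolerance
    ]


# Hypothetical example: five loci from origin to terminus, C = tau = 40 min.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
expected = [relative_copy_number(p, c_min=40.0, tau_min=40.0) for p in positions]
observed = list(expected)
observed[2] *= 2.0  # simulate a duplicated segment at mid-replichore
print(flag_deviations(observed, expected))
```

    In this toy setup only the locus whose read depth was doubled departs from the expected gradient, which mirrors the idea that rearrangements appear as local over- or under-representation.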

  1. A Genomic, Transcriptomic and Proteomic Look at the GE2270 Producer Planobispora rosea, an Uncommon Actinomycete.

    Directory of Open Access Journals (Sweden)

    Arianna Tocchetti

    We report the genome sequence of Planobispora rosea ATCC 53733, a mycelium-forming soil-dweller belonging to one of the lesser studied genera of Actinobacteria and producing the thiopeptide GE2270. The P. rosea genome presents considerable convergence in gene organization and function with other members in the family Streptosporangiaceae, with a significant number (44%) of shared orthologs. Patterns of gene expression in P. rosea cultures during exponential and stationary phase have been analyzed using whole transcriptome shotgun sequencing and by proteome analysis. Among the differentially abundant proteins, those involved in protein metabolism are particularly represented, including the GE2270-insensitive EF-Tu. Two proteins from the pbt cluster, directing GE2270 biosynthesis, slightly increase their abundance values over time. While GE2270 production starts during the exponential phase, most pbt genes, as analyzed by qRT-PCR, are down-regulated. The exception is represented by pbtA, encoding the precursor peptide of the ribosomally synthesized GE2270, whose expression reached the highest level at the entry into stationary phase.

  2. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    ADP

    2012-04-10

    Apr 10, 2012 ... useful for investigating genetic diversity and differentiation in gerbera. Key words: ... However, this method had a disadvantage: it could not .... PCR product. PCR was ..... advantages, SSR markers had not been developed or ...

  3. Identification of antibiotic resistance genes in the multidrug-resistant Acinetobacter baumannii strain, MDR-SHH02, using whole-genome sequencing.

    Science.gov (United States)

    Wang, Hualiang; Wang, Jinghua; Yu, Peijuan; Ge, Ping; Jiang, Yanqun; Xu, Rong; Chen, Rong; Liu, Xuejie

    2017-02-01

    This study aimed to investigate antibiotic resistance genes in the multidrug-resistant (MDR) Acinetobacter baumannii (A. baumannii) strain, MDR-SHH02, using whole-genome sequencing (WGS). The antibiotic resistance of MDR-SHH02, isolated from a patient with breast cancer, to 19 types of antibiotics was determined using the Kirby-Bauer method. WGS of MDR-SHH02 was then performed. Following quality control and transcriptome assembly, functional annotation of genes was conducted, and the phylogenetic tree of MDR-SHH02, along with another 5 A. baumannii species and 2 Acinetobacter species, was constructed using PHYLIP 3.695 and FigTree v1.4.2. Furthermore, pathogenicity islands (PAIs) were predicted by the pathogenicity island database. Potential antibiotic resistance genes in MDR-SHH02 were predicted based on the information in the Antibiotic Resistance Genes Database (ARDB). MDR-SHH02 was found to be resistant to all of the tested antibiotics. The total draft genome length of MDR-SHH02 was 4,003,808 bp. Of the coding sequences, 74.25% were annotated into 21 of the Clusters of Orthologous Groups (COGs) of protein terms, such as 'transcription' and 'amino acid transport and metabolism'. Furthermore, there were 45 PAIs homologous to the sequence MDRSHH02000806. Additionally, a total of 12 gene sequences in MDR-SHH02 were highly similar to the sequences of antibiotic resistance genes in ARDB, including genes encoding aminoglycoside-modifying enzymes [e.g., aac(3)-Ia, ant(2'')-Ia, aph33ib and aph(3')-Ia], β-lactamase genes (bl2b_tem and bl2b_tem1), sulfonamide-resistant dihydropteroate synthase genes (sul1 and sul2), catb3 and tetb. These results suggest that numerous genes mediate resistance to various antibiotics in MDR-SHH02, and provide clinical guidance for the personalized therapy of A. baumannii-infected patients.
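
    The step of predicting resistance genes by similarity to database entries can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes similarity hits are available in the default 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the identity and coverage thresholds, the query identifiers, and the subject lengths are all hypothetical.

```python
def filter_hits(blast_lines, subject_lengths, min_identity=90.0, min_coverage=0.6):
    """Return {query_id: best_subject_id} for hits passing both thresholds.

    blast_lines:     iterable of tab-separated lines in default BLAST tabular format.
    subject_lengths: dict mapping subject (database gene) id to its length in bases.
    The thresholds are illustrative assumptions, not values from the study.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        # Fraction of the database gene covered by the alignment.
        coverage = (abs(send - sstart) + 1) / subject_lengths[sseqid]
        if pident < min_identity or coverage < min_coverage:
            continue
        # Keep only the highest-scoring passing hit per query.
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits; gene names reuse those mentioned in the abstract.
example_hits = [
    "orf_0001\tsul1\t99.5\t840\t4\t0\t1\t840\t1\t840\t0.0\t1500",
    "orf_0001\tsul2\t85.0\t800\t120\t0\t1\t800\t1\t800\t1e-150\t700",
    "orf_0002\ttetb\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-100\t500",
]
example_lengths = {"sul1": 840, "sul2": 816, "tetb": 1206}
print(filter_hits(example_hits, example_lengths))
```

    Requiring both identity and coverage avoids reporting short, high-identity fragments as full resistance genes; the appropriate cut-offs depend on the database and the organism.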

  4. [Fingerprints identification of Gynostemma pentaphyllum by RAPD and cloning and analysis of its specific DNA fragment].

    Science.gov (United States)

    Jiang, Jun-fu; Li, Xiong-ying; Wu, Yao-sheng; Luo, Yu; Zhao, Rui-qiang; Lan, Xiu-wan

    2009-02-01

    To identify Gynostemma pentaphyllum and its adulterant plant Cayratia japonica at the DNA level. Two screened random primers (WGS001, WGS004) were used for random amplification of genomic DNA extracted from Gynostemma pentaphyllum and Cayratia japonica collected from different habitats. After amplification with WGS004, one characteristic fragment of about 500 bp, which was common to all Gynostemma pentaphyllum samples studied but absent from Cayratia japonica, was cloned and sequenced. The sequences obtained were then analyzed for identity and compared using the Blastn program in GenBank. The two primers amplified clearly different bands in the genomic DNA fingerprints. On the basis of these different bands, Gynostemma pentaphyllum and Cayratia japonica could be clearly distinguished. Sequence alignment of seven cloned bands showed that their identities ranged from 45.7% to 94.5%. No similar genome sequences were found in GenBank. This indicated that these seven DNA fragments had not been reported before and should be new sequences. The RAPD technique can be used for the accurate identification of Gynostemma pentaphyllum and its counterfeit Cayratia japonica. In addition, the specific DNA sequences for Gynostemma pentaphyllum identified in this study are useful for further research on species identification and assisted selection breeding in Gynostemma pentaphyllum.

  5. Whole-genome profiling and shotgun sequencing delivers an anchored, gene-decorated, physical map assembly of bread wheat chromosome 6A

    Czech Academy of Sciences Publication Activity Database

    Poursarebani, N.; Nussbaumer, T.; Šimková, Hana; Šafář, Jan; Witsenboer, H.; van Oeveren, J.; Doležel, Jaroslav; Mayer, K. F. X.; Stein, N.; Schnurbusch, T.

    2014-01-01

    Vol. 79, No. 2 (2014), pp. 334-347 ISSN 0960-7412 Institutional support: RVO:61389030 Keywords: bread wheat chromosome 6A * whole-genome profiling * LINEAR TOPOLOGICAL CONTIGS Subject RIV: EB - Genetics; Molecular Biology Impact factor: 5.972, year: 2014

  6. Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer.

    Science.gov (United States)

    Okumura, Kayo; Kato, Masako; Kirikae, Teruo; Kayano, Mitsunori; Miyoshi-Akiyama, Tohru

    2015-03-20

    Mycobacterium tuberculosis isolates consist of several different lineages, yet epidemiological analyses are usually assessed relative to a particular reference genome, M. tuberculosis H37Rv, which might introduce some bias. Those analyses are essentially based on genome sequence information of M. tuberculosis and could in theory be performed in silico, with whole genome sequence (WGS) data available in the databases and obtained by next generation sequencers (NGSs). As an approach to establish higher resolution methods for such analyses, whole genome sequences of the M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence called the consensus sequence (CS), and its feasibility in in silico epidemiological analyses was evaluated. The CS was successfully constructed and utilized to perform phylogenetic analysis, evaluation of read mapping efficacy, which is crucial for detecting single nucleotide polymorphisms (SNPs), and various virtual MTBC typing methods including spoligotyping, VNTR, long sequence polymorphism and Beijing typing. SNPs detected based on CS, in comparison with H37Rv, were utilized in concatemer-based phylogenetic analysis to determine their reliability relative to a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of phylogenetic trees based on CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with CS was advantageous because the frequency of crucial SNPs distinguishing among strain lineages was higher than with H37Rv. The number of SNPs detected was lower with the consensus than with the H37Rv sequence, resulting in a significant reduction in computational time. Performance of each virtual typing method was satisfactory and agreed with published results where those were available. These results indicated that virtual CS

  7. Insights into the genome structure and copy-number variation of Eimeria tenella

    Directory of Open Access Journals (Sweden)

    Lim Lik-Sin

    2012-08-01

    method to improve the assembly of the genome of E. tenella from shotgun data, and to help reveal its overall structure. A preliminary assessment of copy-number variation (extra or missing copies of genomic segments) between strains of E. tenella was also carried out. The emerging picture is of a very unusual genome architecture displaying inter-strain copy-number variation. We suggest that these features may be related to the known ability of this parasite to rapidly develop drug resistance.

  8. Application of whole genome sequence data in analyzing the molecular epidemiology of Shiga toxin-producing Escherichia coli O157:H7/H.

    Science.gov (United States)

    Yokoyama, Eiji; Hirai, Shinichiro; Ishige, Taichiro; Murakami, Satoshi

    2018-01-02

    Seventeen clusters of Shiga toxin-producing Escherichia coli O157:H7/- (O157) strains, determined by cluster analysis of pulsed-field gel electrophoresis patterns, were analyzed using whole genome sequence (WGS) data to investigate this pathogen's molecular epidemiology. The 17 clusters included 136 strains containing strains from nine outbreaks, with each outbreak caused by a single source contaminated with the organism, as shown by epidemiological contact surveys. WGS data of these strains were used to identify single nucleotide polymorphisms (SNPs) by two methods: short read data were directly mapped to a reference genome (mapping derived SNPs) and common SNPs between the mapping derived SNPs and SNPs in assembled data of short read data (common SNPs). Among both SNPs, those that were detected in genes with a gap were excluded to remove ambiguous SNPs from further analysis. The effectiveness of both SNPs was investigated among all the concatenated SNPs that were detected (whole SNP set); SNPs were divided into three categories based on the genes in which they were located (i.e., backbone SNP set, O-island SNP set, and mobile element SNP set); and SNPs in non-coding regions (intergenic region SNP set). When SNPs from strains isolated from the nine single source derived outbreaks were analyzed using an unweighted pair group method with arithmetic mean tree (UPGMA) and a minimum spanning tree (MST), the maximum pair-wise distances of the backbone SNP set of the mapping derived SNPs were significantly smaller than those of the whole and intergenic region SNP set on both UPGMAs and MSTs. This significant difference was also observed when the backbone SNP set of the common SNPs were examined (Steel-Dwass test, P≤0.01). When the maximum pair-wise distances were compared between the mapping derived and common SNPs, significant differences were observed in those of the whole, mobile element, and intergenic region SNP set (Wilcoxon signed rank test, P≤0.01). When all

  9. Clinical pharmacogenomics testing in the era of next generation sequencing: challenges and opportunities for precision medicine.

    Science.gov (United States)

    Ji, Yuan; Si, Yue; McMillin, Gwendolyn A; Lyon, Elaine

    2018-04-23

    The rapid development and dramatic decrease in cost of sequencing techniques have ushered in the implementation of genomic testing in patient care. Next generation DNA sequencing (NGS) techniques have been used increasingly in clinical laboratories to scan the whole or part of the human genome in order to facilitate diagnosis and/or prognostics of genetic disease. Despite many hurdles and debates, pharmacogenomics (PGx) is believed to be an area of genomic medicine where precision medicine could have immediate impact in the near future. Areas covered: This review focuses on lessons learned through early attempts at clinically implementing PGx testing, and on the challenges and opportunities that PGx testing brings to precision medicine in the era of NGS. Expert commentary: Replacing the targeted analysis approach with NGS for PGx testing is currently neither technically feasible nor necessary, owing to several technical limitations and the uncertainty involved in interpreting variants of uncertain significance for PGx variants. However, reporting PGx variants out of clinical whole exome or whole genome sequencing (WES/WGS) might represent additional benefits for patients who are tested by WES/WGS.

  10. Identification of acquired antimicrobial resistance genes

    DEFF Research Database (Denmark)

    Zankari, Ea; Hasman, Henrik; Cosentino, Salvatore

    2012-01-01

    Objectives: Identification of antimicrobial resistance genes is important for understanding the underlying mechanisms and the epidemiology of antimicrobial resistance. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available in routine diagnostic laboratories and is anticipated to substitute traditional methods for resistance gene identification. Thus, the current challenge is to extract the relevant information from the large amount of generated data. Methods: We developed a web-based method, ResFinder, that uses BLAST for identification of acquired antimicrobial resistance genes in whole-genome data. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. The method was evaluated on 1862 GenBank files containing 1411 different resistance genes, as well as on 23 de...

  11. Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products.

    Science.gov (United States)

    Almeida, Mathieu; Hébert, Agnès; Abraham, Anne-Laure; Rasmussen, Simon; Monnet, Christophe; Pons, Nicolas; Delbès, Céline; Loux, Valentin; Batto, Jean-Michel; Leonard, Pierre; Kennedy, Sean; Ehrlich, Stanislas Dusko; Pop, Mihai; Montel, Marie-Christine; Irlinger, Françoise; Renault, Pierre

    2014-12-13

    Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded in reconstructing the draft genome of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. Our study provides data which, combined with publicly available genome references, represent the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative genera such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.

  12. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing.

    Science.gov (United States)

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0-9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  13. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses.

    Directory of Open Access Journals (Sweden)

    Karel Schoonvaere

    The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high quality reads. The most significant eukaryote associations were two protozoans, Apicystis bombi and Crithidia bombi, and one nematode parasite, Sphaerularia bombi, in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. Next to the identification of three honey bee viruses (Black queen cell virus, Sacbrood virus and Varroa destructor virus-1) and four plant viruses, we describe two novel RNA viruses, Scaldis River bee virus (SRBV) and Ganda bee virus (GABV), based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses; SRBV is related to the order Mononegavirales whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, among which are a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects.

  14. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses.

    Science.gov (United States)

    Schoonvaere, Karel; De Smet, Lina; Smagghe, Guy; Vierstraete, Andy; Braeckman, Bart P; de Graaf, Dirk C

    2016-01-01

    The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high quality reads. The most significant eukaryote associations were two protozoans, Apicystis bombi and Crithidia bombi, and one nematode parasite, Sphaerularia bombi, in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. Next to the identification of three honey bee viruses (Black queen cell virus, Sacbrood virus and Varroa destructor virus-1) and four plant viruses, we describe two novel RNA viruses, Scaldis River bee virus (SRBV) and Ganda bee virus (GABV), based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses; SRBV is related to the order Mononegavirales whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, among which are a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects.

  15. The high-quality genome of Brassica napus cultivar 'ZS11' reveals the introgression history in semi-winter morphotype.

    Science.gov (United States)

    Sun, Fengming; Fan, Guangyi; Hu, Qiong; Zhou, Yongming; Guan, Mei; Tong, Chaobo; Li, Jiana; Du, Dezhi; Qi, Cunkou; Jiang, Liangcai; Liu, Weiqing; Huang, Shunmou; Chen, Wenbin; Yu, Jingyin; Mei, Desheng; Meng, Jinling; Zeng, Peng; Shi, Jiaqin; Liu, Kede; Wang, Xi; Wang, Xinfa; Long, Yan; Liang, Xinming; Hu, Zhiyong; Huang, Guodong; Dong, Caihua; Zhang, He; Li, Jun; Zhang, Yaolei; Li, Liangwei; Shi, Chengcheng; Wang, Jiahao; Lee, Simon Ming-Yuen; Guan, Chunyun; Xu, Xun; Liu, Shengyi; Liu, Xin; Chalhoub, Boulos; Hua, Wei; Wang, Hanzhong

    2017-11-01

    Allotetraploid oilseed rape (Brassica napus L.) is an agriculturally important crop. Cultivation and breeding of B. napus by humans has resulted in numerous genetically diverse morphotypes with optimized agronomic traits and ecophysiological adaptation. To further understand the genetic basis of diversification and adaptation, we report a draft genome of an Asian semi-winter oilseed rape cultivar 'ZS11' and its comprehensive genomic comparison with the genomes of the winter-type cultivar 'Darmor-bzh' as well as two progenitors. The integrated BAC-to-BAC and whole-genome shotgun sequencing strategies were effective in the assembly of repetitive regions (especially young long terminal repeats) and resulted in a high-quality genome assembly of B. napus 'ZS11'. Within a short evolutionary period (~6700 years ago), semi-winter-type 'ZS11' and the winter-type 'Darmor-bzh' maintained high genomic collinearity. Even so, certain genetic differences were also detected between the two morphotypes. Relative to 'Darmor-bzh', both subgenomes of 'ZS11' are closely related to its progenitors, and the 'ZS11' genome harbored several specific segmental homoeologous exchanges (HEs). Furthermore, the semi-winter-type 'ZS11' underwent potential genomic introgressions with B. rapa (Ar). Some of these genetic differences were associated with key agronomic traits. A key gene, A03.FLC3, regulating vernalization-responsive flowering time in 'ZS11' first experienced HE and then underwent a genomic introgression event with Ar, which potentially has led to genetic differences in controlling vernalization in the semi-winter types. Our observations improved our understanding of the genetic diversity of different B. napus morphotypes and the cultivation history of semi-winter oilseed rape in Asia. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  16. Complete genome sequence of Acinetobacter baumannii XH386 (ST208), a multi-drug resistant bacteria isolated from pediatric hospital in China

    Directory of Open Access Journals (Sweden)

    Youhong Fang

    2016-03-01

    Full Text Available Acinetobacter baumannii is an important bacterium that has emerged as a significant nosocomial pathogen worldwide. The rise of A. baumannii was due to its multi-drug resistance (MDR), and multi-drug resistant A. baumannii was difficult to treat with antibiotics, especially in pediatric patients, for whom the therapeutic options with antibiotics were quite limited. A. baumannii ST208 was identified as the predominant sequence type of carbapenem resistant A. baumannii in the United States and China. To our knowledge, no complete genome sequence had been reported for A. baumannii ST208, although several whole genome shotgun sequences had been reported. Here, we sequenced the 4087-kilobase (kb) chromosome and 112-kb plasmid of A. baumannii XH386 (ST208), which was isolated from a pediatric hospital in China. The genome of A. baumannii XH386 contained 3968 protein-coding genes and 94 RNA-only encoding genes. Genomic analysis and minimum inhibitory concentration assay showed that A. baumannii XH386 was a multi-drug resistant strain, which showed resistance to most antibiotics, except for tigecycline. The data may be accessed via the GenBank accession numbers CP010779 and CP010780. Keywords: Acinetobacter baumannii, Multi-drug resistance, Paediatric

  17. Mapping of genomic EGFRvIII deletions in glioblastoma: insight into rearrangement mechanisms and biomarker development.

    Science.gov (United States)

    Koga, Tomoyuki; Li, Bin; Figueroa, Javier M; Ren, Bing; Chen, Clark C; Carter, Bob S; Furnari, Frank B

    2018-04-12

    Epidermal growth factor receptor (EGFR) variant III (vIII) is the most common oncogenic rearrangement in glioblastoma (GBM) generated by deletion of exons two to seven of EGFR. The proximal breakpoints occur in variable positions within the 123-kb intron one, presenting significant challenges in terms of PCR-based mapping. Molecular mechanisms underlying these deletions remain unclear. We determined the presence of EGFRvIII and its breakpoints for 29 GBM samples using quantitative polymerase chain reaction (qPCR), arrayed PCR mapping, Sanger sequencing, and whole genome sequencing (WGS). Patient-specific breakpoint PCR was performed on tumors, plasma and cerebrospinal fluid (CSF) samples. The breakpoint sequences and single nucleotide polymorphisms (SNPs) were analyzed to elucidate the underlying biogenic mechanism. PCR mapping and WGS independently unveiled eight EGFRvIII breakpoints in six tumors. Patient-specific primers yielded EGFRvIII PCR amplicons in matched tumors, and in cell-free DNA (cfDNA) from a CSF sample, but not in cfDNA or extracellular-vesicle DNA from plasma. The breakpoint analysis revealed nucleotide insertions in four, an insertion of a region outside of EGFR locus in one, microhomologies in three, as well as a duplication or an inversion accompanied by microhomologies in two, suggestive of distinct DNA repair mechanisms. In the GBM samples that harbored distinct breakpoints, the SNP compositions of EGFRvIII and amplified non-vIII EGFR were identical, suggesting that these rearrangements arose from amplified non-vIII EGFR. Our approach efficiently "fingerprints" each sample's EGFRvIII breakpoints. Breakpoint sequence analyses suggest that independent breakpoints arose from precursor amplified non-vIII EGFR through different DNA repair mechanisms.

  18. Comparison of Control of Clostridium difficile Infection in Six English Hospitals Using Whole-Genome Sequencing.

    Science.gov (United States)

    Eyre, David W; Fawley, Warren N; Rajgopal, Anu; Settle, Christopher; Mortimer, Kalani; Goldenberg, Simon D; Dawson, Susan; Crook, Derrick W; Peto, Tim E A; Walker, A Sarah; Wilcox, Mark H

    2017-08-01

    Variation in Clostridium difficile infection (CDI) rates between healthcare institutions suggests overall incidence could be reduced if the lowest rates could be achieved more widely. We used whole-genome sequencing (WGS) of consecutive C. difficile isolates from 6 English hospitals over 1 year (2013-14) to compare infection control performance. Fecal samples with a positive initial screen for C. difficile were sequenced. Within each hospital, we estimated the proportion of cases plausibly acquired from previous cases. Overall, 851/971 (87.6%) sequenced samples contained toxin genes, and 451 (46.4%) were fecal-toxin-positive. Of 652 potentially toxigenic isolates >90-days after the study started, 128 (20%, 95% confidence interval [CI] 17-23%) were genetically linked (within ≤2 single nucleotide polymorphisms) to a prior patient's isolate from the previous 90 days. Hospital 2 had the fewest linked isolates, 7/105 (7%, 3-13%), hospital 1, 9/70 (13%, 6-23%), and hospitals 3-6 had similar proportions of linked isolates (22-26%) (P ≤ .002 comparing hospital-2 vs 3-6). Results were similar adjusting for locally circulating ribotypes. Adjusting for hospital, ribotype-027 had the highest proportion of linked isolates (57%, 95% CI 29-81%). Fecal-toxin-positive and toxin-negative patients were similarly likely to be a potential transmission donor, OR = 1.01 (0.68-1.49). There was no association between the estimated proportion of linked cases and testing rates. WGS can be used as a novel surveillance tool to identify varying rates of C. difficile transmission between institutions and therefore to allow targeted efforts to reduce CDI incidence. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.
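    The linkage rule described in this abstract (an isolate counts as linked when it lies within ≤2 SNPs of a prior isolate from the previous 90 days, and only isolates collected more than 90 days after the study start are evaluated) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Isolate` fields, the use of pre-aligned equal-length sequences, a plain Hamming distance as the SNP count, and the choice to require the prior isolate to come from a strictly earlier day are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Isolate:
    name: str       # hypothetical isolate identifier
    day: int        # sampling day counted from study start (assumed field)
    sequence: str   # aligned sequence; same length for every isolate (assumed)


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def linked_isolates(isolates: List[Isolate],
                    max_snps: int = 2,
                    window_days: int = 90) -> List[str]:
    """Return names of evaluated isolates linked to an earlier isolate.

    An isolate is evaluated only if it was sampled more than `window_days`
    after study start, so that a full look-back window exists for it.
    """
    ordered = sorted(isolates, key=lambda iso: iso.day)
    linked = []
    for idx, case in enumerate(ordered):
        if case.day <= window_days:
            continue  # not enough look-back time; skip as in the abstract
        for prior in ordered[:idx]:
            gap = case.day - prior.day
            if 0 < gap <= window_days and \
                    snp_distance(case.sequence, prior.sequence) <= max_snps:
                linked.append(case.name)
                break
    return linked
```

    Dividing the number of linked isolates by the number of evaluated isolates gives the per-hospital proportion that the abstract compares across sites.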

  19. Core Genome Multilocus Sequence Typing Scheme for Stable, Comparative Analyses of Campylobacter jejuni and C. coli Human Disease Isolates.

    Science.gov (United States)

    Cody, Alison J; Bray, James E; Jolley, Keith A; McCarthy, Noel D; Maiden, Martin C J

    2017-07-01

    Human campylobacteriosis, caused by Campylobacter jejuni and C. coli, remains a leading cause of bacterial gastroenteritis in many countries, but the epidemiology of campylobacteriosis outbreaks remains poorly defined, largely due to limitations in the resolution and comparability of isolate characterization methods. Whole-genome sequencing (WGS) data enable the improvement of sequence-based typing approaches, such as multilocus sequence typing (MLST), by substantially increasing the number of loci examined. A core genome MLST (cgMLST) scheme defines a comprehensive set of those loci present in most members of a bacterial group, balancing very high resolution with comparability across the diversity of the group. Here we propose a set of 1,343 loci as a human campylobacteriosis cgMLST scheme (v1.0), the allelic profiles of which can be assigned to core genome sequence types. The 1,343 loci chosen were a subset of the 1,643 loci identified in the reannotation of the genome sequence of C. jejuni isolate NCTC 11168, chosen as being present in >95% of draft genomes of 2,472 representative United Kingdom campylobacteriosis isolates, comprising 2,207 (89.3%) C. jejuni isolates and 265 (10.7%) C. coli isolates. Validation of the cgMLST scheme was undertaken with 1,478 further high-quality draft genomes, containing 150 or fewer contiguous sequences, from disease isolate collections: 99.5% of these isolates contained ≥95% of the 1,343 cgMLST loci. In addition to the rapid and effective high-resolution analysis of large numbers of diverse isolates, the cgMLST scheme enabled the efficient identification of very closely related isolates from a well-defined single-source campylobacteriosis outbreak. Copyright © 2017 Cody et al.
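    The two thresholds in this abstract (loci kept when present in >95% of draft genomes, and validation by the share of isolates carrying ≥95% of the chosen loci) can be illustrated with a short sketch. It assumes locus presence has already been called per genome and is supplied as a mapping from a genome name to a set of locus names; the function names and data layout are hypothetical and are not taken from the published scheme or its software.

```python
from __future__ import annotations

from collections import Counter


def select_core_loci(genome_loci: dict[str, set[str]],
                     min_fraction: float = 0.95) -> list[str]:
    """Return loci present in strictly more than `min_fraction` of genomes."""
    n_genomes = len(genome_loci)
    if n_genomes == 0:
        raise ValueError("at least one genome is required")
    counts = Counter(locus for loci in genome_loci.values() for locus in loci)
    return sorted(locus for locus, count in counts.items()
                  if count / n_genomes > min_fraction)


def fraction_well_covered(genome_loci: dict[str, set[str]],
                          core_loci: list[str],
                          min_fraction: float = 0.95) -> float:
    """Fraction of genomes that contain at least `min_fraction` of the core loci."""
    core = set(core_loci)
    if not core or not genome_loci:
        raise ValueError("core loci and genomes must both be non-empty")
    passing = sum(1 for loci in genome_loci.values()
                  if len(core & loci) / len(core) >= min_fraction)
    return passing / len(genome_loci)
```

    In this sketch the selection step would be run on the reference collection and the coverage check on an independent validation collection, mirroring the split the abstract describes.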

  20. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome

    Directory of Open Access Journals (Sweden)

    González Leonardo Galindo

    2012-11-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Results Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. Conclusions The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include

  1. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome.

    Science.gov (United States)

    González, Leonardo Galindo; Deyholos, Michael K

    2012-11-21

    Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. 
    The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of

  2. Genome-Centric Analysis of a Thermophilic and Cellulolytic Bacterial Consortium Derived from Composting

    Science.gov (United States)

    Lemos, Leandro N.; Pereira, Roberta V.; Quaggio, Ronaldo B.; Martins, Layla F.; Moura, Livia M. S.; da Silva, Amanda R.; Antunes, Luciana P.; da Silva, Aline M.; Setubal, João C.

    2017-01-01

    Microbial consortia selected from complex lignocellulolytic microbial communities are promising alternatives to deconstruct plant waste, since synergistic action of different enzymes is required for full degradation of plant biomass in biorefining applications. Culture enrichment also facilitates the study of interactions among consortium members, and can be a good source of novel microbial species. Here, we used a sample from a plant waste composting operation in the São Paulo Zoo (Brazil) as inoculum to obtain a thermophilic aerobic consortium enriched through multiple passages at 60°C in carboxymethylcellulose as sole carbon source. The microbial community composition of this consortium was investigated by shotgun metagenomics and genome-centric analysis. Six near-complete (over 90%) genomes were reconstructed. Similarity and phylogenetic analyses show that four of these six genomes are novel, with the following hypothesized identifications: a new Thermobacillus species; the first Bacillus thermozeamaize genome (for which currently only 16S sequences are available) or else the first representative of a new family in the Bacillales order; the first representative of a new genus in the Paenibacillaceae family; and the first representative of a new deep-branching family in the Clostridia class. The reconstructed genomes from known species were identified as Geobacillus thermoglucosidasius and Caldibacillus debilis. The metabolic potential of these recovered genomes based on COG and CAZy analyses show that these genomes encode several glycoside hydrolases (GHs) as well as other genes related to lignocellulose breakdown. The new Thermobacillus species stands out for being the richest in diversity and abundance of GHs, possessing the greatest potential for biomass degradation among the six recovered genomes. We also investigated the presence and activity of the organisms corresponding to these genomes in the composting operation from which the consortium was built

  3. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding.

    Science.gov (United States)

    Bhat, Javaid A; Ali, Sajad; Salgotra, Romesh K; Mir, Zahoor A; Dutta, Sutapa; Jadon, Vasudha; Tyagi, Anshika; Mushtaq, Muntazir; Jain, Neelu; Singh, Pradeep K; Singh, Gyanendra P; Prabhu, K V

    2016-01-01

    Genomic selection (GS) is a promising approach exploiting molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. In plant breeding, it provides opportunities to increase genetic gain of complex traits per unit time and cost. The cost-benefit balance was an important consideration for GS to work in crop plants. The most important factor for its successful and effective implementation in crop species was the availability of genome-wide, high-throughput, cost-effective and flexible markers with low ascertainment bias, suitable for large population sizes as well as for both model and non-model crop species, with or without a reference genome sequence. These requirements were the major limitations of earlier marker systems, viz. SSR and array-based markers, and meeting them was unimaginable before the availability of next-generation sequencing (NGS) technologies, which have provided novel SNP genotyping platforms, especially genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS routine for crop improvement in both model and non-model crop species. NGS-based genotyping has increased genomic-estimated breeding value prediction accuracies over other established marker platforms in cereals and other crop species, and has made GS a reality in crop breeding. But to harness the true benefits of GS, these marker technologies will need to be combined with high-throughput phenotyping to achieve valuable genetic gain for complex traits. Moreover, the continuous decline in sequencing cost will make WGS feasible and cost-effective for GS in the near future. Until then, targeted sequencing seems to be the more cost-effective option for large-scale marker discovery and GS, particularly in the case of large and un-decoded genomes.

  4. Transforming Classroom Norms as Social Change: Pairing Embodied Exercises with Collaborative Participation in the WGS Classroom (with Syllabus)

    Directory of Open Access Journals (Sweden)

    Cara E Jones

    2017-02-01

    Full Text Available This article explores tensions between critical feminist pedagogy and the neoliberal corporate university, asking how engaging the body and redistributing student agency highlights larger questions of power that haunt the academy as a whole. Including specific embodied exercises used in WGS classrooms, this essay argues that as students and professors engage within an increasingly corporate university system, embodied activities that incorporate the body as a site of learning and critical analysis can access situated knowledges while projects that de-center power and responsibility are viewed with skepticism. I attribute this discrepancy to the neoliberal structure in which we teach and learn, arguing that we need to value and make visible the labor that goes into critical pedagogy.

  5. MLST and Whole-Genome-Based Population Analysis of Cryptococcus gattii VGIII Links Clinical, Veterinary and Environmental Strains, and Reveals Divergent Serotype Specific Sub-populations and Distant Ancestors

    Science.gov (United States)

    Firacative, Carolina; Roe, Chandler C.; Malik, Richard; Ferreira-Paim, Kennio; Escandón, Patricia; Sykes, Jane E.; Castañón-Olivares, Laura Rocío; Contreras-Peres, Cudberto; Samayoa, Blanca; Sorrell, Tania C.; Castañeda, Elizabeth; Lockhart, Shawn R.; Engelthaler, David M.; Meyer, Wieland

    2016-01-01

    The emerging pathogen Cryptococcus gattii causes life-threatening disease in immunocompetent and immunocompromised hosts. Of the four major molecular types (VGI-VGIV), the molecular type VGIII has recently emerged as cause of disease in otherwise healthy individuals, prompting a need to investigate its population genetic structure to understand if there are potential genotype-dependent characteristics in its epidemiology, environmental niche(s), host range and clinical features of disease. Multilocus sequence typing (MLST) of 122 clinical, environmental and veterinary C. gattii VGIII isolates from Australia, Colombia, Guatemala, Mexico, New Zealand, Paraguay, USA and Venezuela, and whole genome sequencing (WGS) of 60 isolates representing all established MLST types identified four divergent sub-populations. The majority of the isolates belong to two main clades, corresponding either to serotype B or C, indicating an ongoing species evolution. Both major clades included clinical, environmental and veterinary isolates. The C. gattii VGIII population was genetically highly diverse, with minor differences between countries, isolation source, serotype and mating type. Little to no recombination was found between the two major groups, serotype B and C, at the whole and mitochondrial genome level. C. gattii VGIII is widespread in the Americas, with sporadic cases occurring elsewhere, WGS revealed Mexico and USA as a likely origin of the serotype B VGIII population and Colombia as a possible origin of the serotype C VGIII population. Serotype B isolates are more virulent than serotype C isolates in a murine model of infection, causing predominantly pulmonary cryptococcosis. No specific link between genotype and virulence was observed. 
    Antifungal susceptibility testing against six antifungal drugs revealed that serotype B isolates are more susceptible to azoles than serotype C isolates, highlighting the importance of strain typing to guide effective treatment to improve the

  6. A multi-platform draft de novo genome assembly and comparative analysis for the Scarlet Macaw (Ara macao).

    Directory of Open Access Journals (Sweden)

    Christopher M Seabury

    Full Text Available Data deposition to NCBI Genomes: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N's). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS

  7. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing

    Directory of Open Access Journals (Sweden)

    Chia Sing Chan

    2015-03-01

    Full Text Available The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-meter-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0 to 9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3−V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  8. Whole Genome Shotgun Sequencing Shows Selection on Leptospira Regulatory Proteins during in vitro Culture Attenuation

    Science.gov (United States)

    Lehmann, Jason S.; Corey, Victoria C.; Ricaldi, Jessica N.; Vinetz, Joseph M.; Winzeler, Elizabeth A.; Matthias, Michael A.

    2016-01-01

    Leptospirosis is the most common zoonotic disease worldwide with an estimated 500,000 severe cases reported annually, and case fatality rates of 12–25%, due primarily to acute kidney and lung injuries. Despite its prevalence, the molecular mechanisms underlying leptospirosis pathogenesis remain poorly understood. To identify virulence-related genes in Leptospira interrogans, we delineated cumulative genome changes that occurred during serial in vitro passage of a highly virulent strain of L. interrogans serovar Lai into a nearly avirulent isogenic derivative. Comparison of protein coding and computationally predicted noncoding RNA (ncRNA) genes between these two polyclonal strains identified 15 nonsynonymous single nucleotide variant (nsSNV) alleles that increased in frequency and 19 that decreased, whereas no changes in allelic frequency were observed among the ncRNA genes. Some of the nsSNV alleles were in six genes shown previously to be transcriptionally upregulated during exposure to in vivo-like conditions. Five of these nsSNVs were in evolutionarily conserved positions in genes related to signal transduction and metabolism. Frequency changes of minor nsSNV alleles identified in this study likely contributed to the loss of virulence during serial in vitro culture. The identification of new virulence-associated genes should spur additional experimental inquiry into their potential role in Leptospira pathogenesis. PMID:26711524
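    The comparison of allele frequencies between the virulent parent population and its culture-attenuated derivative can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming per-site read counts are already available for each polyclonal population; the `min_delta` threshold of 0.1 is a hypothetical cutoff chosen for the example and is not a value reported in the abstract.

```python
def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def classify_shift(before: float, after: float, min_delta: float = 0.1) -> str:
    """Label the direction of an allele-frequency change between populations.

    `min_delta` is a hypothetical minimum absolute change; smaller shifts
    are reported as "unchanged".
    """
    delta = after - before
    if delta >= min_delta:
        return "increased"
    if delta <= -min_delta:
        return "decreased"
    return "unchanged"
```

    Tallying the labels over all nonsynonymous variant sites would give counts of alleles that rose or fell in frequency during passage, which is the kind of summary the abstract reports.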

  9. Genome-Wide Mutation Rate Response to pH Change in the Coral Reef Pathogen Vibrio shilonii AK1.

    Science.gov (United States)

    Strauss, Chloe; Long, Hongan; Patterson, Caitlyn E; Te, Ronald; Lynch, Michael

    2017-08-22

    Recent application of mutation accumulation techniques combined with whole-genome sequencing (MA/WGS) has greatly promoted studies of spontaneous mutation. However, such explorations have rarely been conducted on marine organisms, and it is unclear how marine habitats have influenced genome stability. This report resolves the mutation rate and spectrum of the coral reef pathogen Vibrio shilonii, which causes coral bleaching and endangers the biodiversity maintained by coral reefs. We found that its mutation rate and spectrum are highly similar to those of other studied bacteria from various habitats, despite the saline environment. The mutational properties of this marine bacterium are thus controlled by other general evolutionary forces such as natural selection and genetic drift. We also found that as pH drops, the mutation rate decreases and the mutation spectrum is biased in the direction of generating G/C nucleotides. This implies that evolutionary features of this organism and perhaps other marine microbes might be altered by the increasingly acidic ocean water caused by excess CO2 emission. Nonetheless, further exploration is needed as the pH range tested in this study was rather narrow and many other possible mutation determinants, such as carbonate increase, are associated with ocean acidification. IMPORTANCE This study explored the pH dependence of a bacterial genome-wide mutation rate. We discovered that the genome-wide rates of appearance of most mutation types decrease linearly and that the mutation spectrum is biased in generating more G/C nucleotides with pH drop in the coral reef pathogen V. shilonii. Copyright © 2017 Strauss et al.

  10. Shotgun proteomic analytical approach for studying proteins adsorbed onto liposome surface

    KAUST Repository

    Capriotti, Anna Laura

    2011-07-02

    The knowledge about the interaction between plasma proteins and nanocarriers employed for in vivo delivery is fundamental to understand their biodistribution. Protein adsorption onto nanoparticle surface (protein corona) is strongly affected by vector surface characteristics. In general, the primary interaction is thought to be electrostatic, thus surface charge of carrier is supposed to play a central role in protein adsorption. Because protein corona composition can be critical in modifying the interactive surface that is recognized by cells, characterizing its formation onto lipid particles may serve as a fundamental predictive model for the in vivo efficiency of a lipidic vector. In the present work, protein coronas adsorbed onto three differently charged cationic liposome formulations were compared by a shotgun proteomic approach based on nano-liquid chromatography-high-resolution mass spectrometry. About 130 proteins were identified in each corona, with only small differences between the different cationic liposome formulations. However, this study could be useful for the future controlled design of colloidal drug carriers and possibly in the controlled creation of biocompatible surfaces of other devices that come into contact with proteins into body fluids. © 2011 Springer-Verlag.

  11. In silico serotyping of E. coli from short read data identifies limited novel O-loci but extensive diversity of O:H serotype combinations within and between pathogenic lineages.

    Science.gov (United States)

    Ingle, Danielle J; Valcanis, Mary; Kuzevski, Alex; Tauschek, Marija; Inouye, Michael; Stinear, Tim; Levine, Myron M; Robins-Browne, Roy M; Holt, Kathryn E

    2016-07-01

    The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages. These surface antigens are important for the survival of E. coli within mammalian hosts. However, traditional serotyping has several limitations, and public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) to characterize bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read WGS data. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles linked to known and novel E. coli O-groups and H-types (the EcOH database) using the software package srst2. We validated the approach by comparing in silico results for 197 enteropathogenic E. coli isolates with those obtained by serological phenotyping in an independent laboratory. We then demonstrated the utility of our method to characterize isolates in public health and clinical settings, and to explore the genetic diversity of >1500 E. coli genomes from multiple sources. Importantly, we showed that transfer of O- and H-antigen loci between E. coli chromosomal backbones is common, with little evidence of constraints by host or pathotype, suggesting that E. coli 'strain space' may be virtually unlimited, even within specific pathotypes. Our findings show that serotyping is most useful when used in combination with strain genotyping to characterize microevolution events within an inferred population structure.

  12. A segment of the apospory-specific genomic region is highly microsyntenic not only between the apomicts Pennisetum squamulatum and buffelgrass, but also with a rice chromosome 11 centromeric-proximal genomic region.

    Science.gov (United States)

    Gualtieri, Gustavo; Conner, Joann A; Morishige, Daryl T; Moore, L David; Mullet, John E; Ozias-Akins, Peggy

    2006-03-01

    Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory.

  13. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program.

    Directory of Open Access Journals (Sweden)

    Zoe A Dyson

    2017-01-01

    Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers; however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS) to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that were previously only reported in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly.

  14. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus.

    Science.gov (United States)

    Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie

    2017-07-01

    The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna