WorldWideScience

Sample records for coverage shotgun sequencing

  1. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Background: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment, and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig than in human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to that of human. Conclusion: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way to define specific regions for analysis and resequencing.
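    A 0.66X survey can be put in perspective with the idealized Lander-Waterman model: if reads land uniformly at random, per-base depth is roughly Poisson with mean equal to the coverage c, so the expected fraction of the genome hit by at least one read is 1 - e^(-c), about 48% at 0.66X. The sketch below assumes exactly that (no repeats, no sequencing or cloning bias) and is not the project's own calculation.

```python
import math


def expected_covered_fraction(coverage: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases covered at least once.

    Assumes reads are placed uniformly at random, so per-base depth ~ Poisson(coverage)
    and P(depth >= 1) = 1 - exp(-coverage). Repeats and sequencing bias are ignored.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for c in (0.66, 2.0, 8.0):
        print(f"{c:5.2f}X -> {expected_covered_fraction(c):.4f} of bases covered at least once")
```

    Real low-coverage surveys fall short of this curve wherever repeats or biased libraries make read placement non-uniform.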

  2. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    Science.gov (United States)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence, representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
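    N50, the statistic quoted for the flax scaffolds and contigs, is the length L such that pieces of length L or longer together hold at least half of the assembled bases. A minimal Python sketch of that definition follows; the contig lengths in the usage line are invented for illustration.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is the
    length at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding total / 2
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


print(n50([100, 200, 300, 400]))  # hypothetical lengths; prints 300
```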

  3. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique, based on multiple enzymatic digestion of a protein or protein mixture, that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is intended to be used to sequence alternatively spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
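    The report's reconstruction algorithm is not described in this abstract, so the following only illustrates the DNA-style idea it refers to: a greedy merge of peptide strings by their longest suffix-prefix overlap. The function names, the min_len threshold and the example peptides are hypothetical; real peptide data would also have to handle isobaric residues (such as I/L) and conflicting overlaps.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap until none qualifies."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep input order
    # Drop fragments fully contained in another one; they add no new sequence.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap of at least min_len left: return separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical peptides from overlapping digests of the same protein.
print(greedy_assemble(["MKTAYI", "TAYIAKQ", "AKQR"]))  # ['MKTAYIAKQR']
```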

  4. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human...

  5. RePS: a sequence assembler that masks exact repeats identified from the shotgun data

    DEFF Research Database (Denmark)

    Wang, Jun; Wong, Gane Ka-Shu; Ni, Peixiang

    2002-01-01

    We describe a sequence assembler, RePS (repeat-masked Phrap with scaffolding), that explicitly identifies exact 20mer repeats from the shotgun data and removes them prior to the assembly. The established software is used to compute meaningful error probabilities for each base. Clone......-end-pairing information is used to construct scaffolds that order and orient the contigs. We show with real data for human and rice that reasonable assemblies are possible even at coverages of only 4x to 6x, despite having up to 42.2% in exact repeats. Publication date: 2002-May...
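    RePS itself builds on Phrap; the sketch below only illustrates the preprocessing idea named in the abstract: count exact k-mers across the shotgun reads, then mask the ones seen too often. Because every unique 20-mer already recurs about c times at c-fold coverage, the min_count threshold (a hypothetical parameter here) would have to sit well above the expected depth, and a real tool would need a far more memory-efficient counter than a Python dictionary.

```python
from collections import Counter


def repeated_kmers(reads, k=20, min_count=2):
    """Return the set of exact k-mers that occur at least min_count times across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, n in counts.items() if n >= min_count}


def mask_repeats(read, repeats, k=20, mask_char="N"):
    """Replace every base covered by a repeated k-mer with mask_char."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in repeats:
            masked[i:i + k] = mask_char * k
    return "".join(masked)


# Toy usage with k=4 so the repeat is easy to see; the abstract describes exact 20-mers.
reads = ["ACGTACGTTT", "GGGACGTCC"]
reps = repeated_kmers(reads, k=4, min_count=3)
print(reps)                               # {'ACGT'}
print(mask_repeats(reads[0], reps, k=4))  # NNNNNNNNTT
```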

  6. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  7. Transposon fingerprinting using low coverage whole genome shotgun sequencing in cacao (Theobroma cacao L.) and related species.

    Science.gov (United States)

    Sveinsson, Saemundur; Gill, Navdeep; Kane, Nolan C; Cronk, Quentin

    2013-07-24

    Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants, where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next-generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species. (i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph-based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance, indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c. 10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low complexity satellite-like repeats in this genome. (i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating

  8. Whole genome shotgun sequencing of Indian strains of Streptococcus agalactiae

    Directory of Open Access Journals (Sweden)

    Balaji Veeraraghavan

    2017-12-01

    Group B streptococcus is known as a leading cause of neonatal infections in developing countries. The present study describes the whole genome shotgun sequences of four Group B Streptococcus (GBS) isolates. Molecular data on clonality are lacking for GBS in India. The present genome report adds important information to the scarce genome data on GBS and will support comparative genomic studies of GBS isolates at the global level. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession numbers NHPL00000000 – NHPO00000000.

  9. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages.

  10. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

  11. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  12. Binning metagenomic contigs by coverage and composition

    NARCIS (Netherlands)

    Alneberg, J.; Bjarnason, B.S.; Bruijn, de I.; Schirmer, M.; Quick, J.; Ijaz, U.Z.; Lahti, L.M.; Loman, N.J.; Andersson, A.F.; Quince, C.

    2014-01-01

    Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple
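    The abstract is cut off, but its premise (combine sequence composition with coverage across samples) can be illustrated. The sketch below builds, for one contig, a tetranucleotide frequency vector that merges each 4-mer with its reverse complement (136 features) and appends the contig's coverage profile scaled to sum to 1. The pseudocount and normalization are assumptions for illustration, not CONCOCT's exact preprocessing, and the clustering step that would follow is omitted.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


# The 256 possible 4-mers collapse to 136 canonical ones (palindromes count once).
TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def composition(seq: str, pseudocount: float = 1.0) -> list[float]:
    """Frequencies of canonical tetranucleotides in seq; 4-mers with non-ACGT bases are skipped."""
    counts = dict.fromkeys(TETRAMERS, pseudocount)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in "ACGT" for base in kmer):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[t] / total for t in TETRAMERS]


def contig_features(seq: str, coverages: list[float]) -> list[float]:
    """Composition vector followed by the per-sample coverage profile scaled to sum to 1."""
    cov_total = sum(coverages)
    profile = [c / cov_total if cov_total > 0 else 0.0 for c in coverages]
    return composition(seq) + profile


features = contig_features("ACGTTGCAAGGCTTAACG" * 20, coverages=[12.0, 3.5, 0.0])  # made-up data
print(len(features))  # 136 composition values + 3 coverage values = 139
```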

  13. Shotgun metagenomic data streams: surfing without fear

    Energy Technology Data Exchange (ETDEWEB)

    Berendzen, Joel R [Los Alamos National Laboratory]

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to truly novel sequences, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast, and their favorable scaling properties ensure they will remain sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
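    The abstract only says that the methods index local sequence patterns much as a web search engine indexes words. One generic way to do that is an inverted index from k-mers to the reference sequences containing them, followed by a vote over a read's k-mers. Everything below (function names, k, the toy "taxonA"/"taxonB" references) is a hypothetical sketch, not the authors' phylogenetic-signature method.

```python
from collections import defaultdict


def build_index(references: dict[str, str], k: int = 8) -> dict[str, set[str]]:
    """Inverted index: each k-mer maps to the set of reference names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def classify_read(read: str, index: dict[str, set[str]], k: int = 8):
    """Return the reference sharing the most k-mers with the read, or None if nothing matches."""
    hits: dict[str, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return max(hits, key=hits.get) if hits else None


refs = {"taxonA": "AAAACCCCGGGG", "taxonB": "TTTTGGGGCCCC"}  # toy references
idx = build_index(refs, k=4)
print(classify_read("AACCCCG", idx, k=4))  # taxonA
```

    Lookups are a dictionary access per k-mer, which is what keeps this style of classification fast; a signature-based design would additionally keep only k-mers that are diagnostic for a clade.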

  14. Choosing the best plant for the job: a cost-effective assay to prescreen ancient plant remains destined for shotgun sequencing.

    Directory of Open Access Journals (Sweden)

    Nathan Wales

    DNA extracted from ancient plant remains almost always contains a mixture of endogenous (that is, derived from the plant) and exogenous (derived from other sources) DNA. The exogenous 'contaminant' DNA, chiefly derived from microorganisms, presents significant problems for shotgun sequencing. In some samples, more than 90% of the recovered sequences are exogenous, providing limited data relevant to the sample. However, other samples have far less contamination and consequently yield much more useful data via shotgun sequencing. Given the investment required for high-throughput sequencing, whenever multiple samples are available, it is most economical to sequence the least contaminated sample. We present an assay based on quantitative real-time PCR that estimates the relative amounts of fungal and bacterial DNA in a sample in comparison to the endogenous plant DNA. Given a collection of contextually similar ancient plant samples, this low-cost assay aids in selecting the best sample for shotgun sequencing.
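    The abstract does not give the assay's exact calculation. A common way to turn qPCR cycle thresholds (Ct) into relative amounts is the comparative Ct approach: with amplification efficiency E (1.0 meaning perfect doubling per cycle), the target-to-reference ratio is (1 + E)^(Ct_reference - Ct_target). The sketch assumes equal efficiencies for the microbial and plant assays and uses invented sample names and Ct values; it ranks samples so that the one with the lowest microbial-to-plant ratio comes first.

```python
def relative_quantity(ct_target: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Target/reference template ratio from Ct values, assuming equal amplification efficiency."""
    return (1.0 + efficiency) ** (ct_reference - ct_target)


def rank_samples(cts: dict[str, tuple[float, float]]) -> list[str]:
    """cts maps sample -> (Ct of microbial assay, Ct of plant assay); least contaminated first."""
    return sorted(cts, key=lambda sample: relative_quantity(*cts[sample]))


# Hypothetical Ct values: s1 amplifies microbial DNA 5 cycles earlier than plant DNA.
print(rank_samples({"s1": (20.0, 25.0), "s2": (28.0, 22.0)}))  # ['s2', 's1']
```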

  15. Microbial Community Profiling of Human Saliva Using Shotgun Metagenomic Sequencing

    OpenAIRE

    Hasan, Nur A.; Young, Brian A.; Minard-Smith, Angela T.; Saeed, Kelly; Li, Huai; Heizer, Esley M.; McMillan, Nancy J.; Isom, Richard; Abdullah, Abdul Shakur; Bornman, Daniel M.; Faith, Seth A.; Choi, Seon Young; Dickens, Michael L.; Cebula, Thomas A.; Colwell, Rita R.

    2014-01-01

    Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify ba...

  16. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, metagenomic shotgun sequencing provides 16S rRNA gene fragment data that are naturally immune to priming biases and thus have the potential to uncover the true structure of a microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm whether differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that, with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This relationship could serve as a good starting point for experimental design and makes the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  17. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

    DEFF Research Database (Denmark)

    Hellmann, Ines; Mang, Yuan; Gu, Zhiping

    2008-01-01

    We introduce a simple, broadly applicable method for obtaining estimates of nucleotide diversity from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate...... for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show...
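    As a point of reference for what such an estimate involves, the classic Watterson estimator divides the number of segregating sites by a harmonic-number correction for sample size. In shotgun data the number of sampled chromosomes varies from site to site, so the sketch sums that correction per site. It assumes error-free reads, each drawn from a distinct chromosome, which is exactly what the authors' method is designed not to assume; treat it as a naive baseline, not their estimator.

```python
def harmonic(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the expected-segregating-sites factor for n sampled chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sites) -> float:
    """Per-site Watterson-style theta from (depth, is_segregating) pairs.

    depth is the number of chromosomes observed at the site. Sites seen on fewer
    than two chromosomes cannot reveal variation and are skipped.
    """
    segregating = 0
    denom = 0.0
    for depth, is_segregating in sites:
        if depth < 2:
            continue
        segregating += int(is_segregating)
        denom += harmonic(depth)
    if denom == 0:
        raise ValueError("no sites covered by at least two chromosomes")
    return segregating / denom


# Toy window: 1000 sites each seen on 2 chromosomes, one of them polymorphic.
print(watterson_theta([(2, False)] * 999 + [(2, True)]))
```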

  18. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

    Directory of Open Access Journals (Sweden)

    Emma L. Doughty

    2014-09-01

    Tuberculosis remains a major global health problem. Laboratory diagnostic methods that allow effective, early detection of cases are central to management of tuberculosis in the individual patient and in the community. Since the 1880s, laboratory diagnosis of tuberculosis has relied primarily on microscopy and culture. However, microscopy fails to provide species- or lineage-level identification, and culture-based workflows for diagnosis of tuberculosis remain complex, expensive, slow, technically demanding and poorly able to handle mixed infections. We therefore explored the potential of shotgun metagenomics (sequencing of DNA from samples without culture or target-specific amplification or capture) to detect and characterise strains from the Mycobacterium tuberculosis complex in smear-positive sputum samples obtained from The Gambia in West Africa. Eight smear- and culture-positive sputum samples were investigated using a differential-lysis protocol followed by a kit-based DNA extraction method, with sequencing performed on a benchtop sequencing instrument, the Illumina MiSeq. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238. The proportion of reads in each metagenome mapping against the human genome ranged from 20% to 99%. We were able to detect sequences from the M. tuberculosis complex in all eight samples, with coverage of the H37Rv reference genome ranging from 0.002X to 0.7X. By analysing the distribution of large sequence polymorphisms (deletions), the locations of the insertion element IS6110 and single nucleotide polymorphisms (SNPs), we were able to assign seven of eight metagenome-derived genomes to a species and lineage within the M. tuberculosis complex. Two metagenome-derived mycobacterial genomes were assigned to M. africanum, a species largely confined to West Africa; the others that could be assigned belonged to lineages T, H or LAM within the clade of “modern” M. tuberculosis

  19. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    Science.gov (United States)

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry-based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user-friendly open-source software that can be directly applied in everyday proteomic workflows.
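    The core idea of de novo sequencing can be shown on an idealized spectrum: consecutive b-ions differ by exactly one residue mass, so a complete ladder can be read off by matching mass differences against a residue table. The sketch below uses standard monoisotopic residue masses and assumes singly charged b-ions with no missing or noise peaks and a hypothetical 0.02 Da tolerance; real tools score many candidate paths through noisy spectra. Leucine and isoleucine are isobaric, and a missing peak can make two glycines (2 × 57.02146 Da) look like one asparagine (114.04293 Da).

```python
# Monoisotopic residue masses in Da. "L" stands for leucine or isoleucine (isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass added to each singly protonated b-ion


def read_ladder(b_ions: list[float], tol: float = 0.02) -> str:
    """Infer residues from consecutive differences in a complete singly charged b-ion ladder.

    Ambiguous gaps are reported as alternatives joined by '/', unmatched gaps as '?'.
    """
    masses = [PROTON] + sorted(b_ions)
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return "".join(residues)


# Ladder for a hypothetical peptide "SAGE", built from the table itself.
ladder = [PROTON + sum(RESIDUE_MASS[a] for a in "SAGE"[:i]) for i in range(1, 5)]
print(read_ladder(ladder))  # SAGE
```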

  20. Identification of antimicrobial resistance genes in multidrug-resistant clinical Bacteroides fragilis isolates by whole genome shotgun sequencing

    DEFF Research Database (Denmark)

    Sydenham, Thomas Vognbjerg; Sóki, József; Hasman, Henrik

    2015-01-01

    Bacteroides fragilis constitutes the most frequent anaerobic bacterium causing bacteremia in humans. The genetic background for antimicrobial resistance in B. fragilis is diverse, with some genes requiring insertion sequence (IS) elements inserted upstream for increased expression. To evaluate whole...... genome shotgun sequencing as a method for predicting antimicrobial resistance properties, one meropenem-resistant and five multidrug-resistant blood culture isolates were sequenced, and antimicrobial resistance genes and IS elements were identified using ResFinder 2.1 (http...

  1. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Volckaert Filip AM

    2010-01-01

    Background: Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results: End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass, resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3-fold larger than the 460 Mbp stickleback genome. Conclusions: The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish.

  2. A Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-07-15

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a patient with hypogammaglobulinemia. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved.

  3. Paleogenomics in a temperate environment: shotgun sequencing from an extinct Mediterranean caprine.

    Directory of Open Access Journals (Sweden)

    Oscar Ramírez

    BACKGROUND: Numerous endemic mammals, including dwarf elephants, goats, hippos and deer, evolved in isolation in the Mediterranean islands during the Pliocene and Pleistocene. Most of them subsequently became extinct during the Holocene. Recently developed high-throughput sequencing technologies could provide a unique tool for retrieving genomic data from these extinct species, making it possible to study their evolutionary history and the genetic bases underlying their particular, sometimes unique, adaptations. METHODOLOGY/PRINCIPAL FINDINGS: DNA extracted from an approximately 6,000-year-old bone sample from an extinct caprine (Myotragus balearicus) from the Balearic Islands in the Western Mediterranean has been subjected to shotgun sequencing with the GS FLX 454 platform. Only 0.27% of the resulting sequences, identified from alignments with the cow genome and comprising 15,832 nucleotides with an average length of 60 nucleotides, proved to be endogenous. CONCLUSIONS: A phylogenetic tree generated with Myotragus sequences and those from other artiodactyls displays an identical topology to that generated from mitochondrial DNA data. Despite the unfavourable thermal environment, which explains the low yield of endogenous sequences, our study demonstrates that it is possible to obtain genomic data from extinct species from temperate regions.

  4. Aspects of coverage in medical DNA sequencing

    Directory of Open Access Journals (Sweden)

    Wilson Richard K

    2008-05-01

    Background: DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results: We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy, are derived, and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion: Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
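    The paper's model extends Lander-Waterman theory; the sketch below uses a cruder stand-in to show the shape of the argument. It assumes each of the two alleles at a site receives half of the redundancy and that per-allele depth is Poisson, ignoring the read-length effects a full coverage theory accounts for. Under that simplification, 26× with at least two reads per allele and 21× with at least one read per allele both give a probability above 0.9999 of detecting both alleles, which hints at why the two redundancies need not differ by a factor of 2. These figures come from the simplified model, not from the paper.

```python
import math


def poisson_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term               # add P(X = i)
        term *= mean / (i + 1)    # advance to P(X = i + 1)
    return 1.0 - cdf


def both_alleles_detected(redundancy: float, min_reads: int) -> float:
    """Idealized diploid site: each allele independently gets Poisson(redundancy / 2) reads."""
    return poisson_at_least(min_reads, redundancy / 2) ** 2


print(f"tumor 26x, >=2 reads per allele:  {both_alleles_detected(26, 2):.6f}")
print(f"normal 21x, >=1 read per allele: {both_alleles_detected(21, 1):.6f}")
```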

  5. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  6. metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

    DEFF Research Database (Denmark)

    Louvel, Guillaume; Der Sarkissian, Clio; Hanghøj, Kristian Ebbesen

    2016-01-01

    -throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automating routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa......, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT within a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), but also from archaeological specimens. We present......-friendly profiling of the microbial DNA present in HTS shotgun data sets. The applications of metaBIT are vast, from monitoring of laboratory errors and contaminations, to the reconstruction of past and present microbiota, and the detection of candidate species, including pathogens....

  7. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  8. Use of Metagenomic Shotgun Sequencing Technology To Detect Foodborne Pathogens within the Microbiome of the Beef Production Chain

    OpenAIRE

    Yang, Xiang; Noyes, Noelle R.; Doster, Enrique; Martin, Jennifer N.; Linke, Lyndsey M.; Magnuson, Roberta J.; Yang, Hua; Geornaras, Ifigenia; Woerner, Dale R.; Jones, Kenneth L.; Ruiz, Jaime; Boucher, Christina; Morley, Paul S.; Belk, Keith E.

    2016-01-01

    Foodborne illnesses associated with pathogenic bacteria are a global public health and economic challenge. The diversity of microorganisms (pathogenic and nonpathogenic) that exists within the food and meat industries complicates efforts to understand pathogen ecology. Further, little is known about the interaction of pathogens within the microbiome throughout the meat production chain. Here, a metagenomic approach and shotgun sequencing technology were used as tools to detect pathogenic bact...

  9. Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Chan, Chon-Kit Kenneth; Hsu, Arthur L.; Tang, Sen-Lin; Halgamuge, Saman K.

    2008-01-01

    Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increasing the order of oligonucleotide frequency may worsen the assignment result in some cases, which indicates the possible existence of an optimal species-specific oligonucleotide frequency. We replaced SOM with a growing self-organising map (GSOM), which gave comparable results with a 7%–15% speed improvement. PMID:18288261
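
    The binning features discussed above are oligonucleotide (k-mer) frequency vectors computed per sequence. As a minimal sketch, the functions below compute a normalized k-mer profile over overlapping windows (k = 4 gives the tetranucleotide case) and a Euclidean distance between two profiles. Only feature extraction is shown; the SOM/GSOM clustering step is omitted, and refinements such as merging reverse-complement k-mers, which some binning tools apply, are left out. The function names are hypothetical, not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers in seq, in lexicographic order.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two k-mer frequency profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

    Profiles computed this way for, say, 10 kb fragments could be passed to any clustering method; whether a given k carries enough signal is exactly what the study evaluated.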

  10. Common Wheat Chromosome 5B Composition Analysis Using Low-Coverage 454 Sequencing

    Czech Academy of Sciences Publication Activity Database

    Sergeeva, E.M.; Afonnikov, D. A.; Koltunova, M. K.; Gusev, V.D.; Miroshnichenko, L. A.; Vrána, Jan; Kubaláková, Marie; Poncet, C.; Sourdille, P.; Feuillet, C.; Doležel, Jaroslav; Salina, E.A.

    2014-01-01

    Vol. 7, No. 2 (2014) ISSN 1940-3372 R&D Projects: GA ČR GBP501/12/G090; GA MŠk(CZ) LO1204 Grant - others: GA MŠk(CZ) ED0007/01/01 Program: ED Institutional support: RVO:61389030 Keywords: GENOME SHOTGUN SEQUENCES * IN-SITU HYBRIDIZATION * HEXAPLOID WHEAT Subject RIV: EB - Genetics; Molecular Biology Impact factor: 3.933, year: 2014

  11. Microbiological profile of chicken carcasses: A comparative analysis using shotgun metagenomic sequencing

    Directory of Open Access Journals (Sweden)

    Alessandra De Cesare

    2018-04-01

    Full Text Available In the last few years, metagenomic and 16S rRNA sequencing have completely changed the microbiological investigation of food products. In this preliminary study, the microbiological profile of chicken carcasses collected from animals fed different diets was tested using shotgun metagenomic sequencing. A total of 15 carcasses were collected at the slaughterhouse at the end of the refrigeration tunnel from chickens reared for 35 days and fed a control diet (n=5), a diet supplemented with 1500 FTU/kg of commercial phytase (n=5), or a diet supplemented with 1500 FTU/kg of commercial phytase and 3 g/kg of inositol (n=5). Ten grams of neck and breast skin were obtained from each carcass and submitted to total DNA extraction using the DNeasy Blood & Tissue Kit (Qiagen). Sequencing libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina) and sequenced on a HiScanSQ (Illumina) as 100-bp paired-end reads. Between 5 and 9 million sequences were obtained for each sample. Sequence analysis showed that Proteobacteria and Firmicutes represented more than 98% of the whole bacterial population associated with carcass skin in all groups, but their abundances differed between groups. Moraxellaceae and other degradative bacteria showed a significantly higher abundance in the control group compared to the treated groups. Furthermore, Clostridium perfringens showed a significantly higher relative abundance in the group fed phytase, and Salmonella enterica in the group fed phytase plus inositol. The results of this preliminary study showed that metagenome sequencing is suitable to investigate and monitor carcass microbiota in order to detect specific pathogenic and/or degradative populations.

  12. Improvement of methods for large scale sequencing; application to human Xq28

    Energy Technology Data Exchange (ETDEWEB)

    Gibbs, R.A.; Andersson, B.; Wentland, M.A. [Baylor College of Medicine, Houston, TX (United States)] [and others]

    1994-09-01

    Sequencing of a one-megabase region of Xq28, spanning the FRAXA and IDS loci, has been undertaken in order to investigate the practicality of the shotgun approach for large-scale sequencing and as a platform to develop improved methods. The efficiency of several steps in the shotgun sequencing strategy has been increased using PCR-based approaches. An improved method for preparation of M13 libraries has been developed. This protocol combines a previously described adaptor-based protocol with the uracil DNA glycosylase (UDG) cloning procedure. The efficiency of this procedure has been found to be up to 100-fold higher than that of previously used protocols. In addition, the novel protocol is more reliable and thus easy to establish in a laboratory. An adaptation of the method for the simultaneous shotgun sequencing of multiple short fragments, which concentrates them before library construction, is also presented. This protocol is suitable for rapid characterization of cDNA clones. A library was constructed from 15 PCR-amplified and concentrated human cDNA inserts; the insert sequences could easily be identified as separate contigs during the assembly process, and the sequence coverage was even along each fragment. Using this strategy, the fine structures of the FRAXA and IDS loci have been revealed, and several EST homologies indicating novel expressed sequences have been identified. The use of PCR to close repetitive regions that are difficult to clone was tested by determining the sequence of a cosmid mapping to DXS455 in Xq28, which contains a polymorphic VNTR. The region containing the VNTR was not represented in the shotgun library, but by designing PCR primers in the sequences flanking the gap and by cloning and sequencing the PCR product, the fine structure of the VNTR has been determined. It was found to be an AT-rich VNTR with a repeated 25-mer at the center.

  13. A statistical approach designed for finding mathematically defined repeats in shotgun data and determining the length distribution of clone-inserts

    DEFF Research Database (Denmark)

    Zhong, Lan; Zhang, Kunlin; Huang, Xiangang

    2003-01-01

    that repeats of different copy number have different probabilities of appearance in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed software...... MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also designed length distribution of clone...
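
    The principle stated above, that a sequence present in n copies is sampled roughly n times as often as a single-copy sequence, can be illustrated with a generic Poisson outlier test. This is a minimal sketch under assumed conditions (uniform Poisson sampling of reads at a known average coverage, and an arbitrary significance level `alpha`). It is not the MDRmasker criteria, which the abstract does not spell out, and all function names are hypothetical.

```python
import math


def poisson_sf(mean: float, x: int) -> float:
    """Return P(X >= x) for X ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(x):
        cdf += term
        term *= mean / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def repeat_threshold(coverage: float, alpha: float = 1e-3) -> int:
    """Smallest depth t such that a single-copy sequence reaches depth >= t
    with probability at most alpha."""
    t = 0
    while poisson_sf(coverage, t) > alpha:
        t += 1
    return t


def looks_repetitive(observed_depth: int, coverage: float,
                     alpha: float = 1e-3) -> bool:
    """Flag a sequence whose depth is implausibly high for a single copy."""
    return observed_depth >= repeat_threshold(coverage, alpha)
```

    For example, at 5× coverage a region observed at 50× depth would be flagged, while one observed at 5× would not.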

  14. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing.

    Science.gov (United States)

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0-9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  15. Evaluation of ddRADseq for reduced representation metagenome sequencing

    Directory of Open Access Journals (Sweden)

    Michael Y. Liu

    2017-09-01

    Full Text Available Background Profiling of microbial communities via metagenomic shotgun sequencing has enabled researchers to gain unprecedented insight into microbial community structure and the functional roles of community members. This study describes a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. Methods This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that are between a pair of restriction sites located within close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection. We assessed the performance of the metagenomic ddRADseq approach by applying the full method to human stool samples and generating sequence data. Results The ddRADseq data yields a similar estimate of community taxonomic profile as obtained from shotgun metagenome sequencing of the same human stool samples. No obvious bias with respect to genomic G + C content and the estimated relative species abundance was detected. Discussion Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.
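
    The library construction step described above (keep fragments flanked by two different restriction sites, then size-select them) can be approximated in silico. This is a simplified sketch: the enzyme pair (EcoRI, GAATTC, and MspI, CCGG) and the size window are assumptions chosen for illustration, not parameters from the study; fragment length is measured between recognition-site starts rather than exact cut positions; and the function name is hypothetical.

```python
import re


def double_digest(seq: str, site_a: str, site_b: str,
                  min_len: int, max_len: int) -> list[tuple[int, int]]:
    """Return (start, end) positions of fragments bounded by one site_a and one
    site_b occurrence whose length falls inside [min_len, max_len]."""
    seq = seq.upper()
    # Zero-width lookaheads so that overlapping recognition sites are all found.
    hits = [(m.start(), "a") for m in re.finditer(f"(?={site_a})", seq)]
    hits += [(m.start(), "b") for m in re.finditer(f"(?={site_b})", seq)]
    hits.sort()
    fragments = []
    for (pos1, enz1), (pos2, enz2) in zip(hits, hits[1:]):
        # Only fragments with different enzymes at each end are retained;
        # the length window plays the role of size selection.
        if enz1 != enz2 and min_len <= pos2 - pos1 <= max_len:
            fragments.append((pos1, pos2))
    return fragments
```

    Widening or narrowing `min_len`/`max_len` changes how many fragments, and thus how much of each genome, enter the reduced library. This is the tunable coverage breadth that the abstract mentions.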

  16. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing

    Directory of Open Access Journals (Sweden)

    Chia Sing Chan

    2015-03-01

    Full Text Available The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-meter-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0 to 9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3−V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  17. Microbial community profiling of human saliva using shotgun metagenomic sequencing.

    Directory of Open Access Journals (Sweden)

    Nur A Hasan

    Full Text Available Human saliva is clinically informative of both oral and general health. Since next generation shotgun sequencing (NGS) is now widely used to identify and quantify bacteria, we investigated the bacterial flora of saliva microbiomes of two healthy volunteers and five datasets from the Human Microbiome Project, along with a control dataset containing short NGS reads from bacterial species representative of the bacterial flora of human saliva. GENIUS, a system designed to identify and quantify bacterial species using unassembled short NGS reads, was used to identify the bacterial species comprising the microbiomes of the saliva samples and datasets. Results, achieved within minutes and at greater than 90% accuracy, showed more than 175 bacterial species comprised the bacterial flora of human saliva, including bacteria known to be commensal human flora but also Haemophilus influenzae, Neisseria meningitidis, Streptococcus pneumoniae, and Gamma proteobacteria. Basic Local Alignment Search Tool (BLASTn) analysis, run in parallel, reported ca. five times more species than those actually comprising the in silico sample. Both GENIUS and BLAST analyses of saliva samples identified major genera comprising the bacterial flora of saliva, but GENIUS provided a more precise description of species composition, identifying to strain in most cases and delivered results at least 10,000 times faster. Therefore, GENIUS offers a facile and accurate system for identification and quantification of bacterial species and/or strains in metagenomic samples.

  18. High coverage of the complete mitochondrial genome of the rare Gray's beaked whale (Mesoplodon grayi) using Illumina next generation sequencing.

    Science.gov (United States)

    Thompson, Kirsten F; Patel, Selina; Williams, Liam; Tsai, Peter; Constantine, Rochelle; Baker, C Scott; Millar, Craig D

    2016-01-01

    Using an Illumina platform, we shotgun sequenced the complete mitochondrial genome of Gray's beaked whale (Mesoplodon grayi) to an average coverage of 152X. We performed a de novo assembly using SOAPdenovo2 and determined the total mitogenome length to be 16,347 bp. The nucleotide composition was asymmetric (33.3% A, 24.6% C, 12.6% G, 29.5% T) with an overall GC content of 37.2%. The gene organization was similar to that of other cetaceans with 13 protein-coding genes, 2 rRNAs (12S and 16S), 22 predicted tRNAs and 1 control region or D-loop. We found no evidence of heteroplasmy or nuclear copies of mitochondrial DNA in this individual. Beaked whales within the genus Mesoplodon are rarely seen at sea and their basic biology is poorly understood. These data will contribute to resolving the phylogeography and population ecology of this speciose group.
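
    The reported composition can be checked directly: GC content is the sum of the G and C percentages (24.6% + 12.6% = 37.2%). The asymmetry between G and C can be expressed as GC skew, (G − C)/(G + C). The sketch below computes both from a sequence. GC skew is a standard measure added here for illustration, not one the abstract reports, and the function names are hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentages of A, C, G, T (ignoring other symbols), plus GC content."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    comp = {b: 100.0 * counts[b] / total for b in "ACGT"}
    comp["GC"] = comp["G"] + comp["C"]
    return comp


def gc_skew(g: float, c: float) -> float:
    """GC skew (G - C) / (G + C); works on counts or percentages."""
    return (g - c) / (g + c)


if __name__ == "__main__":
    # Percentages reported for the M. grayi mitogenome.
    print(round(12.6 + 24.6, 1))          # GC content
    print(round(gc_skew(12.6, 24.6), 3))  # negative because C exceeds G
```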

  19. Global analysis of the yeast lipidome by quantitative shotgun mass spectrometry

    DEFF Research Database (Denmark)

    Ejsing, Christer S.; Sampaio, Julio L; Surendranath, Vineeth

    2009-01-01

    95% coverage of the yeast lipidome achieved with 125-fold improvement in sensitivity compared with previous approaches. Comparative lipidomics demonstrated that growth temperature and defects in lipid biosynthesis induce ripple effects throughout the molecular composition of the yeast lipidome....... This work serves as a resource for molecular characterization of eukaryotic lipidomes, and establishes shotgun lipidomics as a powerful platform for complementing biochemical studies and other systems-level approaches....

  20. Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    Energy Technology Data Exchange (ETDEWEB)

    Hooper, Sean D; Dalevi, Daniel; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2010-01-01

    Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
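
    One way to make the gamma idea concrete is a gamma-Poisson (negative binomial) model. Suppose the rate at which each bin receives reads follows a gamma distribution with shape k and scale θ, and reads land in a bin as a Poisson process. Then the probability that a bin receives no reads is (1 + θ)^(−k), and scaling sequencing effort by r replaces θ with rθ. The sketch below uses this to estimate unseen bins and the gain from extra sequencing. It assumes the gamma parameters have already been fitted (fitting to the zero-truncated observed data is not shown). It illustrates the idea under those assumptions rather than reproducing the authors' exact formulation, and the function names are hypothetical.

```python
def unseen_fraction(shape: float, scale: float, effort: float = 1.0) -> float:
    """P(a bin has zero reads) when bin read rates ~ Gamma(shape, scale)
    and reads per bin ~ Poisson(rate * effort)."""
    # Gamma average of the Poisson zero-probability: E[exp(-effort * rate)].
    return (1.0 + effort * scale) ** (-shape)


def estimate_bins(observed_bins: int, shape: float, scale: float,
                  new_effort: float) -> tuple[float, float]:
    """Return (estimated total bins, bins newly revealed at new_effort).

    observed_bins counts bins with at least one read at the current effort (1.0);
    new_effort is the proposed effort relative to the current one (e.g. 2.0).
    """
    p0_now = unseen_fraction(shape, scale, 1.0)
    total = observed_bins / (1.0 - p0_now)
    p0_later = unseen_fraction(shape, scale, new_effort)
    newly_seen = total * (p0_now - p0_later)
    return total, newly_seen
```

    Comparing `newly_seen` across several values of `new_effort` gives a rough sense of diminishing returns, which is the "additional sequencing" question the abstract raises.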

  1. RNA shotgun metagenomic sequencing of northern California (USA) mosquitoes uncovers viruses, bacteria, and fungi

    Directory of Open Access Journals (Sweden)

    James Angus Chandler

    2015-03-01

    Full Text Available Mosquitoes, most often recognized for the microbial agents of disease they may carry, harbor diverse microbial communities that include viruses, bacteria, and fungi, collectively called the microbiota. The composition of the microbiota can directly and indirectly affect disease transmission through microbial interactions that could be revealed by its characterization in natural populations of mosquitoes. Furthermore, the use of shotgun metagenomic sequencing (SMS) approaches could allow the discovery of unknown members of the microbiota. In this study, we use RNA SMS to characterize the microbiota of seven individual mosquitoes (species include Culex pipiens, Culiseta incidens, and Ochlerotatus sierrensis) collected from a variety of habitats in California, USA. Sequencing was performed on the Illumina HiSeq platform and the resulting sequences were quality-checked and assembled into contigs using the A5 pipeline. Sequences related to single-stranded RNA viruses of the Bunyaviridae and Rhabdoviridae were uncovered, along with an unclassified genus of double-stranded RNA viruses. Phylogenetic analysis finds that in all three cases, the closest relatives of the identified viral sequences are other mosquito-associated viruses, suggesting widespread host-group specificity among disparate viral taxa. Interestingly, we identified a Narnavirus of fungi, also reported elsewhere in mosquitoes, that potentially demonstrates a nested host-parasite association between virus, fungi, and mosquito. Sequences related to 8 bacterial families and 13 fungal families were found across the seven samples. Bacillus and Escherichia/Shigella were identified in all samples and Wolbachia was identified in all Cx. pipiens samples, while no single fungal genus was found in more than two samples. This study exemplifies the utility of RNA SMS in the characterization of the natural microbiota of mosquitoes and, in particular, the value of identifying all microbes associated with

  2. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Kumar Santosh

    2012-12-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs, are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from
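
    The reported validation rate follows directly from the counts given in the abstract:

```latex
\text{validation rate} = \frac{4706}{4863} \approx 0.9677 \approx 96.8\%
```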

  3. Suicide with Shotgun: A Case Report

    Directory of Open Access Journals (Sweden)

    Ali Yildirim

    2011-03-01

    Full Text Available Suicide appears to be a major public health problem in our country and all over the world. Suicide methods vary between communities; the most common are hanging, the use of chemicals, and the use of firearms (pistol, shotgun). With the easy availability of shotguns, the number of suicides using a shotgun has increased significantly in recent years. In our study, suicides with a shotgun are evaluated in terms of shooting range and its features, origin, area of suicide, crime scene, sex and age. [J Contemp Med 2011; 1(1.000): 29-34]

  4. Use of Metagenomic Shotgun Sequencing Technology To Detect Foodborne Pathogens within the Microbiome of the Beef Production Chain.

    Science.gov (United States)

    Yang, Xiang; Noyes, Noelle R; Doster, Enrique; Martin, Jennifer N; Linke, Lyndsey M; Magnuson, Roberta J; Yang, Hua; Geornaras, Ifigenia; Woerner, Dale R; Jones, Kenneth L; Ruiz, Jaime; Boucher, Christina; Morley, Paul S; Belk, Keith E

    2016-04-01

    Foodborne illnesses associated with pathogenic bacteria are a global public health and economic challenge. The diversity of microorganisms (pathogenic and nonpathogenic) that exists within the food and meat industries complicates efforts to understand pathogen ecology. Further, little is known about the interaction of pathogens within the microbiome throughout the meat production chain. Here, a metagenomic approach and shotgun sequencing technology were used as tools to detect pathogenic bacteria in environmental samples collected from the same groups of cattle at different longitudinal processing steps of the beef production chain: cattle entry to feedlot, exit from feedlot, cattle transport trucks, abattoir holding pens, and the end of the fabrication system. The log read counts classified as pathogens per million reads for Salmonella enterica, Listeria monocytogenes, Escherichia coli, Staphylococcus aureus, Clostridium spp. (C. botulinum and C. perfringens), and Campylobacter spp. (C. jejuni, C. coli, and C. fetus) decreased over subsequent processing steps. Furthermore, the normalized read counts for S. enterica, E. coli, and C. botulinum were greater in the final product than at the feedlots, indicating that the proportion of these bacteria increased (the effect on absolute numbers was unknown) within the remaining microbiome. From an ecological perspective, data indicated that shotgun metagenomics can be used to evaluate not only the microbiome but also shifts in pathogen populations during beef production. Nonetheless, there were several challenges in this analysis approach, one of the main ones being the identification of the specific pathogen from which the sequence reads originated, which makes this approach impractical for use in pathogen identification for regulatory and confirmation purposes. Copyright © 2016 Yang et al.
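
    The normalization mentioned above, read counts per million reads on a log scale, can be sketched as follows. The log base (10) and the pseudocount used to avoid log(0) are assumptions, since the abstract does not specify them, and the function name is hypothetical. Because the value is relative to total reads, an increase means the taxon's share of the sequenced community grew, not necessarily its absolute number. The authors note this caveat.

```python
import math


def log_reads_per_million(taxon_reads: int, total_reads: int,
                          pseudocount: float = 1.0) -> float:
    """log10 of reads assigned to a taxon per million total reads.

    The pseudocount keeps samples with zero assigned reads finite.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    rpm = (taxon_reads + pseudocount) * 1e6 / total_reads
    return math.log10(rpm)
```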

  5. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    Science.gov (United States)

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.
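
    As a rough check on scale, the shotgun data volume can be divided by the estimated genome size to give the implied average raw sequencing depth. This assumes all 600 Gb derive from the axolotl genome and ignores read filtering:

```latex
c = \frac{600\ \text{Gb}}{32\ \text{Gb}} \approx 18.75\times
```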

  6. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.
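
    The coverage-versus-GC check described above can be sketched in three steps: split a genome into fixed windows, compute each window's GC fraction and mean depth, and correlate the two. This is an illustrative approach that assumes a per-base depth array is already available (for example, from read alignments). It is not necessarily the analysis the authors ran, and the window size and function names are hypothetical.

```python
def window_gc_and_depth(seq: str, depth: list[int],
                        window: int) -> list[tuple[float, float]]:
    """(GC fraction, mean depth) for consecutive non-overlapping windows."""
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc = (chunk.count("G") + chunk.count("C")) / window
        mean_depth = sum(depth[start:start + window]) / window
        rows.append((gc, mean_depth))
    return rows


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

    A strong positive or negative correlation across windows would point to GC-dependent coverage of the kind the study attributes to the library protocol.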

  7. Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia

    International Nuclear Information System (INIS)

    Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G

    2012-01-01

    Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are thought to be involved in its development and progression. Therefore, in an attempt to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in such disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads were generated from de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which resulted in acquired relapse variants. In all, 6390 and 4671 putative AS events were also detected, as well as expression levels for 18 315 and 18 795 genes, 28% of which were differentially expressed in the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL

  8. "Shotgunning" as an illicit drug smoking practice.

    Science.gov (United States)

    Perlman, D C; Perkins, M P; Paone, D; Kochems, L; Salomon, N; Friedmann, P; Des Jarlais, D C

    1997-01-01

    There has been a rise in illicit drug smoking in the United States. "Shotgunning" drugs (or "doing a shotgun") refers to the practice of inhaling smoke and then exhaling it into another individual's mouth, a practice with the potential for the efficient transmission of respiratory pathogens. Three hundred fifty-four drug users (239 from a syringe exchange and 115 from a drug detoxification program) were interviewed about shotgunning and screened for tuberculosis (TB). Fifty-nine (17%; 95% CI 12.9%-20.9%) reported shotgunning while smoking crack cocaine (68%), marijuana (41%), or heroin (2%). In multivariate analysis, age alcohol to intoxication (OR 2.2, 95% CI 1.1-4.3), having engaged in high-risk sex (OR 2.6, 95% CI 1.04-6.7), and crack use (OR 6.0, 95% CI 3.0-12) were independently associated with shotgunning. Shotgunning is a frequent drug smoking practice with the potential to transmit respiratory pathogens, underscoring the need for education of drug users about the risks of specific drug use practices, and the ongoing need for TB control among active drug users.

  9. Protein-Level Integration Strategy of Multiengine MS Spectra Search Results for Higher Confidence and Sequence Coverage.

    Science.gov (United States)

    Zhao, Panpan; Zhong, Jiayong; Liu, Wanting; Zhao, Jing; Zhang, Gong

    2017-12-01

    Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, and they provide different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question because of the lack of an independent, reliable, and highly sensitive standard. We took advantage of translating mRNA sequencing (RNC-seq) results as a standard to evaluate strategies for integrating the protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of the human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, taking the union of the seven engines increased false identifications, whereas taking their intersection markedly decreased identification power. We found that accepting identifications reported by at least two of the seven engines maximized protein identification power while minimizing the ratio of suspicious to translation-supported identifications (STR), as monitored by our RNC-seq-based STR index. Furthermore, this strategy significantly improved peptide coverage of the protein amino acid sequences. In summary, we demonstrated a simple strategy that significantly improves the performance of shotgun mass spectrometry by integrating multiple search engines at the protein level, maximizing the use of current MS spectra without additional experimental work.
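    The "at least two of seven engines" rule is a simple vote over protein-level result sets. A minimal sketch, assuming each engine's output has already been reduced to a set of protein accessions; the engine names and accessions are hypothetical, and the RNC-seq-based STR index used to choose the threshold is not reproduced here.

```python
from collections import Counter


def integrate_identifications(results: dict[str, set[str]],
                              min_engines: int = 2) -> set[str]:
    """Return proteins reported by at least `min_engines` search engines.

    min_engines=1 gives the union of all engines and
    min_engines=len(results) gives their intersection; values in between
    trade sensitivity against false identifications.
    """
    votes = Counter()
    for proteins in results.values():
        votes.update(set(proteins))  # at most one vote per engine per protein
    return {protein for protein, n in votes.items() if n >= min_engines}


# Hypothetical protein-level results from three engines.
results = {
    "engine_A": {"P1", "P2", "P3"},
    "engine_B": {"P2", "P3"},
    "engine_C": {"P3", "P4"},
}
union = integrate_identifications(results, min_engines=1)         # {'P1', 'P2', 'P3', 'P4'}
at_least_two = integrate_identifications(results, min_engines=2)  # {'P2', 'P3'}
intersection = integrate_identifications(results, min_engines=3)  # {'P3'}
```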

  10. Shotgun pyrosequencing metagenomic analyses of dusts from swine confinement and grain facilities.

    Science.gov (United States)

    Boissy, Robert J; Romberger, Debra J; Roughead, William A; Weissenburger-Moser, Lisa; Poole, Jill A; LeVan, Tricia D

    2014-01-01

    Inhalation of agricultural dusts causes inflammatory reactions and symptoms such as headache, fever, and malaise, which can progress to chronic airway inflammation and associated diseases, e.g. asthma, chronic bronchitis, chronic obstructive pulmonary disease, and hypersensitivity pneumonitis. Although in many agricultural environments feed particles are the major constituent of these dusts, the inflammatory responses that they provoke are likely attributable to particle-associated bacteria, archaebacteria, fungi, and viruses. In this study, we performed shotgun pyrosequencing metagenomic analyses of DNA from dusts from swine confinement facilities or grain elevators, with comparisons to dusts from pet-free households. DNA sequence alignment showed that 19% of shotgun pyrosequencing metagenomic DNA sequence reads from swine facility dust were of swine origin, and 62% of those from household dust were of human origin. In contrast, only 2% of such reads from grain elevator dust were of mammalian origin. These metagenomic shotgun reads of mammalian origin were excluded from our analyses of agricultural dust microbiota. The ten most prevalent bacterial taxa identified in swine facility, grain elevator, and household dust comprised 75%, 16%, and 42% gram-positive organisms, respectively. Four of the top five swine facility dust genera were assignable (Clostridium, Lactobacillus, Ruminococcus, and Eubacterium, ranging from 4% to 19% relative abundance). The relative abundances of these four genera were lower in dust from grain elevators or pet-free households. These analyses also highlighted the predominance in swine facility dust of Firmicutes (70%) at the phylum level, Clostridia (44%) at the class level, and Clostridiales (41%) at the order level. In summary, shotgun pyrosequencing metagenomic analyses of agricultural dusts show that they differ qualitatively and quantitatively at the level of microbial taxa present, and that the bioinformatic analyses
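    The workflow above, excluding reads of mammalian origin before profiling the microbiota, can be sketched as follows, assuming each read has already been assigned a taxon label (for example, by sequence alignment). The labels are hypothetical. Relative abundances are computed over the remaining microbial reads only, so removing host reads changes the denominator, not the microbial counts.

```python
from collections import Counter


def host_fraction(read_labels: list[str], host_labels: set[str]) -> float:
    """Fraction of classified reads assigned to a host (mammalian) taxon."""
    if not read_labels:
        return 0.0
    return sum(1 for t in read_labels if t in host_labels) / len(read_labels)


def microbial_relative_abundance(read_labels: list[str],
                                 host_labels: set[str]) -> dict[str, float]:
    """Drop host-origin reads, then give each remaining taxon's share in percent."""
    microbial = [t for t in read_labels if t not in host_labels]
    if not microbial:
        return {}
    counts = Counter(microbial)
    return {taxon: 100.0 * n / len(microbial) for taxon, n in counts.items()}


# Hypothetical per-read genus assignments for one dust sample.
labels = (["Sus"] * 2 + ["Clostridium"] * 4
          + ["Lactobacillus"] * 3 + ["Ruminococcus"] * 1)
print(host_fraction(labels, {"Sus", "Homo"}))                 # 0.2
print(microbial_relative_abundance(labels, {"Sus", "Homo"}))
# {'Clostridium': 50.0, 'Lactobacillus': 37.5, 'Ruminococcus': 12.5}
```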

  11. Shotgun pyrosequencing metagenomic analyses of dusts from swine confinement and grain facilities.

    Directory of Open Access Journals (Sweden)

    Robert J Boissy

    Full Text Available Inhalation of agricultural dusts causes inflammatory reactions and symptoms such as headache, fever, and malaise, which can progress to chronic airway inflammation and associated diseases, e.g. asthma, chronic bronchitis, chronic obstructive pulmonary disease, and hypersensitivity pneumonitis. Although in many agricultural environments feed particles are the major constituent of these dusts, the inflammatory responses that they provoke are likely attributable to particle-associated bacteria, archaebacteria, fungi, and viruses. In this study, we performed shotgun pyrosequencing metagenomic analyses of DNA from dusts from swine confinement facilities or grain elevators, with comparisons to dusts from pet-free households. DNA sequence alignment showed that 19% of shotgun pyrosequencing metagenomic DNA sequence reads from swine facility dust were of swine origin, and 62% of those from household dust were of human origin. In contrast, only 2% of such reads from grain elevator dust were of mammalian origin. These metagenomic shotgun reads of mammalian origin were excluded from our analyses of agricultural dust microbiota. The ten most prevalent bacterial taxa identified in swine facility, grain elevator, and household dust comprised 75%, 16%, and 42% gram-positive organisms, respectively. Four of the top five swine facility dust genera were assignable (Clostridium, Lactobacillus, Ruminococcus, and Eubacterium, ranging from 4% to 19% relative abundance). The relative abundances of these four genera were lower in dust from grain elevators or pet-free households. These analyses also highlighted the predominance in swine facility dust of Firmicutes (70%) at the phylum level, Clostridia (44%) at the class level, and Clostridiales (41%) at the order level. In summary, shotgun pyrosequencing metagenomic analyses of agricultural dusts show that they differ qualitatively and quantitatively at the level of microbial taxa present, and that the

  12. Shotgun metagenomic data on the human stool samples to characterize shifts of the gut microbial profile after the Helicobacter pylori eradication therapy

    Directory of Open Access Journals (Sweden)

    Eugenia A. Boulygina

    2017-10-01

    Full Text Available The shotgun sequencing data presented in this report are related to the research article “Gut microbiome shotgun sequencing in assessment of microbial community changes associated with H. pylori eradication therapy” (Khusnutdinova et al., 2016 [1]). Typically, the H. pylori eradication protocol involves a prolonged, two-week course of broad-spectrum antibiotics. The presented whole-genome shotgun sequencing data on total DNA from stool samples of patients before the start of eradication, immediately after eradication, and several weeks after the end of treatment could help to profile the gut microbiota both taxonomically and functionally. Together with the data described in Glushchenko et al. (2017 [2]), these data allow researchers to characterize metagenomic profiles in which antibiotic use could result in dramatic changes in intestinal microbiota composition. We present 15 gut metagenomes from 5 patients with H. pylori infection, obtained by shotgun sequencing on the SOLiD 5500 W platform. Raw reads are deposited in the ENA under project ID PRJEB21338.

  13. Elucidation of taste- and odor-producing bacteria and toxigenic cyanobacteria in a Midwestern drinking water supply reservoir by shotgun metagenomics analysis

    Science.gov (United States)

    Otten, Timothy; Graham, Jennifer L.; Harris, Theodore D.; Dreher, Theo

    2016-01-01

    While commonplace in clinical settings, DNA-based assays for identification or enumeration of drinking water pathogens and other biological contaminants remain largely unadopted by the monitoring community. In this study, shotgun metagenomics was used to identify taste-and-odor producers and toxin-producing cyanobacteria over a 2-year period in a drinking water reservoir. The sequencing data implicated several cyanobacteria, including Anabaena spp., Microcystis spp., and an unresolved member of the order Oscillatoriales, as the likely principal producers of geosmin, microcystin, and 2-methylisoborneol (MIB), respectively. To further test this, quantitative PCR (qPCR) assays targeting geosmin-producing Anabaena and microcystin-producing Microcystis were used, and these data were fitted using generalized linear models and compared with routine monitoring data, including microscopic cell counts, sonde-based physicochemical analyses, and assays of all inorganic and organic nitrogen and phosphorus forms and fractions. The qPCR assays explained the greatest variation in observed geosmin (adjusted R² = 0.71) and microcystin (adjusted R² = 0.84) concentrations over the study period, highlighting their potential for routine monitoring applications. The origin of the monoterpene cyclase required for MIB biosynthesis was putatively linked to a periphytic cyanobacterial mat attached to the concrete drinking water inflow structure. We conclude that shotgun metagenomics can be used to identify microbial agents involved in water quality deterioration and to guide PCR assay selection or design for routine monitoring purposes. Finally, we offer estimates of microbial diversity and metagenomic coverage of our data sets for reference to others wishing to apply shotgun metagenomics to other lacustrine systems.
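    The adjusted R² values quoted above penalize ordinary R² for the number of model terms. A minimal sketch, using a single-predictor ordinary least-squares fit (the Gaussian, identity-link special case of a generalized linear model) rather than the study's actual models; the paired measurements and their units are hypothetical.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept and slope of a one-predictor least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_r2(y: list[float], y_hat: list[float], n_predictors: int) -> float:
    """R^2 penalized for model size: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical paired observations: log10 qPCR gene copies vs. geosmin (ng/L).
log_copies = [2.0, 2.5, 3.1, 3.6, 4.2, 4.8]
geosmin = [3.0, 5.5, 6.0, 9.5, 11.0, 14.5]
intercept, slope = ols_fit(log_copies, geosmin)
predicted = [intercept + slope * x for x in log_copies]
adj = adjusted_r2(geosmin, predicted, n_predictors=1)
```

    With a single predictor the penalty is small; it grows as predictors are added relative to the number of observations, which is why adjusted R² is a fairer basis for comparing models of different size.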

  14. Shotgun Proteomics and Biomarker Discovery

    Directory of Open Access Journals (Sweden)

    W. Hayes McDonald

    2002-01-01

    Full Text Available Coupling large-scale sequencing projects with the amino acid sequence information that can be gleaned from tandem mass spectrometry (MS/MS) has made it much easier to analyze complex mixtures of proteins. The limits of this “shotgun” approach, in which the protein mixture is proteolytically digested before separation, can be further expanded by separating the resulting mixture of peptides prior to MS/MS analysis. Both single-dimensional high-pressure liquid chromatography (LC) and multidimensional LC (LC/LC) can be directly interfaced with the mass spectrometer to allow automated collection of tremendous quantities of data. While no single technique addresses all proteomic challenges, the shotgun approaches, especially LC/LC-MS/MS-based techniques such as MudPIT (multidimensional protein identification technology), show advantages over gel-based techniques in speed, sensitivity, scope of analysis, and dynamic range. Advances in the ability to quantify differences between samples and to detect an array of post-translational modifications allow for the discovery of classes of protein biomarkers that were previously unassailable.

  15. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    OpenAIRE

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-01-01

    BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poi...

  16. Entrance, exit, and reentrance of one shot with a shotgun

    DEFF Research Database (Denmark)

    Gulmann, C; Hougen, H P

    1999-01-01

    The case being reported is one of a homicidal shotgun fatality with an unusual wound pattern. A 34-year-old man was shot at close range with a 12-gauge shotgun armed with No. 5 birdshot ammunition. The shot entered the left axillary region, exited through the left infraclavicular region...

  17. Elucidation of Taste- and Odor-Producing Bacteria and Toxigenic Cyanobacteria in a Midwestern Drinking Water Supply Reservoir by Shotgun Metagenomic Analysis.

    Science.gov (United States)

    Otten, Timothy G; Graham, Jennifer L; Harris, Theodore D; Dreher, Theo W

    2016-09-01

    While commonplace in clinical settings, DNA-based assays for identification or enumeration of drinking water pathogens and other biological contaminants remain largely unadopted by the monitoring community. In this study, shotgun metagenomics was used to identify taste-and-odor producers and toxin-producing cyanobacteria over a 2-year period in a drinking water reservoir. The sequencing data implicated several cyanobacteria, including Anabaena spp., Microcystis spp., and an unresolved member of the order Oscillatoriales, as the likely principal producers of geosmin, microcystin, and 2-methylisoborneol (MIB), respectively. To further test this, quantitative PCR (qPCR) assays targeting geosmin-producing Anabaena and microcystin-producing Microcystis were used, and these data were fitted using generalized linear models and compared with routine monitoring data, including microscopic cell counts, sonde-based physicochemical analyses, and assays of all inorganic and organic nitrogen and phosphorus forms and fractions. The qPCR assays explained the greatest variation in observed geosmin (adjusted R² = 0.71) and microcystin (adjusted R² = 0.84) concentrations over the study period, highlighting their potential for routine monitoring applications. The origin of the monoterpene cyclase required for MIB biosynthesis was putatively linked to a periphytic cyanobacterial mat attached to the concrete drinking water inflow structure. We conclude that shotgun metagenomics can be used to identify microbial agents involved in water quality deterioration and to guide PCR assay selection or design for routine monitoring purposes. Finally, we offer estimates of microbial diversity and metagenomic coverage of our data sets for reference to others wishing to apply shotgun metagenomics to other lacustrine systems. Cyanobacterial toxins and microbial taste-and-odor compounds are a growing concern for drinking water utilities reliant upon surface water resources. Specific

  18. Novel advances in shotgun lipidomics for biology and medicine.

    Science.gov (United States)

    Wang, Miao; Wang, Chunyan; Han, Rowland H; Han, Xianlin

    2016-01-01

    The field of lipidomics, a term coined in 2003, has made profound advances and expanded rapidly. The mass spectrometry-based strategies of this analytical-methodology-oriented research discipline largely fall into three categories: direct infusion-based shotgun lipidomics, liquid chromatography-mass spectrometry-based platforms, and matrix-assisted laser desorption/ionization mass spectrometry-based approaches (particularly for imaging lipid distribution in tissues or cells). This review focuses on shotgun lipidomics. After briefly introducing its fundamentals, the bulk of this article covers its recent advances. These include novel methods of lipid extraction, novel shotgun lipidomics strategies for identifying and quantifying previously hard-to-access lipid classes and molecular species, including isomers, and novel tools for processing and interpreting lipidomics data. Representative applications of advanced shotgun lipidomics in biological and biomedical research are also presented. We believe that with these novel advances, shotgun lipidomics should become more comprehensive and higher throughput, thereby greatly accelerating efforts in the lipidomics field to characterize aberrant lipid metabolism, signaling, trafficking, and homeostasis under pathological conditions and their underlying biochemical mechanisms. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepúlveda, Nuno

    2013-02-26

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. © 2013 Sepúlveda et al.; licensee BioMed Central Ltd.
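    The overdispersion finding can be checked with a simple diagnostic: a Poisson model forces the variance of per-position coverage to equal its mean, so the variance-to-mean ratio (index of dispersion) should be close to 1. This is a minimal sketch on hypothetical coverage values, not the authors' hierarchical model. In real data, large CNVs themselves inflate the ratio, so a genome-wide value mixes technical overdispersion with true copy-number signal.

```python
def dispersion_index(coverage: list[int]) -> float:
    """Variance-to-mean ratio of per-position read coverage.

    Under a Poisson model the variance equals the mean, so the ratio is
    about 1; values well above 1 indicate overdispersion.
    """
    n = len(coverage)
    mean = sum(coverage) / n
    variance = sum((c - mean) ** 2 for c in coverage) / (n - 1)  # sample variance
    return variance / mean


# Hypothetical per-position coverage with a deleted and an amplified stretch.
coverage = [30, 32, 28, 31, 29, 0, 0, 0, 30, 33, 90, 95, 88, 31, 29]
ratio = dispersion_index(coverage)
```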

  20. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    Science.gov (United States)

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-02-26

    The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.
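    The Poisson-Gamma idea can be sketched as follows. Integrating a Poisson rate over a Gamma distribution gives a negative binomial, whose variance (mean + mean²/r, with Gamma shape r) can exceed its mean. This minimal sketch fits the negative binomial by the method of moments and flags coverage values that fall in either extreme tail. It is a simplification under stated assumptions (a single genome-wide mean, a hypothetical significance threshold `alpha`, no multiple-testing correction, and no Poisson-Lognormal alternative), not the authors' model or software.

```python
import math


def fit_poisson_gamma(coverage: list[int]) -> tuple[float, float]:
    """Method-of-moments fit of a Poisson-Gamma (negative binomial) model.

    Returns (mean, r), where r is the Gamma shape; the implied variance is
    mean + mean**2 / r, which exceeds the Poisson variance (= mean).
    """
    n = len(coverage)
    mean = sum(coverage) / n
    variance = sum((c - mean) ** 2 for c in coverage) / (n - 1)
    if mean <= 0 or variance <= mean:
        raise ValueError("no overdispersion detected; a plain Poisson model may suffice")
    return mean, mean * mean / (variance - mean)


def nb_log_pmf(k: int, mean: float, r: float) -> float:
    """log P(X = k) for a negative binomial parameterized by mean and shape r."""
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k: int, mean: float, r: float) -> float:
    """P(X <= k), summed term by term (adequate for coverage-sized k)."""
    return sum(math.exp(nb_log_pmf(i, mean, r)) for i in range(k + 1))


def classify_coverage(k: int, mean: float, r: float, alpha: float = 1e-3) -> str:
    """Flag a coverage value lying in either extreme tail of the fitted model."""
    lower = nb_cdf(k, mean, r)              # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mean, r)    # P(X >= k); equals 1.0 when k == 0
    if lower < alpha:
        return "deletion"
    if upper < alpha:
        return "amplification"
    return "normal"
```

    For example, with a fitted mean of 30 and shape r = 10, a coverage of 0 is classified as "deletion", 30 as "normal", and 150 as "amplification".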

  1. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-01-01

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. © 2013 Sepúlveda et al.; licensee BioMed Central Ltd.

  2. Single-base resolution and long-coverage sequencing based on single-molecule nanomanipulation

    International Nuclear Information System (INIS)

    An Hongjie; Huang Jiehuan; Lue Ming; Li Xueling; Lue Junhong; Li Haikuo; Zhang Yi; Li Minqian; Hu Jun

    2007-01-01

    We show new approaches toward a novel single-molecule sequencing strategy that consists of high-resolution positioning and isolation of overlapping DNA fragments with atomic force microscopy (AFM), subsequent single-molecule PCR amplification, and conventional Sanger sequencing. In this study, a DNA labelling technique was used to ensure accurate positioning of the target DNA. Single-molecule multiplex PCR was carried out to test for contamination. The results showed that the two overlapping DNA fragments isolated by AFM could be successfully sequenced with high quality and perfect contiguity, indicating that single-base resolution and long-coverage sequencing were achieved simultaneously.

  3. Improvement of a sample preparation method assisted by sodium deoxycholate for mass-spectrometry-based shotgun membrane proteomics.

    Science.gov (United States)

    Lin, Yong; Lin, Haiyan; Liu, Zhonghua; Wang, Kunbo; Yan, Yujun

    2014-11-01

    In current shotgun-proteomics-based biological discovery, the identification of membrane proteins is a challenge. This is especially true for integral membrane proteins because of their highly hydrophobic nature and low abundance. Thus, much effort has been directed at sample preparation strategies such as the use of detergents, chaotropes, and organic solvents. We previously described a sample preparation method for shotgun membrane proteomics, the sodium deoxycholate assisted method, which cleverly circumvents many of the challenges associated with traditional sample preparation methods. However, the method is associated with significant sample loss owing to the slightly weaker extraction/solubilization ability of sodium deoxycholate at relatively low concentrations such as 1%. Hence, we present an enhanced sodium deoxycholate sample preparation strategy that first uses a high concentration of sodium deoxycholate (5%) to lyse membranes and extract/solubilize hydrophobic membrane proteins, and then dilutes the detergent to 1% for more efficient digestion. We then applied the improved method to shotgun analysis of proteins from a rat liver membrane-enriched fraction. Compared with other representative sample preparation strategies, including our previous sodium deoxycholate assisted method, the enhanced sodium deoxycholate method exhibited superior sensitivity, coverage, and reliability for the identification of membrane proteins, particularly those with high hydrophobicity and/or multiple transmembrane domains. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence

    Directory of Open Access Journals (Sweden)

    Zeng An-Ping

    2004-08-01

    Full Text Available Abstract Background A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences, which involves two successive steps: the prediction of coding sequences (CDS) and their function assignment. The annotation process takes time, and the available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequence. Results In this work, a fast method is proposed that uses unannotated genome sequence to predict CDSs and to reconstruct metabolic networks in silico. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence to predict CDSs. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method: 97.7% of the CDSs in the original annotation are correctly identified. The use of the SWISS-PROT-TrEMBL databases resulted in the identification of 98.9% of CDSs that have EC numbers in the published annotation. Furthermore, two versions of sequences of the bacterium Klebsiella pneumoniae with different genome coverage (3.9- and 7.9-fold, respectively) are examined. The results suggest that 3.9-fold coverage of a bacterial genome could be sufficient for the in silico reconstruction of the metabolic network. Compared with other gene-finding methods such as CRITICA, our method is more suitable for exploiting sequences of low genome coverage.
Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is provided that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download
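    Why roughly 4-fold coverage can already support gene-level analysis can be illustrated with the classical Lander-Waterman estimate, which the abstract itself does not use: if reads start at uniformly random positions, the number of reads covering a given base is approximately Poisson with mean c, so the expected fraction of bases covered at least once is 1 - e^(-c). A minimal sketch under that idealized assumption:

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read: 1 - exp(-c).

    Assumes reads start at uniformly random positions (Lander-Waterman /
    Poisson idealization); libraries with GC or cloning bias do worse.
    """
    return 1.0 - math.exp(-coverage)


def expected_uncovered_bases(genome_length: int, coverage: float) -> float:
    """Expected number of bases with zero reads: G * exp(-c)."""
    return genome_length * math.exp(-coverage)


for c in (3.9, 7.9):
    print(f"{c}x coverage: {expected_fraction_covered(c):.2%} of bases expected covered")
```

    Under this model about 2% of bases remain uncovered at 3.9-fold coverage (97.98% covered), versus about 0.04% at 7.9-fold (99.96% covered).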

  5. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are segments of duplicated DNA longer than 1 kb that are present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, their analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...
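    The read-depth approach above rests on a simple observation: when two near-identical copies of a segment are collapsed into one copy in the assembly, reads from both copies pile up there, so its depth is elevated relative to the genome-wide baseline. A minimal sketch, assuming mean depths in fixed windows along the assembly are already computed. The 1.5-fold threshold and window size are hypothetical, the 1 kb minimum follows the definition above, and the >90% identity criterion cannot be checked from depth alone; this is not the authors' pipeline.

```python
from statistics import median


def duplicated_segments(window_depths: list[float], window_size: int,
                        min_ratio: float = 1.5,
                        min_length: int = 1000) -> list[tuple[int, int]]:
    """Merge consecutive windows whose depth is at least min_ratio x the
    genome-wide median into (start, end) segments, keeping those of at
    least min_length bp. Coordinates are 0-based, end-exclusive."""
    threshold = min_ratio * median(window_depths)
    segments = []
    start = None
    for i, depth in enumerate(window_depths):
        elevated = depth >= threshold
        if elevated and start is None:
            start = i                                  # open a new run
        elif not elevated and start is not None:
            segments.append((start * window_size, i * window_size))
            start = None                               # close the run
    if start is not None:                              # run reaching the end
        segments.append((start * window_size, len(window_depths) * window_size))
    return [(s, e) for s, e in segments if e - s >= min_length]


# Hypothetical 500-bp window depths: one 1.5-kb elevated run, one 500-bp run.
depths = [10, 10, 20, 21, 19, 10, 10, 25, 10]
print(duplicated_segments(depths, 500))  # [(1000, 2500)]
```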

  6. Development of 13 microsatellites for Gunnison Sage-grouse (Centrocercus minimus) using next-generation shotgun sequencing and their utility in Greater Sage-grouse (Centrocercus urophasianus)

    Science.gov (United States)

    Fike, Jennifer A.; Oyler-McCance, Sara J.; Zimmerman, Shawna J; Castoe, Todd A.

    2015-01-01

    Gunnison Sage-grouse are an obligate sagebrush species that has experienced significant population declines and has been proposed for listing under the U.S. Endangered Species Act. To examine levels of connectivity among Gunnison Sage-grouse leks, we identified 13 novel microsatellite loci through next-generation shotgun sequencing and tested them on the closely related Greater Sage-grouse. The number of alleles per locus ranged from 2 to 12. No loci were found to be linked, although 2 loci showed significant departures from Hardy–Weinberg equilibrium or evidence of null alleles. Although these microsatellites were designed for Gunnison Sage-grouse, they also work well for Greater Sage-grouse and could be used to address numerous genetic questions, including in landscape and population genetics.

  7. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    Directory of Open Access Journals (Sweden)

    Yu-Chih Tsai

    2016-02-01

    Full Text Available Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation.

  8. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    Science.gov (United States)

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  9. Retrospective Identification of Herpes Simplex 2 Virus-Associated Acute Liver Failure in an Immunocompetent Patient Detected Using Whole Transcriptome Shotgun Sequencing.

    Science.gov (United States)

    Ono, Atsushi; Hayes, C Nelson; Akamatsu, Sakura; Imamura, Michio; Aikata, Hiroshi; Chayama, Kazuaki

    2017-01-01

    Acute liver failure (ALF) is a severe condition in which liver function rapidly deteriorates in individuals without prior history of liver disease. While most cases result from acetaminophen overdose or viral hepatitis, in up to a third of patients no clear cause can be identified. Liver transplantation (LT) has greatly reduced mortality among these patients, but 40% of patients recover without LT. Therefore, there is an urgent need for rapid determination of the etiology of acute liver failure. In this case report, we present a case of herpes simplex 2 virus- (HSV-) associated ALF in an immunocompetent patient. The patient recovered without LT, but the presence of HSV was not suspected at the time, precluding more effective treatment with acyclovir. To determine the etiology, stored blood samples were analyzed using whole transcriptome shotgun sequencing followed by mapping to a panel of viral reference sequences. The presence of HSV-DNA in blood samples at the time of admission was confirmed using real-time polymerase chain reaction, and, at the time of discharge, HSV-DNA levels had decreased by a factor of 10^6. Conclusions. In ALF cases of undetermined etiology, uncommon causes should be considered, especially those for which an effective treatment is available.

  10. Retrospective Identification of Herpes Simplex 2 Virus-Associated Acute Liver Failure in an Immunocompetent Patient Detected Using Whole Transcriptome Shotgun Sequencing

    Directory of Open Access Journals (Sweden)

    Atsushi Ono

    2017-01-01

    Full Text Available Acute liver failure (ALF) is a severe condition in which liver function rapidly deteriorates in individuals without prior history of liver disease. While most cases result from acetaminophen overdose or viral hepatitis, in up to a third of patients no clear cause can be identified. Liver transplantation (LT) has greatly reduced mortality among these patients, but 40% of patients recover without LT. Therefore, there is an urgent need for rapid determination of the etiology of acute liver failure. In this case report, we present a case of herpes simplex 2 virus- (HSV-) associated ALF in an immunocompetent patient. The patient recovered without LT, but the presence of HSV was not suspected at the time, precluding more effective treatment with acyclovir. To determine the etiology, stored blood samples were analyzed using whole transcriptome shotgun sequencing followed by mapping to a panel of viral reference sequences. The presence of HSV-DNA in blood samples at the time of admission was confirmed using real-time polymerase chain reaction, and, at the time of discharge, HSV-DNA levels had decreased by a factor of 10^6. Conclusions. In ALF cases of undetermined etiology, uncommon causes should be considered, especially those for which an effective treatment is available.

  11. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  12. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Full Text Available Abstract Background Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequencing (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations that had predicted that fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. Comparison of BES to WGS traces showed that polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion The use of HSVs like SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de
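    The abstract classifies homeolog sequence variants (HSVs) as SNPs, single nucleotide indels (SNIs) and rarer larger indels, and reasons in terms of sequence divergence (1–10%). A minimal sketch of that bookkeeping, assuming two already-aligned, gapped sequences such as a BES and an overlapping WGS read, is shown below. The function name and the divergence definition (variant events per column where both sequences carry a base) are illustrative assumptions, not the authors' method.

```python
def classify_hsvs(aligned_a, aligned_b):
    """Count SNPs, single-nucleotide indels (SNIs) and longer indels between
    two equal-length aligned sequences, where '-' marks an alignment gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    snps = snis = longer_indels = 0
    compared = 0  # columns where both sequences carry a base
    i = 0
    while i < len(aligned_a):
        a, b = aligned_a[i], aligned_b[i]
        if a == "-" or b == "-":
            # Measure the whole gap run in the sequence that has the gap.
            gapped = aligned_a if a == "-" else aligned_b
            j = i
            while j < len(gapped) and gapped[j] == "-":
                j += 1
            if j - i == 1:
                snis += 1
            else:
                longer_indels += 1
            i = j
            continue
        compared += 1
        if a.upper() != b.upper():
            snps += 1
        i += 1
    events = snps + snis + longer_indels
    divergence = events / compared if compared else 0.0
    return {"snp": snps, "sni": snis, "indel": longer_indels, "divergence": divergence}


# Hypothetical BES vs. overlapping WGS read (already aligned).
print(classify_hsvs("ACGTACGTACGTTTAC", "ACGAACG-ACGT---C"))
```

    Counting a multi-base gap as one event (rather than one per column) mirrors the abstract's separation of SNIs from "larger indels"; a real pipeline would also need base-quality filtering to keep sequencing errors from being counted as HSVs.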

  13. Deep Illumina-based shotgun sequencing reveals dietary effects on the structure and function of the fecal microbiome of growing kittens.

    Directory of Open Access Journals (Sweden)

    Oliver Deusch

    Full Text Available Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, respectively, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that differed (p<0.000007) between the two dietary groups and were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more enzyme functions with significant diet effects than expected by chance (p<0.022). Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences of the gut microbiome. These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary
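    The abstract reports greater species richness and microbial diversity in HPLC-fed cats but does not name the diversity index used. As an assumption, the sketch below uses observed richness and the Shannon index H' = -Σ p_i ln p_i on a hypothetical table of read counts per taxon; the function name and counts are illustrative only.

```python
import math


def richness_and_shannon(counts):
    """Observed richness and Shannon diversity H' = -sum(p_i * ln p_i).

    counts: mapping of taxon name -> number of reads assigned to it.
    """
    present = [c for c in counts.values() if c > 0]
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return len(present), shannon


# Hypothetical genus-level read counts for one sample.
sample = {"Bacteroides": 120, "Prevotella": 80, "Clostridium": 0, "Blautia": 40}
print(richness_and_shannon(sample))
```

    Taxa with zero reads are excluded from both measures, so richness counts only genera actually observed in the sample.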

  14. Fully immunized child: coverage, timing and sequencing of routine immunization in an urban poor settlement in Nairobi, Kenya.

    Science.gov (United States)

    Mutua, Martin Kavao; Kimani-Murage, Elizabeth; Ngomi, Nicholas; Ravn, Henrik; Mwaniki, Peter; Echoka, Elizabeth

    2016-01-01

    Efforts to increase full immunization coverage rates have intensified over the last decade, but little is known about the levels and consequences of delayed vaccination or of vaccinating children on different schedules. Vaccine effectiveness depends on the timing of its administration, and it is not optimal if the vaccine is given early, delayed or not given as recommended. Evidence of non-specific effects of vaccines is well documented and could be linked to timing and sequencing of immunization. This paper documents the levels of coverage, timing and sequencing of routine childhood vaccines. The study was conducted between 2007 and 2014 in two informal urban settlements in Nairobi. A total of 3856 children aged 12-23 months whose vaccination cards were seen were included in the analysis. Vaccination dates recorded on the cards were used to define full immunization coverage, timeliness and sequencing. Proportions, medians and Kaplan-Meier curves were used to assess and describe the levels of full immunization coverage, vaccination delays and sequencing. The findings indicate that 67% of the children were fully immunized by 12 months of age. Missed measles doses and missed third doses of polio and pentavalent vaccines were the main reasons for not being fully immunized. Delays were greatest for the third doses of polio and pentavalent vaccines and for measles. About 22% of fully immunized children received vaccines out of sequence, with 18% not receiving pentavalent together with polio vaccine as recommended. Results show high levels of missed opportunities and low coverage of routine childhood vaccinations given at later ages. New strategies are needed to enable health care providers and parents/guardians to work together to increase the levels of completion of all required vaccinations. In particular, more focus is needed on vaccines given in multiple doses (polio, pentavalent and pneumococcal conjugate vaccines).
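    The study used Kaplan-Meier curves to describe vaccination delays. A minimal sketch of the estimator follows, under the assumption (ours, not the paper's) that "survival" means remaining unvaccinated and that children not yet vaccinated when observation ends are censored. The function name and the data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the proportion still *unvaccinated* over time.

    durations: days from the recommended age to vaccination or end of follow-up
    events:    1 if the child was vaccinated at that time, 0 if censored
    Returns a list of (time, survival) steps; 1 - survival is cumulative coverage.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        vaccinated = censored = 0
        # Group all children whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                vaccinated += 1
            else:
                censored += 1
            i += 1
        if vaccinated:
            survival *= 1.0 - vaccinated / at_risk
            curve.append((t, survival))
        # Vaccinated and censored children both leave the risk set after t.
        at_risk -= vaccinated + censored
    return curve


# Hypothetical delays (days) for five children; event 0 = not vaccinated by end of follow-up.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"day {t}: {1 - s:.0%} vaccinated")
```

    Censored children still count in the risk set up to their last observed day, which is what distinguishes this estimate from a simple proportion of children vaccinated by each day.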

  15. AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.

    Science.gov (United States)

    Schaefer, Nathan K; Shapiro, Beth; Green, Richard E

    2017-04-04

    Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA, and it can expand the range of studies in which admixture mapping is a viable tool.
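    AD-LIBS infers local ancestry with a hidden Markov model. The sketch below is only a generic two-state HMM decoded with the Viterbi algorithm, to show how such a model smooths noisy per-window evidence into ancestry blocks; the two-symbol emission model, the probabilities and the window labels are invented for illustration and are not AD-LIBS's actual model or parameters.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM (computed in log space)."""
    # scores[s]: best log-probability of any path that ends in state s
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev, best = max(
                ((p, scores[p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            new_scores[s] = best + math.log(emit_p[s][obs])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy setup: each genomic window looks more like brown bear ("b") or polar bear ("p").
STATES = ("brown", "polar")
START = {"brown": 0.5, "polar": 0.5}
TRANS = {"brown": {"brown": 0.95, "polar": 0.05},
         "polar": {"brown": 0.05, "polar": 0.95}}
EMIT = {"brown": {"b": 0.8, "p": 0.2}, "polar": {"b": 0.2, "p": 0.8}}

print(viterbi(list("bbbbpbbb"), STATES, START, TRANS, EMIT))    # lone "p" is smoothed away
print(viterbi(list("bbbbpppppp"), STATES, START, TRANS, EMIT))  # a sustained switch is kept
```

    The low switching probability is what lets a single ambiguous window be absorbed into its neighbors while a long run of polar-like windows still produces an ancestry switch.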

  16. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    Full Text Available With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
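    WGSQuikr estimates taxon proportions for the whole sample at once from k-mer statistics instead of classifying reads one by one. The sketch below illustrates that general idea with a plain non-negative least-squares fit of a sample's k-mer frequency vector against reference k-mer profiles, solved by coordinate descent. It is not WGSQuikr's algorithm or its MATLAB code; k = 3, the toy reference sequences and all function names are assumptions chosen to keep the example small.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Normalized frequencies of all 4**k DNA k-mers in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def nnls_coordinate_descent(columns, target, sweeps=500):
    """Minimize ||sum_j x_j * columns[j] - target||^2 subject to all x_j >= 0."""
    x = [0.0] * len(columns)
    residual = list(target)  # target - A @ x, with x starting at zero
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            if norms[j] == 0:
                continue
            # Best x_j with the others fixed, then clipped at zero.
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


def estimate_proportions(sample, references, k=3):
    """Relative contribution of each reference to the sample's k-mer profile."""
    columns = [kmer_profile(seq, k) for seq in references.values()]
    x = nnls_coordinate_descent(columns, kmer_profile(sample, k))
    total = sum(x) or 1.0
    return {name: xi / total for name, xi in zip(references, x)}


refs = {"A": "ACGTACGTACGT", "B": "GGGGCCCCGGGG"}  # toy "genomes"
print(estimate_proportions("ACGTACGTACGT" + "GGGGCCCCGGGG", refs))
```

    Because the whole sample is summarized by one k-mer vector, the cost of the fit depends on the number of k-mers and references rather than on the number of reads, which is the kind of speed advantage the abstract describes.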

  17. Alternative Enzymes Lead to Improvements in Sequence Coverage and PTM Analysis

    Science.gov (United States)

    Hooper, Kyle; Rosenblatt, Michael; Urh, Marjeta; Saveliev, Sergei; Hosfield, Chris; Kobs, Gary; Ford, Michael; Jones, Richard; Amunugama, Ravi; Allen, David; Brazas, Robert

    2013-01-01

    The profiling of proteins using biological mass spectrometry (bottom-up proteomics) most commonly requires trypsin. Trypsin is advantageous in that it produces peptides of optimal charge and size. However, for applications in which the proteins under investigation are part of a complex mixture or not isolated at high levels (e.g. low-ng amounts from an immunoprecipitation), sequence coverage is rarely complete. In addition, we have found that in several cases, such as the analysis of phosphorylation, acetylation, and methylation, alternative proteases are required to prepare peptides suitable for MS detection. This poster will provide specific examples which demonstrate this observation. For example, the application of a combined Trypsin/Lys-C mixture reduces the number of missed cleavages by more than 3-fold, producing samples with lower CVs (for biological replicates). The mixture is also well suited for the complete proteolysis of hydrophobic, compact proteins. The addition of chymotrypsin and elastase has been found to be useful for identifying phosphorylation sites on proteins, especially on sequences where the site of phosphorylation inhibits trypsin (i.e. proximal to K or R). Many epigenetic applications have focused on histone modifications, such as lysine acetylation and arginine methylation. Alternative proteases like Asp-N, Glu-C, and chymotrypsin have been especially useful given that the modified K and R residues are resistant to C-terminal cleavage by trypsin. Finally, in the case of serum profiling, the addition of the endoglycosidase PNGase F has been found to improve sequence coverage due to the removal of N-linked glycans.
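    The abstract turns on protease specificity: trypsin cleaves C-terminal to K and R, and modified K/R residues resist that cleavage, which is why alternative enzymes help. A simplified in silico digest illustrates the effect. The cleavage rules below are textbook approximations (for example, the "not before proline" rule for trypsin), the lowercase convention for modified residues is a hypothetical notation, and real digests also contain missed cleavages that this sketch ignores.

```python
# Simplified cleavage rules (assumed, textbook-style):
# enzyme -> (residues recognized, side of the residue cut, blocked if next residue is P)
RULES = {
    "trypsin": ("KR", "C", True),
    "lys-c": ("K", "C", False),
    "glu-c": ("E", "C", False),
    "asp-n": ("D", "N", False),
    "chymotrypsin": ("FYW", "C", True),
}


def digest(protein, enzyme):
    """Fully digest a protein sequence (no missed cleavages).

    Lowercase letters mark modified residues (e.g. 'k' = acetyl-lysine);
    they are never recognized as cleavage sites.
    """
    residues, side, blocked_by_proline = RULES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa not in residues:
            continue
        cut = i + 1 if side == "C" else i
        if blocked_by_proline and cut < len(protein) and protein[cut] == "P":
            continue
        if start < cut < len(protein):
            peptides.append(protein[start:cut])
            start = cut
    peptides.append(protein[start:])
    return peptides


print(digest("MAKPLRGEK", "trypsin"))   # K before P is not cut
print(digest("AGkGRAKDE", "trypsin"))   # acetylated k is skipped
print(digest("AGkGRAKDE", "asp-n"))     # Asp-N cuts N-terminal to D instead
```

    The same sequence yields different peptide sets per enzyme, which is the practical reason combining or switching proteases can recover coverage around modified residues.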

  18. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  19. High-throughput shotgun lipidomics by quadrupole time-of-flight mass spectrometry

    DEFF Research Database (Denmark)

    Ståhlman, Marcus; Ejsing, Christer S.; Tarasov, Kirill

    2009-01-01

    Technological advances in mass spectrometry and meticulous method development have produced several shotgun lipidomic approaches capable of characterizing lipid species by direct analysis of total lipid extracts. Shotgun lipidomics by hybrid quadrupole time-of-flight mass spectrometry allows ... the absolute quantification of hundreds of molecular glycerophospholipid species, glycerolipid species, sphingolipid species and sterol lipids. Future applications in clinical cohort studies demand detailed lipid molecule information and the application of high-throughput lipidomics platforms. In this review ... we describe a novel high-throughput shotgun lipidomic platform based on 96-well robot-assisted lipid extraction, automated sample infusion by microfluidic-based nanoelectrospray ionization, and quantitative multiple precursor ion scanning analysis on a quadrupole time-of-flight mass spectrometer...

  20. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first
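    A 0.5× genome sounds too thin to assemble anything, yet the study recovered a virtually complete chloroplast genome. The standard Lander–Waterman (Poisson) approximation makes the reasoning explicit: at mean coverage c, a base is missed by all reads with probability e^(-c), and a sequence present in n copies per nuclear genome is effectively sequenced at n·c. The plastid copy number used below is a hypothetical value for illustration, not a figure from the study.

```python
import math


def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


def prob_depth(coverage, depth):
    """Poisson probability that a base is covered by exactly `depth` reads."""
    return math.exp(-coverage) * coverage ** depth / math.factorial(depth)


nuclear = 0.5          # mean nuclear coverage reported for A. syriaca
plastid_copies = 100   # hypothetical plastid genomes per nuclear genome
print(f"single-copy nuclear bases covered: {fraction_covered(nuclear):.1%}")
print(f"plastid bases covered: {fraction_covered(nuclear * plastid_copies):.4%}")
```

    At 0.5× only about 39% of single-copy bases are expected to carry any read at all, which is consistent with the abstract's distinction between characterizing the high-copy fraction and merely exploring the low-copy fraction.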

  1. Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes.

    Directory of Open Access Journals (Sweden)

    Stephen Nayfach

    2015-11-01

    Full Text Available Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

  2. Detection of Bacterial Pathogens from Broncho-Alveolar Lavage by Next-Generation Sequencing.

    Science.gov (United States)

    Leo, Stefano; Gaïa, Nadia; Ruppé, Etienne; Emonet, Stephane; Girard, Myriam; Lazarevic, Vladimir; Schrenzel, Jacques

    2017-09-20

    The applications of whole-metagenome shotgun sequencing (WMGS) in routine clinical analysis are still limited. A combination of a DNA extraction procedure, sequencing, and bioinformatics tools is essential for the removal of human DNA and for improving bacterial species identification in a timely manner. We tackled these issues with a broncho-alveolar lavage (BAL) sample from an immunocompromised patient who had developed severe chronic pneumonia. We extracted DNA from the BAL sample with protocols based either on sequential lysis of human and bacterial cells or on the mechanical disruption of all cells. Metagenomic libraries were sequenced on Illumina HiSeq platforms. Microbial community composition was determined by k-mer analysis or by mapping to taxonomic markers. Results were compared to those obtained by conventional clinical culture and molecular methods. Compared to mechanical cell disruption, a sequential lysis protocol resulted in a significantly increased proportion of bacterial DNA over human DNA and higher sequence coverage of Mycobacterium abscessus, Corynebacterium jeikeium and Rothia dentocariosa, the bacteria reported by clinical microbiology tests. In addition, we identified anaerobic bacteria not searched for by the clinical laboratory. Our results further support the implementation of WMGS in clinical routine diagnosis for bacterial identification.

  3. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures, including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore a fast and efficient sequencing approach was needed. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii to confirm that it also performs well for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  4. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Hellsten, Uffe [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Wright, Kevin M. [Harvard Univ., Cambridge, MA (United States); Jenkins, Jerry [USDOE Joint Genome Inst., Walnut Creek, CA (United States); HudsonAlpha Inst. of Biotechnology, Huntsville, AL (United States); Shu, Shengqiang [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Yuan, Yao-Wu [Univ. of Connecticut, Storrs, CT (United States); Wessler, Susan R. [Univ. of California, Riverside, CA (United States); Schmutz, Jeremy [USDOE Joint Genome Inst., Walnut Creek, CA (United States); HudsonAlpha Inst. of Biotechnology, Huntsville, AL (United States); Willis, John H. [Duke Univ., Durham, NC (United States); Rokhsar, Daniel S. [USDOE Joint Genome Inst., Walnut Creek, CA (United States); Univ. of California, Berkeley, CA (United States)

    2013-11-13

    Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice hotspots are largely defined by binding sites of the PRDM9 protein. To investigate the detailed recombination pattern in a flowering plant we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in PRDM9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms

  5. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Full Text Available Abstract Background Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 kb (EG_Bb) to 157 kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single-copy gene of ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae.
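    The link between genome equivalents and the ≥ 99.99% probability of finding a single-copy gene follows the Clarke–Carbon relation. For a library of N clones with insert size I from a genome of size G, P = 1 - (1 - I/G)^N, which is approximately 1 - e^(-c) for coverage c = N·I/G. A minimal sketch follows; the insert and genome sizes in the last example are hypothetical round numbers, not the E. grandis values.

```python
import math


def prob_locus_present(genome_equivalents):
    """Poisson approximation to Clarke-Carbon: P = 1 - exp(-coverage)."""
    return 1.0 - math.exp(-genome_equivalents)


def clones_needed(probability, insert_size, genome_size):
    """Exact Clarke-Carbon form: N = ln(1 - P) / ln(1 - I/G), rounded up."""
    fraction = insert_size / genome_size
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


for coverage in (15, 17):  # haploid genome equivalents of EG_Bb and EG_Ba
    print(coverage, f"{prob_locus_present(coverage):.7f}")

# Hypothetical sizes: 150 kb inserts, 150 Mb genome, 99% target probability.
print(clones_needed(0.99, 150_000, 150_000_000))
```

    Both libraries clear the 99.99% threshold by a wide margin under this idealized model, which assumes random, unbiased cloning; real libraries can under-represent some regions.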

  6. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...

  7. A theoretical justification for single molecule peptide sequencing.

    Directory of Open Access Journals (Sweden)

    Jagannath Swaminathan

    2015-02-01

    Full Text Available The proteomes of cells, tissues, and organisms reflect active cellular processes and change continuously in response to intracellular and extracellular cues. Deep, quantitative profiling of the proteome, especially if combined with mRNA and metabolite measurements, should provide an unprecedented view of cell state, better revealing functions and interactions of cell components. Molecular diagnostics and biomarker discovery should benefit particularly from the accurate quantification of proteomes, since complex diseases like cancer change protein abundances and modifications. Currently, shotgun mass spectrometry is the primary technology for high-throughput protein identification and quantification; while powerful, it lacks high sensitivity and coverage. We draw parallels with next-generation DNA sequencing and propose a strategy, termed fluorosequencing, for sequencing peptides in a complex protein sample at the level of single molecules. In the proposed approach, millions of individual fluorescently labeled peptides are visualized in parallel, monitoring changing patterns of fluorescence intensity as N-terminal amino acids are sequentially removed, and using the resulting fluorescence signatures (fluorosequences) to uniquely identify individual peptides. We introduce a theoretical foundation for fluorosequencing and, by using Monte Carlo computer simulations, we explore its feasibility, anticipate the most likely experimental errors, quantify their potential impact, and discuss the broad potential utility offered by a high-throughput peptide sequencing technology.

  8. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Full Text Available Abstract Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species.
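
    Why a 0.5× survey can still characterize the high-copy fraction can be illustrated with the Poisson (Lander–Waterman) coverage model: a sequence present in m copies is effectively sampled at m × 0.5× coverage, so the expected fraction of it hit by at least one read is 1 − e^(−0.5m). This Python sketch assumes uniform, independent read placement; the copy numbers in the loop are hypothetical illustrations, not values reported by the study.

```python
import math


def fraction_covered(coverage: float, copies: int = 1) -> float:
    """Expected fraction of a sequence hit by at least one read (Poisson model)."""
    effective = coverage * copies  # every copy collects reads independently
    return 1.0 - math.exp(-effective)


genome_coverage = 0.5  # depth of the A. syriaca survey described above
for copies in (1, 10, 100):  # hypothetical: single-copy locus, moderate repeat, organelle-like
    print(copies, round(fraction_covered(genome_coverage, copies), 4))
```

    A single-copy locus is expected to be only partly sampled, whereas sequences present in tens to hundreds of copies are almost fully covered, which matches the near-complete organellar and rDNA assemblies reported above.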

  9. Shotgun approaches to gait analysis : insights & limitations

    NARCIS (Netherlands)

    Kaptein, Ronald G.; Wezenberg, Daphne; IJmker, Trienke; Houdijk, Han; Beek, Peter J.; Lamoth, Claudine J. C.; Daffertshofer, Andreas

    2014-01-01

    Background: Identifying features for gait classification is a formidable problem. The number of candidate measures is legion. This calls for proper, objective criteria when ranking their relevance. Methods: Following a shotgun approach we determined a plenitude of kinematic and physiological gait

  10. An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile.

    Science.gov (United States)

    Prakash, Celine; Haeseler, Arndt Von

    2017-03-01

    RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
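
    The enumeration idea can be made concrete with a small brute-force sketch in Python. It assumes that a "maximal placement" means non-overlapping fragments of length F on a transcript of length T such that no uncovered gap (including the two ends) could hold another fragment, and that every such configuration is equally likely. This is one reading of the abstract, not the authors' code, and it ignores read length, fragment-length distributions and molecule counts.

```python
from fractions import Fraction
from typing import List, Tuple


def maximal_configurations(T: int, F: int) -> List[List[int]]:
    """All maximal placements of non-overlapping length-F fragments on positions 0..T-1.

    Each configuration is a list of fragment start positions; every uncovered gap
    (before, between or after fragments) is shorter than F, so nothing more fits.
    """
    if F < 1:
        raise ValueError("fragment length must be at least 1")
    configs: List[List[int]] = []

    def extend(pos: int, starts: List[int]) -> None:
        if T - pos < F:
            # The remaining tail is too short for another fragment: configuration is maximal.
            configs.append(list(starts))
            return
        # Place the next fragment after a gap of 0..F-1 uncovered positions.
        for gap in range(F):
            start = pos + gap
            if start + F > T:
                break
            starts.append(start)
            extend(start + F, starts)
            starts.pop()

    extend(0, [])
    return configs


def expected_profiles(T: int, F: int) -> Tuple[List[Fraction], List[Fraction]]:
    """Expected fragment starting-point and coverage profiles, averaged over configurations."""
    configs = maximal_configurations(T, F)
    start_counts = [0] * T
    coverage_counts = [0] * T
    for config in configs:
        for s in config:
            start_counts[s] += 1
            for p in range(s, s + F):
                coverage_counts[p] += 1
    n = len(configs)
    return ([Fraction(c, n) for c in start_counts],
            [Fraction(c, n) for c in coverage_counts])


start_profile, coverage_profile = expected_profiles(T=10, F=3)
print([float(x) for x in coverage_profile])  # inspect how expected coverage varies by position
```

    Even this stripped-down model gives lower expected coverage near the transcript ends than in the interior, in line with the abstract's finding that the expected profiles are not uniform.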

  11. Genomic Selection Using Genotyping-By-Sequencing Data with Different Coverage Depth in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Cericola, Fabio; Fé, Dario; Janss, Luc

    2015-01-01

    Genotyping by sequencing (GBS) allows generating up to millions of molecular markers with a cost per sample which is proportional to the level of multiplexing. Increasing the sample multiplexing decreases the genotyping price but also reduces the numbers of reads per marker. In this work we investigated how this reduction of the coverage depth affects the genomic relationship matrices used to estimate breeding values of F2 family pools in perennial ryegrass. A total of 995 families were genotyped via GBS providing more than 1.8M allele frequency estimates for each family with an average coverage...... the diagonal elements by estimating the amount of genetic variance caused by the reduction of the coverage depth. Secondly we developed a method to scale the relationship matrix by taking into account the overall amount of pairwise non-missing loci between all families. Rust resistance and heading date were......

  12. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Full Text Available Abstract Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single-pass genomic sequences per genome were obtained. The data were subjected to comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified in most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  13. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    Directory of Open Access Journals (Sweden)

    Brad Thomas Townsley

    2015-05-01

    Full Text Available Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing inherent properties of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  14. Seasonal changes in the communities of photosynthetic picoeukaryotes in Ofunato Bay as revealed by shotgun metagenomic sequencing

    KAUST Repository

    Rashid, Jonaira; Kobiyama, Atsushi; Reza, Md. Shaheed; Yamada, Yuichiro; Ikeda, Yuri; Ikeda, Daisuke; Mizusawa, Nanami; Ikeo, Kazuho; Sato, Shigeru; Ogata, Takehiko; Kudo, Toshiaki; Kaga, Shinnosuke; Watanabe, Shiho; Naiki, Kimiaki; Kaga, Yoshimasa; Mineta, Katsuhiko; Bajic, Vladimir B.; Gojobori, Takashi; Watabe, Shugo

    2018-01-01

    Small photosynthetic eukaryotes play important roles in oceanic food webs in coastal regions. We investigated seasonal changes in the communities of photosynthetic picoeukaryotes (PPEs) of the class Mamiellophyceae, including the genera Bathycoccus, Micromonas and Ostreococcus, in Ofunato Bay, which is located in northeastern Japan and faces the Pacific Ocean. The abundances of PPEs were assessed over a period of one year in 2015 at three sampling stations, KSt. 1 (innermost bay area), KSt. 2 (middle bay area) and KSt. 3 (bay entrance area) at depths of 1 m (KSt. 1, KSt. 2 and KSt. 3), 8 m (KSt. 1) or 10 m (KSt. 2 and KSt. 3) by employing MiSeq shotgun metagenomic sequencing. The total abundances of Bathycoccus, Ostreococcus and Micromonas were in the ranges of 42–49%, 35–49% and 13–17%, respectively. Considering all assayed sampling stations and depths, seasonal changes revealed high abundances of PPEs during the winter and summer and low abundances during late winter to early spring and late summer to early autumn. Bathycoccus was most abundant in the winter, and Ostreococcus showed a high abundance during the summer. Another genus, Micromonas, was relatively low in abundance throughout the study period. Taken together with previously suggested blooming periods of phytoplankton, as revealed by chlorophyll a concentrations in Ofunato Bay during spring and late autumn, these results for PPEs suggest that greater phytoplankton blooming has a negative influence on the seasonal occurrences of PPEs in the bay.

  15. Seasonal changes in the communities of photosynthetic picoeukaryotes in Ofunato Bay as revealed by shotgun metagenomic sequencing

    KAUST Repository

    Rashid, Jonaira

    2018-04-30

  16. Determination of genetic relatedness from low-coverage human genome sequences using pedigree simulations.

    Science.gov (United States)

    Martin, Michael D; Jay, Flora; Castellano, Sergi; Slatkin, Montgomery

    2017-08-01

    We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest that the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As most ancient specimens are from animals, this method could also be applied to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available. © 2017 John Wiley & Sons Ltd.
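
    Measuring pairwise genetic distance without calling genotypes can be sketched with the common pseudo-haploid approach: at every site covered in both individuals, draw one read from each and count how often the sampled alleles differ. This Python sketch assumes per-site read pileups are already available as dictionaries (a hypothetical input format). It is a generic estimator, not the grups implementation, and it does not model the sequencing error or contamination that the abstract identifies as complicating factors.

```python
import random
from typing import Dict, List, Optional


def pairwise_mismatch_rate(
    pileup_a: Dict[int, List[str]],
    pileup_b: Dict[int, List[str]],
    seed: int = 0,
) -> Optional[float]:
    """Fraction of overlapping sites where one randomly drawn read from each individual disagrees."""
    rng = random.Random(seed)  # fixed seed keeps the random draws reproducible
    overlapping = [
        site
        for site in sorted(set(pileup_a) & set(pileup_b))
        if pileup_a[site] and pileup_b[site]
    ]
    if not overlapping:
        return None  # no site is covered in both individuals
    mismatches = sum(
        rng.choice(pileup_a[site]) != rng.choice(pileup_b[site]) for site in overlapping
    )
    return mismatches / len(overlapping)


a = {1: ["A"], 2: ["C"], 3: ["G"], 4: ["T"]}
b = {1: ["A"], 2: ["T"], 3: ["G"], 5: ["C"]}
print(pairwise_mismatch_rate(a, b))  # 0.3333333333333333 (1 mismatch over 3 shared sites)
```

    On its own this rate is only a distance; turning it into a relatedness call requires comparing it against expectations, which the authors obtain from pedigree simulations.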

  17. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, almost entirely separate from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  18. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Full Text Available Viral infections cause many different diseases stemming both from well-characterized viral pathogens but also from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced, with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that virion enrichment achieved a lower detection level than shotgun sequencing.

  19. Shotgun metagenomics of 250 adult twins reveals genetic and environmental impacts on the gut microbiome

    DEFF Research Database (Denmark)

    Xie, Hailiang; Guo, Ruijin; Zhong, Huanzi

    2016-01-01

    The gut microbiota has been typically viewed as an environmental factor for human health. Twins are well suited for investigating the concordance of their gut microbiomes and decomposing genetic and environmental influences. However, existing twin studies utilizing metagenomic shotgun sequencing...... have included only a few samples. Here, we sequenced fecal samples from 250 adult twins in the TwinsUK registry and constructed a comprehensive gut microbial reference gene catalog. We demonstrate heritability of many microbial taxa and functional modules in the gut microbiome, including those...... associated with diseases. Moreover, we identified 8 million SNPs in the gut microbiome and observe a high similarity in microbiome SNPs between twins that slowly decreases after decades of living apart. The results shed new light on the genetic and environmental influences on the composition and function...

  20. Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2018-01-01

    In the past few years, many studies investigated the anaerobic digestion microbiome by means of 16S rRNA amplicon sequencing. Results obtained from these studies were compared to each other without taking into consideration the procedure followed for amplicon preparation and data analysis...... specifically, the microbial compositions of three laboratory scale biogas reactors were analyzed before and after addition of sodium oleate by sequencing the microbiome with three different approaches: 16S rRNA amplicon sequencing, shotgun DNA and shotgun RNA. This comparative analysis revealed that......, in amplicon sequencing, abundance of some taxa (Euryarchaeota and Spirochaetes) was biased by the inefficiency of universal primers to hybridize all the templates. Reliability of the results obtained was also influenced by the number of hypervariable regions under investigation. Finally, amplicon sequencing...

  1. Survey of bacterial diversity in chronic wounds using Pyrosequencing, DGGE, and full ribosome shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Wolcott Benjamin M

    2008-03-01

    Full Text Available Abstract Background Chronic wound pathogenic biofilms are host-pathogen environments that colonize and exist as a cohabitation of many bacterial species. These bacterial populations cooperate to promote their own survival and the chronic nature of the infection. Few studies have performed extensive surveys of the bacterial populations that occur within different types of chronic wound biofilms. Three separate 16S-based molecular amplifications, followed by pyrosequencing, shotgun Sanger sequencing, and denaturing gradient gel electrophoresis, were used to survey the major populations of bacteria that occur in the pathogenic biofilms of three types of chronic wounds: diabetic foot ulcers (D), venous leg ulcers (V), and pressure ulcers (P). Results There are specific major populations of bacteria that were evident in the biofilms of all chronic wound types, including Staphylococcus, Pseudomonas, Peptoniphilus, Enterobacter, Stenotrophomonas, Finegoldia, and Serratia spp. Each of the wound types reveals marked differences in bacterial populations, such as pressure ulcers in which 62% of the populations were identified as obligate anaerobes. There were also populations of bacteria that were identified but not recognized as wound pathogens, such as Abiotrophia para-adiacens and Rhodopseudomonas spp. Results of molecular analyses were also compared to those obtained using traditional culture-based diagnostics. Only in one wound type did culture methods correctly identify the primary bacterial population, indicating the need for improved diagnostic methods. Conclusion If clinicians can gain a better understanding of the wound's microbiota, it will give them a greater understanding of the wound's ecology and will allow them to better manage healing of the wound, improving the prognosis of patients. This research highlights the necessity to begin evaluating, studying, and treating chronic wound pathogenic biofilms as multi-species entities in

  2. MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques

    Directory of Open Access Journals (Sweden)

    Cerqueira Fabio R

    2012-10-01

    Full Text Available Abstract Background The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of the obtained peptide-spectrum matches (PSMs) must also be evaluated algorithmically, as manual curation of these huge datasets would be impractical. The target-decoy database strategy is widely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. In addition, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity. Results Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance to retrieve more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches.
Conclusion Our approach not only enhances the computational performance, and
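
    For readers unfamiliar with the target-decoy strategy this abstract builds on, here is a minimal Python sketch of its basic form: PSMs from a combined target and decoy search are ranked by score, the FDR at each cutoff is estimated as decoys/targets, and sensitivity is the largest number of target PSMs retained while that estimate stays at or below a chosen rate. The (score, is_decoy) input format is an assumption, ties in score are not treated specially, and this is neither MUDE nor MUMAL, which replace the single score cutoff with linear and nonlinear decision boundaries, respectively.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (search-engine score, True if the match hit a decoy sequence)


def targets_accepted_at_fdr(psms: List[PSM], max_fdr: float) -> int:
    """Largest number of target PSMs kept by a score cutoff whose decoy/target FDR <= max_fdr."""
    targets = decoys = 0
    best = 0
    # Walk down the ranking; each position is a candidate score cutoff.
    for _score, is_decoy in sorted(psms, key=lambda psm: psm[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = max(best, targets)
    return best


psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True), (4, False)]
print(targets_accepted_at_fdr(psms, max_fdr=0.25))  # 4
print(targets_accepted_at_fdr(psms, max_fdr=0.0))   # 2
```

    Sensitivity-maximizing methods such as those described above aim to raise the first number while holding the error rate fixed.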

  3. Low-coverage MiSeq next generation sequencing reveals the mitochondrial genome of the Eastern Rock Lobster, Sagmariasus verreauxi.

    Science.gov (United States)

    Doyle, Stephen R; Griffith, Ian S; Murphy, Nick P; Strugnell, Jan M

    2015-01-01

    The complete mitochondrial genome of the Eastern Rock lobster, Sagmariasus verreauxi, is reported for the first time. Using low-coverage, long read MiSeq next generation sequencing, we constructed and determined the mtDNA genome organization of the 15,470 bp sequence from two isolates from Eastern Tasmania, Australia and Northern New Zealand, and identified 46 polymorphic nucleotides between the two sequences. This genome sequence and its genetic polymorphisms will likely be useful in understanding the distribution and population connectivity of the Eastern Rock Lobster, and in the fisheries management of this commercially important species.

  4. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    DEFF Research Database (Denmark)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang

    2016-01-01

    , including 33 CNVs samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis...

  5. A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE.

    Directory of Open Access Journals (Sweden)

    Kevin P Keegan

    Full Text Available We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
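
    A simplified Python sketch of the duplicate-read idea: reads that share an identical leading prefix are treated as artifactual copies of one template, a majority-vote consensus is taken at each later position within the group, and disagreements with that consensus are counted as errors. The prefix length, the minimum group size of two and the majority-vote consensus are assumptions for illustration; the published DRISEE procedure may differ in its details.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def positional_error_rates(reads: Iterable[str], prefix_len: int, min_group: int = 2) -> Dict[int, float]:
    """Per-position (0-based) error estimate from groups of reads sharing an identical prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            groups[read[:prefix_len]].append(read)

    errors: Counter = Counter()
    observed: Counter = Counter()
    for group in groups.values():
        if len(group) < min_group:
            continue  # a lone read has nothing to be compared against
        for pos in range(prefix_len, max(len(r) for r in group)):
            column = [r[pos] for r in group if pos < len(r)]
            _consensus, support = Counter(column).most_common(1)[0]
            errors[pos] += len(column) - support  # bases disagreeing with the majority
            observed[pos] += len(column)
    return {pos: errors[pos] / observed[pos] for pos in sorted(observed)}


reads = ["AAAACGT", "AAAACGT", "AAAACTT", "GGGGAAA"]
print(positional_error_rates(reads, prefix_len=4))  # {4: 0.0, 5: 0.3333333333333333, 6: 0.0}
```

    Because the estimate comes from the reads themselves, no reference genome or quality score is needed, which is the property the abstract emphasizes.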

  6. Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

    KAUST Repository

    Kobayashi, Masaaki

    2017-04-20

    Recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models but also on the quality and quantity of variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at both ends of reads or sites containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
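
    The two filtering rules described for Heap can be illustrated with a toy per-site genotype caller. In this Python sketch, bases lying within `end_trim` positions of either read end are discarded, and a site is left uncalled when an allele is supported by only one read. The `end_trim` parameter, the diploid two-allele genotype and the (base, offset, read length) input format are assumptions made for illustration, not Heap's actual parameters or code.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One aligned base: (base, 0-based offset within the read, read length).
Observation = Tuple[str, int, int]


def call_site(observations: Iterable[Observation], end_trim: int = 1) -> Optional[Tuple[str, str]]:
    """Return a diploid genotype for one site, or None when the site is not called."""
    counts: Counter = Counter()
    for base, offset, read_len in observations:
        if offset < end_trim or offset >= read_len - end_trim:
            continue  # assumed rule: ignore bases at either end of a read
        counts[base] += 1
    if not counts:
        return None  # no usable coverage
    if len(counts) > 1 and min(counts.values()) == 1:
        return None  # an allele seen in a single read is treated as unreliable
    if len(counts) == 1:
        (allele,) = counts
        return (allele, allele)  # homozygous
    first, second = sorted(allele for allele, _ in counts.most_common(2))
    return (first, second)  # heterozygous


print(call_site([("A", 5, 100)] * 3 + [("G", 10, 100)] * 2))  # ('A', 'G')
print(call_site([("A", 5, 100)] * 3 + [("G", 40, 100)]))       # None
print(call_site([("A", 5, 100)] * 3 + [("G", 0, 100)]))        # ('A', 'A')
```

    Both rules trade sensitivity for fewer false positives; at low coverage, a single discordant read is more often a sequencing or alignment artifact than a real allele.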

  7. A combined meta-barcoding and shotgun metagenomic analysis of spontaneous wine fermentation.

    Science.gov (United States)

    Sternes, Peter R; Lee, Danna; Kutyna, Dariusz R; Borneman, Anthony R

    2017-07-01

    Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, toward the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in 5 different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant overabundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on the wine flavor and aroma. © The Authors 2017. Published by Oxford University Press.
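
    The correction of ITS abundances mentioned above could, assuming a matched shotgun estimate is taken as the better reference, be as simple as a per-taxon multiplicative factor. The functions `estimate_bias` and `correct_its` and the ratio model are our assumptions, not the authors' procedure, and the numbers below are toy values rather than data from the study.

```python
from typing import Dict


def estimate_bias(its: Dict[str, float], shotgun: Dict[str, float]) -> Dict[str, float]:
    """Per-taxon bias = ITS relative abundance / shotgun relative abundance."""
    return {taxon: its[taxon] / shotgun[taxon]
            for taxon in its if shotgun.get(taxon, 0) > 0}


def correct_its(its: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Divide ITS abundances by their bias factor, then renormalise to sum to 1."""
    adjusted = {taxon: value / bias.get(taxon, 1.0) for taxon, value in its.items()}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Toy numbers only: an ITS overestimate of Metschnikowia is divided out.
bias = estimate_bias({"Metschnikowia": 0.6, "Hanseniaspora": 0.4},
                     {"Metschnikowia": 0.2, "Hanseniaspora": 0.8})
corrected = correct_its({"Metschnikowia": 0.5, "Hanseniaspora": 0.5}, bias)
```

    Taxa without a bias estimate are left unscaled (factor 1.0), so the correction only redistributes abundance among taxa that were calibrated.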

  8. Analysis of pig serum proteins based on shotgun liquid ...

    African Journals Online (AJOL)

    Recent advances in proteomics technologies have opened up significant opportunities for future applications. We used shotgun liquid chromatography, coupled with tandem mass spectrometry (LC-MS/MS) to determine the proteome profile of healthy pig serum. Samples of venous blood were collected and subjected to ...

  9. Comparison of two next-generation sequencing kits for diagnosis of epileptic disorders with a user-friendly tool for displaying gene coverage, DeCovA

    Directory of Open Access Journals (Sweden)

    Sarra Dimassi

    2015-12-01

    Full Text Available In recent years, molecular genetics has been playing an increasing role in the diagnostic process of monogenic epilepsies. Knowing the genetic basis of a patient's epilepsy enables accurate genetic counseling and may guide therapeutic options. Genetic diagnosis of epilepsy syndromes has long been based on Sanger sequencing and the search for large rearrangements using MLPA or DNA arrays (array-CGH or SNP-array). Recently, next-generation sequencing (NGS) was demonstrated to be a powerful approach to overcome the wide clinical and genetic heterogeneity of epileptic disorders. Coverage is critical for assessing the quality and accuracy of results from NGS. However, it is often a difficult parameter to display in practice. The aim of the study was to compare two library-building methods (Haloplex from Agilent and SeqCap EZ from Roche) for a targeted panel of 41 genes causing monogenic epileptic disorders. We included 24 patients, 20 of whom had known disease-causing mutations. For each patient both libraries were built in parallel and sequenced on an Ion Torrent Personal Genome Machine (PGM). To compare coverage and depth, we developed a simple homemade tool, named DeCovA (Depth and Coverage Analysis). DeCovA displays the sequencing depth of each base and the coverage of target genes for each genomic position. The fraction of each gene covered at different thresholds could be easily estimated. Neither of the two analysis tools used, namely NextGene and Ion Reporter, was able to identify all the known mutations/CNVs carried by the 20 patients. Variant detection rate was globally similar for the two techniques, and DeCovA showed that failure to detect a mutation was mainly related to insufficient coverage.
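
    The per-gene summary described above (fraction of each gene covered at several depth thresholds) can be approximated in a few lines of Python, assuming per-base depths for each target gene have already been extracted from the alignments. The thresholds, function names and gene key below are illustrative, not DeCovA's own.

```python
from typing import Dict, Iterable, List, Sequence


def fraction_covered(depths: Sequence[int],
                     thresholds: Iterable[int] = (1, 10, 20, 30)) -> Dict[int, float]:
    """Fraction of positions in one target whose depth is >= each threshold."""
    if not depths:
        raise ValueError("target has no positions")
    return {t: sum(d >= t for d in depths) / len(depths) for t in thresholds}


def coverage_table(gene_depths: Dict[str, List[int]],
                   thresholds: Iterable[int] = (1, 10, 20, 30)) -> Dict[str, Dict[int, float]]:
    """Per-gene summary: {gene: {threshold: fraction of positions covered}}."""
    thresholds = tuple(thresholds)  # allow reuse across genes
    return {gene: fraction_covered(d, thresholds) for gene, d in gene_depths.items()}
```

    A gene whose fraction at the clinical threshold falls well below 1.0 flags exactly the kind of insufficient coverage that, per the abstract, explained most missed mutations.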

  10. A high-throughput shotgun mutagenesis approach to mapping B-cell antibody epitopes.

    Science.gov (United States)

    Davidson, Edgar; Doranz, Benjamin J

    2014-09-01

    Characterizing the binding sites of monoclonal antibodies (mAbs) on protein targets, their 'epitopes', can aid in the discovery and development of new therapeutics, diagnostics and vaccines. However, the speed of epitope mapping techniques has not kept pace with the increasingly large numbers of mAbs being isolated. Obtaining detailed epitope maps for functionally relevant antibodies can be challenging, particularly for conformational epitopes on structurally complex proteins. To enable rapid epitope mapping, we developed a high-throughput strategy, shotgun mutagenesis, that enables the identification of both linear and conformational epitopes in a fraction of the time required by conventional approaches. Shotgun mutagenesis epitope mapping is based on large-scale mutagenesis and rapid cellular testing of natively folded proteins. Hundreds of mutant plasmids are individually cloned, arrayed in 384-well microplates, expressed within human cells, and tested for mAb reactivity. Residues are identified as a component of a mAb epitope if their mutation (e.g. to alanine) does not support candidate mAb binding but does support that of other conformational mAbs or allows full protein function. Shotgun mutagenesis is particularly suited for studying structurally complex proteins because targets are expressed in their native form directly within human cells. Shotgun mutagenesis has been used to delineate hundreds of epitopes on a variety of proteins, including G protein-coupled receptor and viral envelope proteins. The epitopes mapped on dengue virus prM/E represent one of the largest collections of epitope information for any viral protein, and results are being used to design better vaccines and drugs. © 2014 John Wiley & Sons Ltd.
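
    The residue-calling rule stated above (loss of candidate mAb binding while other conformational mAbs still bind) can be written as a small filter. This is a sketch under assumptions: reactivity is expressed as a fraction of wild type, and the cutoffs `loss_cutoff` and `retain_cutoff`, like the mutant labels, are hypothetical rather than values from the published workflow.

```python
from typing import Dict, List, Sequence


def epitope_residues(binding: Dict[str, Dict[str, float]], target_mab: str,
                     control_mabs: Sequence[str], loss_cutoff: float = 0.2,
                     retain_cutoff: float = 0.7) -> List[str]:
    """Mutants that lose target-mAb binding while control mAbs still bind.

    binding[mutant][mab] is reactivity as a fraction of wild type.
    """
    hits = []
    for mutant, reactivity in binding.items():
        lost_target = reactivity[target_mab] < loss_cutoff
        # Control mAbs still binding suggests the mutant protein is folded.
        still_folded = all(reactivity[m] >= retain_cutoff for m in control_mabs)
        if lost_target and still_folded:
            hits.append(mutant)
    return hits
```

    A mutant such as a hypothetical W95A that abolishes binding of every antibody would be set aside as a likely folding or expression defect rather than reported as an epitope residue.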

  11. CoverageAnalyzer (CAn): A Tool for Inspection of Modification Signatures in RNA Sequencing Profiles

    Directory of Open Access Journals (Sweden)

    Ralf Hauenschild

    2016-11-01

    Full Text Available Combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modification candidates. CoverageAnalyzer (CAn) was developed in response to the demand for a powerful inspection tool. It is freely available for all three main operating systems. With SAM file format as standard input, CAn is an intuitive and user-friendly tool that is generally applicable to the large community of biomedical users, starting from simple visualization of RNA sequencing (RNA-Seq) data, up to sophisticated modification analysis with significance-based modification candidate calling.

  12. Genome sequencing of Deutsch strain of cattle ticks, Rhipicephalus microplus: Raw Pac Bio reads.

    Science.gov (United States)

    PacBio RS II whole-genome shotgun sequencing technology was used to sequence the genome of the cattle tick, Rhipicephalus microplus. The DNA was derived from 14-day-old eggs from the Deutsch Texas outbreak strain reared at the USDA-ARS Cattle Fever Tick Research Laboratory, Edinburg, TX. Each corre...

  13. A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy.

    Science.gov (United States)

    Wickland, Daniel P; Battu, Gopal; Hudson, Karen A; Diers, Brian W; Hudson, Matthew E

    2017-12-28

    Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain.
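
    To make the complexity-reduction step concrete, here is a hedged in silico digest in Python. PstI (recognition site CTGCAG, cut between A and G) is used purely as an example enzyme; only the top strand is scanned, overhangs are ignored, and no fragment-size selection is applied, so this simplifies what a real GBS library contains.

```python
from typing import List


def digest(sequence: str, site: str = "CTGCAG", cut_offset: int = 5) -> List[str]:
    """Split `sequence` at every occurrence of `site`, cutting `cut_offset` bases into it."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Consecutive cut positions delimit the fragments; GBS reads come from their ends.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


fragments = digest("AAACTGCAGTTTTCTGCAGCC")
# fragments == ["AAACTGCA", "GTTTTCTGCA", "GCC"]
```

    Because only fragment ends are sequenced, sites far from any cut site are never observed, which is one source of the missing data the abstract mentions.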

  14. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot) analysis. Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
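
    Copy number from a random survey follows from a simple ratio: the depth observed over a region divided by the depth a single-copy region would be expected to receive. A minimal Python sketch, assuming reads are sampled uniformly and that the genome size is supplied by the user; the function names are hypothetical.

```python
def single_copy_depth(survey_bases: float, genome_size: float) -> float:
    """Expected depth of a single-copy region if survey reads are random."""
    return survey_bases / genome_size


def copy_number(aligned_bases: float, region_length: int,
                survey_bases: float, genome_size: float) -> float:
    """Copy-number estimate = observed depth over the region / single-copy depth."""
    observed_depth = aligned_bases / region_length
    return observed_depth / single_copy_depth(survey_bases, genome_size)
```

    For instance, 78 Mbp of survey sequence against an assumed genome size of about 1.1 Gbp gives a single-copy depth of roughly 0.07X, so single-copy regions receive very few reads and the estimate is most informative for abundant repeats or long clones.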

  15. Shotgun proteomics of plant plasma membrane and microdomain proteins using nano-LC-MS/MS.

    Science.gov (United States)

    Takahashi, Daisuke; Li, Bin; Nakayama, Takato; Kawamura, Yukio; Uemura, Matsuo

    2014-01-01

    Shotgun proteomics allows the comprehensive analysis of proteins extracted from plant cells, subcellular organelles, and membranes. Previously, two-dimensional gel electrophoresis-based proteomics was used for mass spectrometric analysis of plasma membrane proteins. In order to get comprehensive proteome profiles of the plasma membrane, including highly hydrophobic proteins with a number of transmembrane domains, a mass spectrometry-based shotgun proteomics method using nano-LC-MS/MS for proteins from the plasma membrane and the plasma membrane microdomain fraction is described. The results obtained are easily applicable to label-free protein semiquantification.

  16. Prediction of the neuropeptidomes of members of the Astacidea (Crustacea, Decapoda) using publicly accessible transcriptome shotgun assembly (TSA) sequence data.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan

    2015-12-01

    The decapod infraorder Astacidea comprises clawed lobsters and freshwater crayfish. Due to their economic importance and their use as models for investigating neurochemical signaling, much work has focused on elucidating their neurochemistry, particularly their peptidergic systems. Interestingly, no astacidean has been the subject of large-scale peptidomic analysis via in silico transcriptome mining, despite growing transcriptomic resources for members of this taxon. Here, the publicly accessible astacidean transcriptome shotgun assembly data were mined for putative peptide-encoding transcripts; these sequences were used to predict the structures of mature neuropeptides. One hundred seventy-six distinct peptides were predicted for Procambarus clarkii, including isoforms of adipokinetic hormone-corazonin-like peptide (ACP), allatostatin A (AST-A), allatostatin B, allatostatin C (AST-C), bursicon α, bursicon β, CCHamide, crustacean hyperglycemic hormone (CHH)/ion transport peptide (ITP), diuretic hormone 31 (DH31), eclosion hormone (EH), FMRFamide-like peptide, GSEFLamide, intocin, leucokinin, neuroparsin, neuropeptide F, pigment dispersing hormone, pyrokinin, RYamide, short neuropeptide F (sNPF), SIFamide, sulfakinin and tachykinin-related peptide (TRP). Forty-six distinct peptides, including isoforms of AST-A, AST-C, bursicon α, CCHamide, CHH/ITP, DH31, EH, intocin, myosuppressin, neuroparsin, red pigment concentrating hormone, sNPF and TRP, were predicted for Pontastacus leptodactylus, with a bursicon β and a neuroparsin predicted for Cherax quadricarinatus. The identification of ACP is the first from a decapod, while the predictions of CCHamide, EH, GSEFLamide, intocin, neuroparsin and RYamide are firsts for the Astacidea. Collectively, these data greatly expand the catalog of known astacidean neuropeptides and provide a foundation for functional studies of peptidergic signaling in members of this decapod infraorder. Copyright © 2015 Elsevier Inc
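
    Predicting mature peptides from a precursor generally means applying cleavage and post-translational modification rules. A deliberately reduced Python sketch, assuming the signal peptide has already been removed, that cleavage occurs only at dibasic residues (KR, RR, KK, RK), and that a C-terminal Gly is converted to an amide; the rule set actually used in the study is not reproduced here, and the example precursor is invented.

```python
import re
from typing import List


def predict_mature_peptides(prohormone: str, min_len: int = 3) -> List[str]:
    """Cut a prohormone at dibasic sites and mark C-terminal Gly as amidation."""
    pieces = re.split(r"(?:KR|RR|KK|RK)", prohormone)
    peptides = []
    for piece in pieces:
        if len(piece) < min_len:
            continue  # discard short spacer fragments
        if piece.endswith("G"):
            # The Gly is consumed to form a C-terminal amide.
            peptides.append(piece[:-1] + "-NH2")
        else:
            peptides.append(piece)
    return peptides


# Invented precursor: yields a GSEFLamide-like and an RFamide-like peptide.
print(predict_mature_peptides("AAGSEFLGKRNRNFLRFGRRSPQ"))
```

    Real workflows add further rules (monobasic cleavage, pyroglutamate formation, disulfide bridging), so a sketch like this over-simplifies family-specific processing.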

  17. A comparison of rice chloroplast genomes

    DEFF Research Database (Denmark)

    Tang, Jiabin; Xia, Hong'ai; Cao, Mengliang

    2004-01-01

    Using high quality sequence reads extracted from our whole genome shotgun repository, we assembled two chloroplast genome sequences from two rice (Oryza sativa) varieties, one from 93-11 (a typical indica variety) and the other from PA64S (an indica-like variety with maternal origin of japonica......), which are both parental varieties of the super-hybrid rice, LYP9. Based on the patterns of high sequence coverage, we partitioned chloroplast sequence variations into two classes, intravarietal and intersubspecific polymorphisms. Intravarietal polymorphisms refer to variations within 93-11 or PA64S...

  18. Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

    Science.gov (United States)

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...

  19. Propionibacterium acnes: disease-causing agent or common contaminant? Detection in diverse patient samples by next generation sequencing

    DEFF Research Database (Denmark)

    Mollerup, Sarah; Friis-Nielsen, Jens; Vinner, Lasse

    2016-01-01

    Propionibacterium acnes is the most abundant bacterium on human skin, particularly in sebaceous areas. P. acnes is suggested to be an opportunistic pathogen involved in the development of diverse medical conditions, but is also a proven contaminant of human samples and surgical wounds. Its...... significance as a pathogen is consequently a matter of debate. In the present study we investigated the presence of P. acnes DNA in 250 next generation sequencing datasets generated from 180 samples of 20 different sample types, mostly of cancerous origin. The samples were either subjected to microbial...... enrichment, involving nuclease treatment to reduce the amount of host nucleic acids, or shotgun-sequenced. We detected high proportions of P. acnes in enriched samples, particularly skin derived and other tissue samples, with levels being higher in enriched compared to shotgun-sequenced samples. P. acnes...

  20. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons

    Science.gov (United States)

    Haas, Brian J.; Gevers, Dirk; Earl, Ashlee M.; Feldgarden, Mike; Ward, Doyle V.; Giannoukos, Georgia; Ciulla, Dawn; Tabbaa, Diana; Highlander, Sarah K.; Sodergren, Erica; Methé, Barbara; DeSantis, Todd Z.; Petrosino, Joseph F.; Knight, Rob; Birren, Bruce W.

    2011-01-01

    Bacterial diversity among environmental samples is commonly assessed with PCR-amplified 16S rRNA gene (16S) sequences. Perceived diversity, however, can be influenced by sample preparation, primer selection, and formation of chimeric 16S amplification products. Chimeras are hybrid products between multiple parent sequences that can be falsely interpreted as novel organisms, thus inflating apparent diversity. We developed a new chimera detection tool called Chimera Slayer (CS). CS detects chimeras with greater sensitivity than previous methods, performs well on short sequences such as those produced by the 454 Life Sciences (Roche) Genome Sequencer, and can scale to large data sets. By benchmarking CS performance against sequences derived from a controlled DNA mixture of known organisms and a simulated chimera set, we provide insights into the factors that affect chimera formation such as sequence abundance, the extent of similarity between 16S genes, and PCR conditions. Chimeras were found to reproducibly form among independent amplifications and contributed to false perceptions of sample diversity and the false identification of novel taxa, with less-abundant species exhibiting chimera rates exceeding 70%. Shotgun metagenomic sequences of our mock community appear to be devoid of 16S chimeras, supporting a role for shotgun metagenomics in validating novel organisms discovered in targeted sequence surveys. PMID:21212162
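
    The intuition behind reference-based chimera detection (a query that matches one parent on its left segment and a different parent on its right) can be shown with a toy scorer. This is not the Chimera Slayer algorithm, which aligns queries against a reference set and scores candidate breakpoints with additional support statistics; here sequences are assumed pre-aligned and of equal length, the breakpoint is fixed by the caller, and `chimera_check` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chimera_check(query: str, parents: Dict[str, str],
                  breakpoint: int) -> Tuple[float, Optional[str], Optional[str]]:
    """Best gain in identity from a two-parent model split at `breakpoint`.

    Returns (gain, left_parent, right_parent); (0.0, None, None) means no
    two-parent model beat the best single parent.
    """
    best_single = max(identity(query, p) for p in parents.values())
    best: Tuple[float, Optional[str], Optional[str]] = (0.0, None, None)
    for left_name, left_seq in parents.items():
        for right_name, right_seq in parents.items():
            if left_name == right_name:
                continue
            matches = (sum(x == y for x, y in zip(query[:breakpoint], left_seq[:breakpoint]))
                       + sum(x == y for x, y in zip(query[breakpoint:], right_seq[breakpoint:])))
            gain = matches / len(query) - best_single
            if gain > best[0]:
                best = (gain, left_name, right_name)
    return best
```

    A large positive gain marks a candidate chimera; in practice one would scan all breakpoints and require the gain to exceed what sequencing error alone could produce.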

  1. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Full Text Available Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low-coverage whole genome sequencing of multiplexed Illumina paired-end libraries to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.
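
    SSR discovery of the kind described above is often done with pattern matching over assembled reads or contigs. A minimal Python sketch that finds perfect di- to hexanucleotide repeats with a regex backreference; the minimum of five repeat units and the function names are assumptions for illustration, not the authors' pipeline.

```python
import re
from typing import List, Tuple


def _is_primitive(motif: str) -> bool:
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    n = len(motif)
    return all(motif != motif[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_ssrs(seq: str, min_units: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat count) for perfect 2- to 6-nt repeats."""
    seq = seq.upper()
    hits = []
    for unit_len in range(2, 7):
        # Group 2 captures one unit; \2 requires it to repeat at least min_units times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if _is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)
```

    Tallying the returned motifs by length and base composition is one way to check for the dinucleotide and AT-rich bias the abstract reports; imperfect or interrupted repeats would need a more tolerant search.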

  2. Whole-genome shotgun sequencing of mitochondria from ancient hair shafts

    DEFF Research Database (Denmark)

    Gilbert, M Thomas P; Tomsho, Lynn P; Rendulic, Snjezana

    2007-01-01

    Although the application of sequencing-by-synthesis techniques to DNA extracted from bones has revolutionized the study of ancient DNA, it has been plagued by large fractions of contaminating environmental DNA. The genetic analyses of hair shafts could be a solution: We present 10 previously...

  3. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not-too-distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  4. The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae)

    Energy Technology Data Exchange (ETDEWEB)

    Wolf, Paul G.; Karol, Kenneth G.; Mandoli, Dina F.; Kuehl, Jennifer V.; Arumuganathan, K.; Ellis, Mark W.; Mishler, Brent D.; Kelch, Dean G.; Olmstead, Richard G.; Boore, Jeffrey L.

    2005-02-01

    We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8x depth coverage to obtain the complete chloroplast genome sequence. The genome is 154,373 bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,671 bp. Gene order is more similar to those of mosses, liverworts, and hornworts than to gene order for other vascular plants. For example, the Huperzia chloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophyte chloroplast genome data also enable a better reconstruction of the basal tracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as movement of the gene ndhF from the small single copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants by using nucleotide data, amino acid sequences, and by comparing gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves, and contribute to a growing comparative database of genomic and morphological data across the green plants.
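
    As a back-of-the-envelope check on what 8x shotgun depth buys, the classic Lander-Waterman (Poisson) approximation gives the expected covered fraction as 1 - e^(-c) for mean coverage c. The Python sketch below assumes uniformly random read placement, which RCA-amplified material will not strictly satisfy, so treat the result as an idealized expectation.

```python
import math
from typing import Tuple


def lander_waterman(coverage: float, genome_length: int) -> Tuple[float, float]:
    """Expected fraction covered and expected number of uncovered bases.

    Assumes reads start uniformly at random (Poisson coverage), so the chance
    that a given base receives no read is exp(-coverage).
    """
    uncovered_fraction = math.exp(-coverage)
    return 1.0 - uncovered_fraction, genome_length * uncovered_fraction


covered, gaps = lander_waterman(8, 154_373)
```

    Under these assumptions 8x depth on a 154,373 bp genome leaves about 99.97% covered and roughly 52 bases expected to be missed, which is why modest depth suffices for small organellar genomes once remaining gaps are closed by targeted sequencing.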

  5. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

    Science.gov (United States)

    García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

    2016-01-01

    Genes are frequently lost or gained in malignant tumors, and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome-wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low-coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphical 40 kb window analysis approach allowed us to identify, with high resolution, previously reported non-random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a pool of medulloblastoma samples and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451
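
    The abstract does not spell out how the chromosome integrity score is computed, so the following is a generic read-depth sketch rather than the authors' method: read start positions are binned into 40 kb windows, each window's share of reads is compared with its share in a reference profile, and windows with a strongly negative log2 ratio are flagged as candidate losses. The pseudocount, the -0.4 threshold and the function names are arbitrary assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def window_log2_ratios(read_starts: Iterable[int], reference_counts: Dict[int, float],
                       window: int = 40_000, pseudocount: float = 0.5) -> Dict[int, float]:
    """log2(observed share / reference share) of reads per fixed-size window."""
    observed = Counter(pos // window for pos in read_starts)
    total_obs = sum(observed.values())
    total_ref = sum(reference_counts.values())
    ratios = {}
    for w, ref in reference_counts.items():
        obs_share = (observed.get(w, 0) + pseudocount) / total_obs
        ref_share = (ref + pseudocount) / total_ref
        ratios[w] = math.log2(obs_share / ref_share)
    return ratios


def flag_losses(ratios: Dict[int, float], threshold: float = -0.4) -> List[int]:
    """Window indices whose log2 ratio is at or below `threshold` (candidate losses)."""
    return sorted(w for w, r in ratios.items() if r <= threshold)
```

    Normalising by total reads makes pools sequenced to different depths comparable; a hemizygous loss present in every sample of a pool would ideally sit near log2(0.5) = -1, and lower if only some samples carry it... diluted toward 0 in a mixed pool.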

  6. Midpregnancy Marriage and Divorce: Why the Death of Shotgun Marriage Has Been Greatly Exaggerated.

    Science.gov (United States)

    Gibson-Davis, Christina M; Ananat, Elizabeth O; Gassman-Pines, Anna

    2016-12-01

    Conventional wisdom holds that births following the colloquially termed "shotgun marriage"-that is, births to parents who married between conception and the birth-are nearing obsolescence. To investigate trends in shotgun marriage, we matched North Carolina administrative data on nearly 800,000 first births among white and black mothers to marriage and divorce records. We found that among married births, midpregnancy-married births (our preferred term for shotgun-married births) have been relatively stable at about 10 % over the past quarter-century while increasing substantially for vulnerable population subgroups. In 2012, among black and white less-educated and younger women, midpregnancy-married births accounted for approximately 20 % to 25 % of married first births. The increasing representation of midpregnancy-married births among married births raises concerns about well-being among at-risk families because midpregnancy marriages may be quite fragile. Our analysis revealed, however, that midpregnancy marriages were more likely to dissolve only among more advantaged groups. Of those groups considered to be most at risk of divorce-namely, black women with lower levels of education and who were younger-midpregnancy marriages had the same or lower likelihood of divorce as preconception marriages. Our results suggest an overlooked resiliency in a type of marriage that has only increased in salience.

  7. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    OpenAIRE

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chungcheongbuk-do, South Korea, and subjected to whole genome sequencing on the HiSeq platform and annotation with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  8. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Directory of Open Access Journals (Sweden)

    Graner Andreas

    2008-10-01

    Full Text Available Abstract Background Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads, a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation, however, identified dozens of novel repeat sequences that had not been recognised by hand-annotation. The MDR data were also used to identify gene-containing regions by masking of repetitive sequences in eight de novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands, gene sequences could indeed be identified. MDR data were only of limited use when mapped on genomic sequences from the closely related species Triticum monococcum, as only a fraction of the repetitive sequences was recognised. Conclusion An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing) sequence regions in uncharacterised genomic sequences.
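
    An MDR-style index is, at heart, a table of k-mer frequencies in the whole-genome reads that is then looked up along a genomic sequence: positions whose k-mer is very frequent are likely repetitive, and low values suggest low-copy (potentially genic) sequence. The Python sketch below uses canonical k-mers and k = 21 as illustrative assumptions; it is not the published MDR software, and a genome-scale index would need far more memory-efficient structures than a Python `Counter`.

```python
from collections import Counter
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def build_kmer_index(reads: Iterable[str], k: int = 21) -> Counter:
    """Count every canonical k-mer occurring in the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def repeat_profile(genomic_seq: str, index: Counter, k: int = 21) -> List[int]:
    """Read-set frequency of the k-mer starting at each position of a genomic sequence."""
    return [index[canonical(genomic_seq[i:i + k])]
            for i in range(len(genomic_seq) - k + 1)]
```

    Masking positions whose profile value is far above the value expected for single-copy sequence (roughly coverage times the number of k-mers per read) is then one way to expose candidate gene islands in a BAC sequence.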

  9. Defining Diagnostic Biomarkers Using Shotgun Proteomics and MALDI-TOF Mass Spectrometry.

    Science.gov (United States)

    Armengaud, Jean

    2017-01-01

    Whole-cell MALDI-TOF has become a robust and widely used tool to quickly identify any pathogen. In addition to being routinely used in hospitals, it is also useful for low-cost dereplication in large-scale screening of new environmental isolates for environmental biotechnology or taxonomic applications. Here, I describe how specific biomarkers can be defined using shotgun proteomics and whole-cell MALDI-TOF mass spectrometry. Based on MALDI-TOF spectra recorded on a given set of pathogens with internal calibrants, m/z values of interest are extracted. The proteins that contribute to these peaks are deduced from label-free shotgun proteomics measurements carried out on the same sample. Quantitative information based on the spectral count approach allows the most probable candidates to be ranked. Proteogenomic approaches help to determine whether these proteins give the same m/z values across the whole taxon under consideration or result in heterogeneous lists. These specific biomarkers nicely complement conventional profiling approaches and may help to better define groups of organisms, for example at the subspecies level.
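The matching and ranking step described above can be sketched as follows. Each MALDI peak is compared with the masses of proteins identified by shotgun proteomics, and the candidates within a tolerance are ordered by spectral count. This rests on several assumptions. Peaks are taken to be singly protonated [M+H]+ ions. Candidate masses are supplied as a simple dictionary. The 2 Da tolerance is a hypothetical choice, not a value from the chapter.

```python
PROTON_MASS = 1.007276  # Da

def match_peaks(peaks_mz, candidates, tol_da=2.0):
    """Map each m/z peak to candidate proteins ranked by spectral count.

    candidates: {protein_name: (mass_in_Da, spectral_count)}, assumed to come from
    a label-free shotgun proteomics run on the same sample.
    """
    results = {}
    for mz in peaks_mz:
        neutral_mass = mz - PROTON_MASS  # assumes a singly charged [M+H]+ ion
        hits = [(name, count) for name, (mass, count) in candidates.items()
                if abs(mass - neutral_mass) <= tol_da]
        hits.sort(key=lambda hit: hit[1], reverse=True)  # most spectra first
        results[mz] = [name for name, _ in hits]
    return results
```

Ranking by spectral count alone is a deliberately crude proxy for abundance. Proteins that are processed or modified in vivo would also need their mature masses considered.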

  10. PAnalyzer: A software tool for protein inference in shotgun proteomics

    Directory of Open Access Journals (Sweden)

    Prieto Gorka

    2012-11-01

    Full Text Available Abstract Background Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise from peptides shared between different proteins, which are common in higher eukaryotes. Recently, data-independent acquisition (DIA) approaches have emerged as an alternative to traditional data-dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the name of one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software currently available does not group the identified proteins transparently by taking peptide evidence categories into account. Furthermore, inspecting, comparing and reporting the results requires tedious manual intervention. Here we report a software tool that addresses these limitations for MSE data. Results In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can be read simultaneously and are treated as technical replicates. Results are saved to CSV, HTML and mzIdentML files (the last only in the case of a single mzIdentML input file). An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. Conclusions We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicates
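Grouping proteins by the kind of peptide evidence that supports them can be sketched with a peptide-to-protein map. The three labels below are a simplified, hypothetical scheme chosen for illustration. They are not PAnalyzer's actual evidence categories or rules.

```python
from collections import defaultdict

def classify_proteins(protein_peptides):
    """Label each protein by the kind of peptide evidence supporting it.

    protein_peptides maps a protein accession to the set of peptides identified for it.
    Simplified labels (not PAnalyzer's exact categories):
      'unique-evidence'    at least one peptide maps to this protein only
      'indistinguishable'  no unique peptide, and another protein has the same peptide set
      'shared-only'        no unique peptide, and no identical peptide set elsewhere
    """
    owners = defaultdict(set)  # peptide -> proteins it maps to
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            owners[peptide].add(protein)

    labels = {}
    for protein, peptides in protein_peptides.items():
        if any(len(owners[p]) == 1 for p in peptides):
            labels[protein] = "unique-evidence"
        elif any(other != protein and protein_peptides[other] == peptides
                 for other in protein_peptides):
            labels[protein] = "indistinguishable"
        else:
            labels[protein] = "shared-only"
    return labels
```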

  11. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  12. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    Directory of Open Access Journals (Sweden)

    Yookyung Lee

    2016-03-01

    Full Text Available Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle Trypoxylus dichotomus, collected in Yeong-dong, Chungcheongbuk-do, South Korea, and was subjected to whole-genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  13. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations, as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and on human health. Despite their importance, little is known about the genomic diversity harbored in phages, because methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes and by difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are from marine phages. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches, we used marine cyanophages as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl-purified phage particles) and sequencing strategies that use either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete, contiguous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.
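ORF calling, one step of such an annotation pipeline, can be illustrated with a minimal scan for ATG-to-stop open reading frames. This is only a sketch. It scans the forward strand only and uses the standard genetic code, and the minimum-length cutoff is a hypothetical parameter. The authors' pipeline is not described in enough detail to reproduce, and production gene callers use statistical models rather than a plain scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Return 0-based, half-open (start, end) coordinates of ATG-initiated ORFs.

    Forward strand only; min_codons (including the stop codon) is a hypothetical cutoff.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon in this frame
    return sorted(orfs)
```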

  14. Identification of meat products by shotgun spectral matching

    DEFF Research Database (Denmark)

    Ohana, D.; Dalebout, H.; Marissen, R. J.

    2016-01-01

    A new method, based on shotgun spectral matching of peptide tandem mass spectra, was successfully applied to the identification of different food species. The method was demonstrated to work on raw as well as processed samples from 16 mammalian and 10 bird species by counting spectral matches...... to spectral libraries in a reference database with one spectral library per species. A phylogenetic tree could also be constructed directly from the spectra. Nearly all samples could be correctly identified at the species level, and 100% at the genus level. The method does not use any genomic information...
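Counting spectral matches against one library per species can be sketched as follows. Each query spectrum casts a vote for the species whose library contains its best-scoring match. The sketch makes several assumptions. Spectra are already binned into {m/z bin: intensity} dictionaries. Cosine similarity stands in for whatever scoring function the authors used. The 0.7 score cutoff is hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two binned spectra stored as {mz_bin: intensity}."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def identify_species(query_spectra, libraries, min_score=0.7):
    """Tally, per species, the query spectra whose best library match belongs to that species."""
    votes = Counter()
    for query in query_spectra:
        best_species, best_score = None, min_score
        for species, library in libraries.items():
            for reference in library:
                score = cosine(query, reference)
                if score >= best_score:
                    best_species, best_score = species, score
        if best_species is not None:
            votes[best_species] += 1
    return votes.most_common()  # species sorted by number of matched spectra
```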

  15. MULTI-DIMENSIONAL MASS SPECTROMETRY-BASED SHOTGUN LIPIDOMICS AND NOVEL STRATEGIES FOR LIPIDOMIC ANALYSES

    Science.gov (United States)

    Han, Xianlin; Yang, Kui; Gross, Richard W.

    2011-01-01

    Since our last comprehensive review on multi-dimensional mass spectrometry-based shotgun lipidomics (Mass Spectrom. Rev. 24 (2005), 367), many new developments in the field of lipidomics have occurred. These developments include new strategies and refinements for shotgun lipidomic approaches that use direct infusion, including novel fragmentation strategies, the identification of multiple new informative dimensions for mass spectrometric interrogation, and the development of new bioinformatic approaches for enhanced identification and quantitation of the individual molecular constituents that comprise each cell’s lipidome. Concurrently, advances in liquid chromatography-based platforms and novel strategies for quantitative matrix-assisted laser desorption/ionization mass spectrometry for lipidomic analyses have been developed. Through the synergistic use of this repertoire of new mass spectrometric approaches, the power and scope of lipidomics have been greatly expanded, accelerating progress toward a comprehensive understanding of the pleiotropic roles of lipids in biological systems. PMID:21755525

  16. Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

    Science.gov (United States)

    The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and over 70% of it is repetitive DNA. Such a genome would be very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...

  17. Enrichment allows identification of diverse, rare elements in metagenomic resistome-virulome sequencing.

    Science.gov (United States)

    Noyes, Noelle R; Weinroth, Maggie E; Parker, Jennifer K; Dean, Chris J; Lakin, Steven M; Raymond, Robert A; Rovira, Pablo; Doster, Enrique; Abdo, Zaid; Martin, Jennifer N; Jones, Kenneth L; Ruiz, Jaime; Boucher, Christina A; Belk, Keith E; Morley, Paul S

    2017-10-17

    Shotgun metagenomic sequencing is increasingly used as a tool to evaluate ecological-level dynamics of antimicrobial resistance and virulence, in conjunction with microbiome analysis. Interest in using this method for environmental surveillance of antimicrobial resistance and pathogenic microorganisms is also increasing. In published metagenomic datasets, the total of all resistance- and virulence-related sequences accounts for [...] enrichment system that incorporates unique molecular indices to count DNA molecules and correct for enrichment bias. The use of the bait-capture and enrichment system significantly increased on-target sequencing of the resistome-virulome, enabling detection of an additional 1441 gene accessions and revealing a low-abundance portion of the resistome-virulome that was more diverse and compositionally different from that detected by more traditional metagenomic assays. The low-abundance portion of the resistome-virulome also contained resistance genes of public health importance, such as extended-spectrum beta-lactamases, that were not detected using traditional shotgun metagenomic sequencing. In addition, the bait-capture and enrichment system enabled identification of rare resistance gene haplotypes that were used to discriminate between sample origins. These results demonstrate that the rare resistome-virulome contains valuable and unique information that can be used for both surveillance and population genetic investigations of resistance. Access to the rare resistome-virulome using the bait-capture and enrichment system validated in this study can greatly advance our understanding of microbiome-resistome dynamics.
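Counting DNA molecules rather than reads with unique molecular indices can be sketched by collapsing reads that share a gene and an index. Each such group is treated as PCR or enrichment copies of one molecule. This is a simplified assumption. Real UMI deduplication typically also takes mapping position into account and merges indices that differ by sequencing errors, and neither of those refinements is done here. The gene names in the usage test are illustrative only.

```python
from collections import defaultdict

def count_molecules(tagged_reads):
    """Count reads and distinct molecules per gene.

    tagged_reads: iterable of (gene_accession, umi) pairs, one per on-target read.
    Reads sharing the same gene and UMI are treated as copies of one molecule.
    Returns {gene: (read_count, molecule_count)}.
    """
    reads = defaultdict(int)
    molecules = defaultdict(set)
    for gene, umi in tagged_reads:
        reads[gene] += 1
        molecules[gene].add(umi)
    return {gene: (reads[gene], len(molecules[gene])) for gene in reads}
```

The ratio of reads to molecules per gene is what exposes enrichment bias. A gene captured very efficiently accumulates many reads without gaining additional molecules.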

  18. MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data

    Directory of Open Access Journals (Sweden)

    Christopher Noune

    2017-02-01

    Full Text Available Next-generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of ‘meta-barcode’ data. This approach relies on comparing amplicon sequences of ‘barcode’ regions from a population with public-domain databases of reference sequences. However, for many organisms, relevant ‘barcode’ regions may not have been identified and large databases of reference sequences may not be available. A workflow and software pipeline, ‘MetaGaAP,’ was developed to identify and quantify genotypes through four steps: shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom ‘barcode’ regions of fewer than 30 polymorphisms within the span of a single ‘read’; amplification and sequencing of the ‘barcode’; generation of a custom database of polymorphisms; and quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a ‘wild type’ Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53), and a tissue-culture-derived strain (HaSNPV-AC53-T2). The approach was validated by comparing polymorphisms in amplicons and shotgun data, and by comparing predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or fewer. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms.
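A naive way to build a genotype database for a barcode is to enumerate every combination of alleles at its polymorphic sites. This helps illustrate why the number of polymorphisms per barcode has to stay small: with biallelic sites, the database doubles with every added site. The sketch below makes several assumptions. All sites are taken to be biallelic. Reads are assumed to be already reduced to their allele tuples. Only exact matches are counted. It is not MetaGaAP's actual implementation.

```python
from collections import Counter
from itertools import product

def genotype_database(snps):
    """Enumerate every allele combination at the given biallelic sites.

    snps: list of (position, ref_allele, alt_allele); the result has 2**len(snps) entries.
    """
    return [tuple(combo) for combo in product(*[(ref, alt) for _, ref, alt in snps])]

def genotype_abundance(read_haplotypes, database):
    """Relative abundance of database genotypes among reads (exact matches only)."""
    known = set(database)
    counts = Counter(h for h in read_haplotypes if h in known)
    total = sum(counts.values())
    return {g: counts[g] / total for g in counts} if total else {}
```

At 30 sites, the enumeration would already contain 2**30 (about 1.07 × 10^9) genotypes.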

  19. Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

    Science.gov (United States)

    Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan

    2013-11-01

    Active global surveillance and characterization of influenza viruses are essential for better preparedness against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and the emergence of new strains, and can improve accuracy when designing preventive vaccines. This study investigated the use of deep sequencing on the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from six acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and sequenced simultaneously. A total of 2.6 Gbases was obtained at a cluster density of 455±14 K/mm², with 95.76% (8,571,655/8,950,724) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 were of high quality (≥Q30, a standard base-calling QC score). Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genome of each of the six virus isolates was covered at a depth of 600-fold or greater. The MiSeq Platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
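The two quality figures quoted above are simple proportions. The following is a minimal sketch of both. The first computes the share of bases at or above Q30 in a FASTQ quality string. It assumes the Phred+33 encoding used by Illumina 1.8 and later, since the encoding is not stated in the abstract. The second recomputes the pass-filter percentage from the reported cluster counts.

```python
def q30_fraction(quality_string, offset=33):
    """Fraction of bases with Phred score >= 30 (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality_string]
    if not scores:
        return 0.0
    return sum(score >= 30 for score in scores) / len(scores)

def percent(part, whole):
    """Percentage of part in whole, e.g. clusters passing filter."""
    return 100.0 * part / whole

# Clusters passing filter, as reported in the abstract.
pf_percent = percent(8_571_655, 8_950_724)
```

Rounded to two decimals, `pf_percent` reproduces the 95.76% given in the abstract.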

  20. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Full Text Available Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes, especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to assemble the chloroplast genome de novo from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps left by coverage variation in the WGS dataset. To demonstrate the universality of our approach, we successfully assembled de novo three complete chloroplast genomes from plant species with a range of nuclear genome sizes: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
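Chloroplast DNA is present at high copy number, so k-mers derived from it occur far more often in WGS reads than most nuclear k-mers. That is the property a k-mer frequency-based selection exploits. The following is a minimal sketch under stated assumptions. The read-selection rule (a minimum share of high-count k-mers per read) and its parameters are hypothetical. They are not the authors' pipeline settings, and high-copy nuclear repeats would also pass such a filter.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count all k-mers across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def select_high_copy_reads(reads, k, min_count, min_fraction=0.8):
    """Keep reads in which at least min_fraction of k-mers occur >= min_count times."""
    counts = kmer_counts(reads, k)
    selected = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if kmers and sum(counts[km] >= min_count for km in kmers) / len(kmers) >= min_fraction:
            selected.append(read)
    return selected
```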

  1. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes, followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, representing about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs were LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, a comparison of the abundance of TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.
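Fold coverage is the number of bases sequenced divided by the genome size. The following back-of-envelope check shows what genome size the abstract's rounded figures imply. Both results are approximations derived here, not sizes reported by the authors.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: bases sequenced divided by genome size."""
    return total_bases / genome_size

def implied_genome_size(total_bases, fold):
    """Genome size implied by an amount of sequence and its stated fold coverage."""
    return total_bases / fold

# Figures quoted in the abstract (rounded there, so results are approximate).
from_coverage = implied_genome_size(17e9, 12)  # about 1.42e9 bp
from_assembly = 400e6 / 0.30                   # 400 Mbp is ~30% of the genome: about 1.33e9 bp
```

Both routes point to a genome of roughly 1.3 to 1.4 Gbp, so the stated coverage and assembly fraction are mutually consistent.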

  2. Shotgun lipidomic analysis of chemically sulfated sterols compromises analytical sensitivity

    DEFF Research Database (Denmark)

    Casanovas, Albert; Hannibal-Bach, Hans Kristian; Jensen, Ole Nørregaard

    2014-01-01

    Shotgun lipidomics affords comprehensive and quantitative analysis of lipid species in cells and tissues at high throughput [1-5]. The methodology is based on direct infusion of lipid extracts by electrospray ionization (ESI) combined with tandem mass spectrometry (MS/MS) and/or high resolution F...... low ionization efficiency in ESI [7]. For this reason, chemical derivatization procedures including acetylation [8] or sulfation [9] are commonly implemented to facilitate ionization, detection and quantification of sterols for global lipidome analysis [1-3, 10]....

  3. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides), as well as automated data analysis protocols that filter out incorrect assignments, noise, and synthetic side-products. To increase confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining sequence identification rates greater than 85% in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.
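One kind of filter that a defined library design makes possible can be sketched with two checks. First, a de novo assignment must use only residues allowed at each library position. Second, its theoretical [M+H]+ must agree with the observed precursor mass. The per-position residue sets and the 10 ppm tolerance are hypothetical. The residue masses are standard monoisotopic values for unmodified residues. This is not the authors' published filtering protocol.

```python
# Monoisotopic residue masses (Da) for unmodified standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def mh_plus(sequence):
    """Theoretical monoisotopic [M+H]+ of a linear, unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER + PROTON

def filter_assignments(assignments, allowed, tol_ppm=10.0):
    """Keep sequences that fit the (hypothetical) library design and the observed mass.

    assignments: list of (sequence, observed [M+H]+); allowed: list of residue sets per position.
    """
    kept = []
    for seq, observed in assignments:
        if len(seq) != len(allowed) or any(aa not in allowed[i] for i, aa in enumerate(seq)):
            continue  # sequence cannot come from this library
        theoretical = mh_plus(seq)
        if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
            kept.append(seq)
    return kept
```

A mass filter of this kind cannot separate isobaric residues such as Leu and Ile. Presumably, avoiding such ambiguities is part of what an "MS-friendly" design is for.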

  4. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    Science.gov (United States)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing how microorganisms interact with each other and with their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments, while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is the identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on the identification and assembly of DNA fragments from low-abundance community members. The Richmond Mine hosts an extensive, relatively low-diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency, with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are
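Binning by tetranucleotide frequency starts by turning each contig into a 256-dimensional vector of 4-mer frequencies. The following sketch computes those vectors. In place of the Self-Organizing Map clustering used in the study, it assigns a contig to the nearest reference bin profile by Euclidean distance. That substitution is deliberate, for brevity. The bins, and the choice not to merge reverse-complement 4-mers, are also assumptions.

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers

def tetra_profile(seq):
    """Relative frequency of each 4-mer in a sequence (k-mers with non-ACGT symbols ignored)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]

def nearest_bin(contig, bin_profiles):
    """Name of the bin whose profile is closest (Euclidean) to the contig's profile."""
    profile = tetra_profile(contig)
    return min(bin_profiles, key=lambda name: math.dist(profile, bin_profiles[name]))
```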

  5. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.; Colby, Sean M.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-02-21

    MerCat is a parallel, highly scalable and modular property-analysis software package for robust analysis of features in next-generation sequencing data. MerCat accepts assembled contigs and raw sequence reads from any platform and produces feature abundance count tables. It allows direct analysis of data properties without depending on a reference sequence database, as search tools such as BLAST and/or DIAMOND do when used for compositional analysis of whole-community shotgun sequencing data (e.g. metagenomes and metatranscriptomes).
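Reference-free property analysis of this kind boils down to counting k-mers and summarizing the resulting table. The following sketch counts canonical k-mers, which means each k-mer is merged with its reverse complement so that strand does not matter. It then computes Shannon diversity as one example summary. Canonical counting and the choice of Shannon's index are assumptions for illustration. The sketch does not reproduce MerCat's own implementation or output.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])

def kmer_table(seqs, k):
    """Canonical k-mer counts across sequences, skipping k-mers with non-ACGT symbols."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over the k-mer relative abundances."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
```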

  6. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.

    Science.gov (United States)

    Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C

    2009-02-24

    Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and thus lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where keywords can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
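GO over-representation is commonly assessed with a hypergeometric test. This test asks how likely it is to see at least k proteins carrying a GO term among n differentially expressed proteins, drawn from N identified proteins of which K carry the term. The abstract does not say which statistic GOEx uses, so the sketch below shows the generic test, not GOEx's code.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn).

    N: identified proteins; K: those carrying the GO term;
    n: differentially expressed proteins; k: of those, how many carry the term.
    """
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
```

With many GO terms tested at once, the resulting p-values would also need a multiple-testing correction.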

  7. Shotgun glycomics of pig lung identifies natural endogenous receptors for influenza viruses.

    Science.gov (United States)

    Byrd-Leotis, Lauren; Liu, Renpeng; Bradley, Konrad C; Lasanajak, Yi; Cummings, Sandra F; Song, Xuezheng; Heimburg-Molinaro, Jamie; Galloway, Summer E; Culhane, Marie R; Smith, David F; Steinhauer, David A; Cummings, Richard D

    2014-06-03

    Influenza viruses bind to host cell surface glycans containing terminal sialic acids, but as studies on influenza binding become more sophisticated, it is becoming evident that although sialic acid may be necessary, it is not sufficient for productive binding. To better define endogenous glycans that serve as viral receptors, we have explored glycan recognition in the pig lung, because influenza is broadly disseminated in swine, and swine have been postulated as an intermediary host for the emergence of pandemic strains. For these studies, we used the technology of "shotgun glycomics" to identify natural receptor glycans. The total released N- and O-glycans from pig lung glycoproteins and glycolipid-derived glycans were fluorescently tagged and separated by multidimensional HPLC, and individual glycans were covalently printed to generate pig lung shotgun glycan microarrays. All viruses tested interacted with one or more sialylated N-glycans but not O-glycans or glycolipid-derived glycans, and each virus demonstrated novel and unexpected differences in endogenous N-glycan recognition. The results illustrate the repertoire of specific, endogenous N-glycans of pig lung glycoproteins for virus recognition and offer a new direction for studying endogenous glycan functions in viral pathogenesis.

  8. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    Energy Technology Data Exchange (ETDEWEB)

    McLoughlin, K. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.
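Confidence intervals on relative abundances can be obtained in several ways. One generic approach is a percentile bootstrap over read-level taxon assignments, sketched below. This is an assumption chosen for illustration. The report does not state that MetaQuant computes its intervals this way, and the number of resamples and the fixed seed are arbitrary.

```python
import random
from collections import Counter

def bootstrap_abundance_ci(assignments, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI of each taxon's relative abundance.

    assignments: one taxon label per classified read.
    Returns {taxon: (estimate, lower, upper)}.
    """
    rng = random.Random(seed)
    n = len(assignments)
    observed = Counter(assignments)
    taxa = sorted(observed)
    samples = {t: [] for t in taxa}
    for _ in range(n_boot):
        resampled = Counter(rng.choices(assignments, k=n))  # resample reads with replacement
        for t in taxa:
            samples[t].append(resampled[t] / n)
    result = {}
    for t in taxa:
        values = sorted(samples[t])
        lower = values[int((alpha / 2) * n_boot)]
        upper = values[int((1 - alpha / 2) * n_boot) - 1]
        result[t] = (observed[t] / n, lower, upper)
    return result
```

Resampling individual reads treats them as independent, which ignores biological and technical replicate variation. The resulting intervals are therefore likely to be optimistic.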

  9. Soup to Tree: The Phylogeny of Beetles Inferred by Mitochondrial Metagenomics of a Bornean Rainforest Sample.

    Science.gov (United States)

    Crampton-Platt, Alex; Timmermans, Martijn J T N; Gimmel, Matthew L; Kutty, Sujatha Narayanan; Cockerill, Timothy D; Vun Khen, Chey; Vogler, Alfried P

    2015-09-01

    In spite of the growth of molecular ecology, systematics and next-generation sequencing, the discovery and analysis of diversity is not currently integrated with building the tree-of-life. Tropical arthropod ecologists are well placed to accelerate this process if all specimens obtained through mass-trapping, many of which will be new species, could be incorporated routinely into phylogeny reconstruction. Here we test a shotgun sequencing approach, whereby mitochondrial genomes are assembled from complex ecological mixtures through mitochondrial metagenomics, and demonstrate how the approach overcomes many of the taxonomic impediments to the study of biodiversity. DNA from approximately 500 beetle specimens, originating from a single rainforest canopy fogging sample from Borneo, was pooled and shotgun sequenced, followed by de novo assembly of complete and partial mitogenomes for 175 species. The phylogenetic tree obtained from this local sample was highly similar to that from existing mitogenomes selected for global coverage of major lineages of Coleoptera. When all sequences were combined only minor topological changes were induced against this reference set, indicating an increasingly stable estimate of coleopteran phylogeny, while the ecological sample expanded the tip-level representation of several lineages. Robust trees generated from ecological samples now enable an evolutionary framework for ecology. Meanwhile, the inclusion of uncharacterized samples in the tree-of-life rapidly expands taxon and biogeographic representation of lineages without morphological identification. Mitogenomes from shotgun sequencing of unsorted environmental samples and their associated metadata, placed robustly into the phylogenetic tree, constitute novel DNA "superbarcodes" for testing hypotheses regarding global patterns of diversity. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. Facilitating genome navigation : survey sequencing and dense radiation-hybrid gene mapping

    NARCIS (Netherlands)

    Hitte, C; Madeoy, J; Kirkness, EF; Priat, C; Lorentzen, TD; Senger, F; Thomas, D; Derrien, T; Ramirez, C; Scott, C; Evanno, G; Pullar, B; Cadieu, E; Oza; Lourgant, K; Jaffe, DB; Tacher, S; Dreano, S; Berkova, N; Andre, C; Deloukas, P; Fraser, C; Lindblad-Toh, K; Ostrander, EA; Galibert, F

    Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences
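How much of a genome survey sequencing touches can be estimated with the classic Lander-Waterman expectation. At an average fold coverage c, the expected fraction of bases covered by at least one read is 1 - e^(-c). The expectation assumes reads land uniformly at random. It also ignores repeats, cloning or amplification bias, and read-length edge effects, so real surveys will deviate from it somewhat.

```python
import math

def expected_fraction_covered(c):
    """Lander-Waterman expectation: share of the genome hit by >= 1 read at fold coverage c."""
    return 1.0 - math.exp(-c)
```

For example, at 1× coverage about 63% of the genome is expected to be sampled. Roughly 37% is expected to be missed entirely, which is why survey-level data capture gene content only partially.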

  11. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data

    Directory of Open Access Journals (Sweden)

    Domont Gilberto B

    2009-02-01

    Full Text Available Abstract Background Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and thus lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where keywords can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.

  12. Large pore dermal microdialysis and liquid chromatography-tandem mass spectroscopy shotgun proteomic analysis: a feasibility study.

    Science.gov (United States)

    Petersen, Lars J; Sørensen, Mette A; Codrea, Marius C; Zacho, Helle D; Bendixen, Emøke

    2013-11-01

    The purpose of the present pilot study was to investigate the feasibility of combining large pore dermal microdialysis with shotgun proteomic analysis in human skin. Dialysate was recovered from human skin by 2000 kDa microdialysis membranes from one subject at three different phases of the study: trauma due to implantation of the dialysis device, a post-implantation steady-state period, and after induction of vasodilatation and plasma extravasation. For shotgun proteomics, the proteins were extracted and digested with trypsin. Peptides were separated by capillary and nanoflow HPLC systems, followed by tandem mass spectrometry (MS/MS) on a Quadrupole-TOF hybrid instrument. The MS/MS spectra were merged and mapped to a human target protein database to achieve peptide identification and protein inference. Results showed variation in protein amounts and profiles for each of the different sampling phases. The total protein concentration was 1.7, 0.6, and 1.3 mg/mL during the three phases, respectively. A total of 158 different proteins were identified. Immunoglobulins and the major classes of plasma proteins, including proteases, coagulation factors, apolipoproteins, albumins, and complement factors, made up most of the protein load in all three test conditions. Shotgun proteomics allowed the identification of more than 150 proteins in microdialysis samples from human skin. This highlights the opportunities of LC-MS/MS to study the complex molecular interactions in the skin. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  13. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers

    Directory of Open Access Journals (Sweden)

    Quail Michael A

    2012-07-01

    Full Text Available Abstract Background Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid, with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near-perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context-specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Conclusions All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.

  14. Depletion of Human DNA in Spiked Clinical Specimens for Improvement of Sensitivity of Pathogen Detection by Next-Generation Sequencing

    OpenAIRE

    Hasan, Mohammad R.; Rawat, Arun; Tang, Patrick; Jithesh, Puthen V.; Thomas, Eva; Tan, Rusung; Tilley, Peter

    2016-01-01

    Next-generation sequencing (NGS) technology has shown promise for the detection of human pathogens from clinical samples. However, one of the major obstacles to the use of NGS in diagnostic microbiology is the low ratio of pathogen DNA to human DNA in most clinical specimens. In this study, we aimed to develop a specimen-processing protocol to remove human DNA and enrich specimens for bacterial and viral DNA for shotgun metagenomic sequencing. Cerebrospinal fluid (CSF) and nasopharyngeal aspi...

  15. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses.

    Directory of Open Access Journals (Sweden)

    Karel Schoonvaere

    Full Text Available The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high-quality reads. The most significant eukaryote associations were two protozoan parasites, Apicystis bombi and Crithidia bombi, and one nematode parasite, Sphaerularia bombi, in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. In addition to identifying three honey bee viruses (Black queen cell virus, Sacbrood virus and Varroa destructor virus-1) and four plant viruses, we describe two novel RNA viruses, Scaldis River bee virus (SRBV) and Ganda bee virus (GABV), based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses: SRBV is related to the order Mononegavirales, whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, including a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects.

  16. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses.

    Science.gov (United States)

    Schoonvaere, Karel; De Smet, Lina; Smagghe, Guy; Vierstraete, Andy; Braeckman, Bart P; de Graaf, Dirk C

    2016-01-01

    The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high-quality reads. The most significant eukaryote associations were two protozoan parasites, Apicystis bombi and Crithidia bombi, and one nematode parasite, Sphaerularia bombi, in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. In addition to identifying three honey bee viruses (Black queen cell virus, Sacbrood virus and Varroa destructor virus-1) and four plant viruses, we describe two novel RNA viruses, Scaldis River bee virus (SRBV) and Ganda bee virus (GABV), based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses: SRBV is related to the order Mononegavirales, whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, including a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects.

  17. Dynamics of domain coverage of the protein sequence universe

    Science.gov (United States)

    2012-01-01

    Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a slow decrease in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  18. Dynamics of domain coverage of the protein sequence universe

    Directory of Open Access Journals (Sweden)

    Rekapalli Bhanu

    2012-11-01

    Full Text Available Abstract Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a slow decrease in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

  19. Comparative study of label and label-free techniques using shotgun proteomics for relative protein quantification.

    Science.gov (United States)

    Sjödin, Marcus O D; Wetterhall, Magnus; Kultima, Kim; Artemenko, Konstantin

    2013-06-01

    The analytical performance of three different strategies, iTRAQ (isobaric tag for relative and absolute quantification), dimethyl labeling (DML) and label-free (LF), for relative protein quantification using shotgun proteomics has been evaluated. The methods were explored using samples containing (i) bovine proteins in known ratios and (ii) bovine proteins in known ratios spiked into Escherichia coli. The latter case mimics the actual conditions in a typical biological sample, with a few differentially expressed proteins and a bulk of proteins with unchanged ratios. Additionally, the evaluation was performed on both QStar and LTQ-FTICR mass spectrometers. LF LTQ-FTICR was found to have the highest proteome coverage, while the highest accuracy based on the artificially regulated proteins was found for DML LTQ-FTICR (54%). A varying linearity (k: 0.55-1.16, r²: 0.61-0.96) was shown for all methods within selected dynamic ranges. All methods were found to consistently underestimate bovine protein ratios when matrix proteins were added. However, LF LTQ-FTICR was more tolerant toward a compression effect. A single peptide was demonstrated to be sufficient for reliable quantification using iTRAQ. A ranking system utilizing several parameters important for quantitative proteomics demonstrated that the overall performance of the five different method/instrument combinations was: DML LTQ-FTICR > iTRAQ QStar > LF LTQ-FTICR > DML QStar > LF QStar. Copyright © 2013 Elsevier B.V. All rights reserved.

  20. Coverage analysis of lists of genes involved in heterogeneous ...

    Indian Academy of Sciences (India)

    The rationale of our study was to specifically evaluate sequence coverage using ... Catherine Badens and Martin Krahn contributed equally to this work. the analysis of ... (Life Technologies) with sequencing data processing using the Torrent ... map4 parameter, which is the default option to balance rapidity and maintain a ...

  1. Fine mapping of powdery mildew resistance genes PmTb7A.1 and PmTb7A.2 in Triticum boeoticum (Boiss.) using the shotgun sequence assembly of chromosome 7AL.

    Science.gov (United States)

    Chhuneja, Parveen; Yadav, Bharat; Stirnweis, Daniel; Hurni, Severine; Kaur, Satinder; Elkot, Ahmed Fawzy; Keller, Beat; Wicker, Thomas; Sehgal, Sunish; Gill, Bikram S; Singh, Kuldeep

    2015-10-01

    A novel powdery mildew resistance gene and a new allele of Pm1 were identified and fine mapped. DNA markers suitable for marker-assisted selection have been identified. Powdery mildew caused by Blumeria graminis is one of the most important foliar diseases of wheat and causes significant yield losses worldwide. Diploid A genome species are an important genetic resource for disease resistance genes. Two powdery mildew resistance genes, PmTb7A.1 and PmTb7A.2, identified in Triticum boeoticum (AᵇAᵇ) accession pau5088, were mapped on chromosome 7AL. In the present study, shotgun sequence assembly data for chromosome 7AL were utilised for fine mapping of these Pm resistance genes. Forty SSR, 73 resistance gene analogue-based sequence-tagged sites (RGA-STS) and 36 single nucleotide polymorphism markers were designed for fine mapping of PmTb7A.1 and PmTb7A.2. Twenty-one RGA-STS, 8 SSR and 13 SNP markers were mapped to 7AL. RGA-STS markers Ta7AL-4556232 and 7AL-4426363 were linked to PmTb7A.1 and PmTb7A.2 at genetic distances of 0.6 and 6.0 cM, respectively. The present investigation established that PmTb7A.1 is a new powdery mildew resistance gene that confers resistance to a broad range of Bgt isolates, whereas PmTb7A.2 is most probably a new allele of Pm1, based on its chromosomal location and on screening with Bgt isolates that show differential reactions on lines carrying different Pm1 alleles. The markers identified as linked to the two Pm resistance genes are robust and can be used for marker-assisted introgression of these genes into hexaploid wheat.

  2. Genome sequencing and annotation of multidrug resistant Mycobacterium tuberculosis (MDR-TB) PR10 strain

    Directory of Open Access Journals (Sweden)

    Mohd Zakihalani A. Halim

    2016-03-01

    Full Text Available Here, we report the draft genome sequence and annotation of a multidrug resistant Mycobacterium tuberculosis strain PR10 (MDR-TB PR10) isolated from a patient diagnosed with tuberculosis. The MDR-TB PR10 draft genome is 4.34 Mbp in size, has a G + C content of 65.6%, and contains 4637 predicted genes. The determinants were categorized by RAST into 400 subsystems with 4286 coding sequences and 50 RNAs. The whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number CP010968. Keywords: Mycobacterium tuberculosis, Genome, MDR, Extrapulmonary

  3. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
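    To make "measures based solely on the k-mer frequencies" concrete, here is a minimal sketch of the cheapest kind of alignment-free comparison: count overlapping k-mers, normalise to relative frequencies, and take the Euclidean distance between the two frequency vectors. This is a generic illustration, not CAFE's code or any of its 28 named measures; the function names are hypothetical, and k-mers containing characters other than A, C, G, T are simply skipped as an assumption.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every ACGT k-mer in seq (overlapping windows)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip windows containing N or other ambiguity codes.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    # Include all 4**k k-mers so both vectors share the same coordinates.
    return {"".join(p): counts["".join(p)] / total for p in product("ACGT", repeat=k)}


def euclidean_kmer_distance(a: str, b: str, k: int) -> float:
    """Euclidean distance between the k-mer frequency vectors of a and b."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return sqrt(sum((fa[w] - fb[w]) ** 2 for w in fa))


d = euclidean_kmer_distance("ACGTACGTTGCA", "ACGTTTGCAACG", k=2)
```

    The dense 4^k vector keeps the code simple but grows quickly with k. Background-corrected measures such as $d_2^*$ and $d_2^S$ additionally subtract expected k-mer counts under a Markov model, which is where the extra cost mentioned above comes from.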

  4. Discovering and differentiating new and emerging clonal populations of Chlamydia trachomatis with a novel shotgun cell culture harvest assay.

    Science.gov (United States)

    Somboonna, Naraporn; Mead, Sally; Liu, Jessica; Dean, Deborah

    2008-03-01

    Chlamydia trachomatis is the leading cause of preventable blindness and bacterial sexually transmitted diseases worldwide. Plaque assays have been used to clonally segregate laboratory-adapted C. trachomatis strains from mixed infections, but no assays have been reported to segregate clones from recent clinical samples. We developed a novel shotgun cell culture harvest assay for this purpose because we found that recent clinical samples do not form plaques. Clones were strain-typed by using outer membrane protein A and 16S rRNA sequences. Surprisingly, ocular trachoma reference strain A/SA-1 contained clones of Chlamydophila abortus. C. abortus primarily infects ruminants and pigs and has never been identified in populations where trachoma is endemic. Three clonal variants of reference strain Ba/Apache-2 were also identified. Our findings reflect the importance of clonal isolation in identifying constituents of mixed infections containing new or emerging strains and of viable clones for research to more fully understand the dynamics of in vivo strain-mixing, evolution, and disease pathogenesis.

  5. Large pore dermal microdialysis and liquid chromatography-tandem mass spectroscopy shotgun proteomic analysis: a feasibility study

    DEFF Research Database (Denmark)

    Petersen, Lars J.; Sorensen, Mette A.; Codrea, Marius C.

    2013-01-01

    Background/Aims: The purpose of the present pilot study was to investigate the feasibility of combining large pore dermal microdialysis with shotgun proteomic analysis in human skin. Methods: Dialysate was recovered from human skin by 2000 kDa microdialysis membranes from one subject at three different...

  6. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered a potentially valuable source of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  7. Data on xylem sap proteins from Mn- and Fe-deficient tomato plants obtained using shotgun proteomics.

    Science.gov (United States)

    Ceballos-Laita, Laura; Gutierrez-Carbonell, Elain; Takahashi, Daisuke; Abadía, Anunciación; Uemura, Matsuo; Abadía, Javier; López-Millán, Ana Flor

    2018-04-01

    This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown under Fe- and Mn-sufficient control conditions, as well as under Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing statistically significant changes (ANOVA; p ≤ 0.05) above a biologically relevant selected threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in the research article entitled "Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses" [1]. This dataset is made available to support the cited study as well as to extend analyses at a later stage.
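    The three listing criteria (at least two peptides, ANOVA p ≤ 0.05, fold ≥ 2) amount to a simple filter. The sketch below applies them to hypothetical records; the field names and accessions are invented for illustration, and treating a two-fold decrease (ratio ≤ 0.5) as passing the fold threshold is an assumption about how "fold ≥ 2" is read.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    """One quantified protein; field names are hypothetical, not from the dataset."""
    accession: str
    n_peptides: int     # peptides supporting the identification
    anova_p: float      # ANOVA p-value across treatments
    fold_change: float  # treatment/control abundance ratio, must be > 0


def passes_filter(r: ProteinResult, min_peptides: int = 2,
                  alpha: float = 0.05, min_fold: float = 2.0) -> bool:
    """Apply the three criteria: >= 2 peptides, p <= 0.05, fold >= 2."""
    # Assumption: a decrease counts when its ratio is <= 1/min_fold,
    # so compare the larger of the ratio and its reciprocal.
    fold = max(r.fold_change, 1.0 / r.fold_change)
    return r.n_peptides >= min_peptides and r.anova_p <= alpha and fold >= min_fold


results = [
    ProteinResult("PROT_A", 3, 0.01, 2.5),   # kept: all three criteria met
    ProteinResult("PROT_B", 1, 0.01, 3.0),   # dropped: single peptide
    ProteinResult("PROT_C", 4, 0.20, 5.0),   # dropped: not significant
    ProteinResult("PROT_D", 2, 0.05, 0.5),   # kept: two-fold decrease
]
kept = [r.accession for r in results if passes_filter(r)]
```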

  8. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from total DNA Sequences.

    NARCIS (Netherlands)

    Izan, Shairul; Esselink, G.; Visser, R.G.F.; Smulders, M.J.M.; Borm, T.J.A.

    2017-01-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This

  9. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement.

    Science.gov (United States)

    Zhang, Tianzhen; Hu, Yan; Jiang, Wenkai; Fang, Lei; Guan, Xueying; Chen, Jiedan; Zhang, Jinbo; Saski, Christopher A; Scheffler, Brian E; Stelly, David M; Hulse-Kemp, Amanda M; Wan, Qun; Liu, Bingliang; Liu, Chunxiao; Wang, Sen; Pan, Mengqiao; Wang, Yangkun; Wang, Dawei; Ye, Wenxue; Chang, Lijing; Zhang, Wenpan; Song, Qingxin; Kirkbride, Ryan C; Chen, Xiaoya; Dennis, Elizabeth; Llewellyn, Danny J; Peterson, Daniel G; Thaxton, Peggy; Jones, Don C; Wang, Qiong; Xu, Xiaoyang; Zhang, Hua; Wu, Huaitong; Zhou, Lei; Mei, Gaofu; Chen, Shuqi; Tian, Yue; Xiang, Dan; Li, Xinghe; Ding, Jian; Zuo, Qiyang; Tao, Linna; Liu, Yunchao; Li, Ji; Lin, Yu; Hui, Yuanyuan; Cao, Zhisheng; Cai, Caiping; Zhu, Xiefei; Jiang, Zhi; Zhou, Baoliang; Guo, Wangzhen; Li, Ruiqiang; Chen, Z Jeffrey

    2015-05-01

    Upland cotton is a model for polyploid crop domestication and transgenic improvement. Here we sequenced the allotetraploid Gossypium hirsutum L. acc. TM-1 genome by integrating whole-genome shotgun reads, bacterial artificial chromosome (BAC)-end sequences and genotype-by-sequencing genetic maps. We assembled and annotated 32,032 A-subgenome genes and 34,402 D-subgenome genes. Structural rearrangements, gene loss, disrupted genes and sequence divergence were more common in the A subgenome than in the D subgenome, suggesting asymmetric evolution. However, no genome-wide expression dominance was found between the subgenomes. Genomic signatures of selection and domestication are associated with positively selected genes (PSGs) for fiber improvement in the A subgenome and for stress tolerance in the D subgenome. This draft genome sequence provides a resource for engineering superior cotton lines.

  10. Tracembler – software for in-silico chromosome walking in unassembled genomes

    Directory of Open Access Journals (Sweden)

    Wilkerson Matthew D

    2007-05-01

    Full Text Available Abstract Background Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user. Results Tracembler takes one or multiple DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively such that BLAST matching sequences identified in previous rounds of searches are used as new queries in subsequent rounds of BLAST searches. The recursive BLAST search stops when either no more new matching sequences are found, a given maximal number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length Chrm2 gene as well as its upstream and downstream genomic regions from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus. Conclusion Tracembler streamlines the process of recursive database searches, sequence assembly, and gene
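    The recursive search and its three stopping rules can be sketched as a breadth-first loop. This is not Tracembler's code: `search` is a hypothetical stand-in for a BLAST query against the Trace Archive that returns IDs of matching reads, the toy `links` table replaces real sequence similarity, and the final CAP3 assembly step is not reproduced.

```python
from collections.abc import Callable, Iterable


def recursive_search(seeds: Iterable[str],
                     search: Callable[[str], Iterable[str]],
                     max_queries: int, max_rounds: int) -> set[str]:
    """Collect reads reachable from the seeds by repeated similarity search.

    Stops when a round yields no new reads, when max_queries searches
    have been issued, or after max_rounds rounds.
    """
    found: set[str] = set()
    queries = list(seeds)
    queries_used = 0
    rounds = 0
    while queries and rounds < max_rounds and queries_used < max_queries:
        rounds += 1
        new_hits: set[str] = set()
        for query in queries:
            if queries_used >= max_queries:
                break
            queries_used += 1
            new_hits.update(h for h in search(query) if h not in found)
        found |= new_hits
        queries = sorted(new_hits)  # only newly found reads become queries
    return found


# Toy "database": which reads each query would match (hypothetical).
links = {"seed": ["r1"], "r1": ["r2"], "r2": ["r1", "r3"], "r3": []}
reads = recursive_search(["seed"], lambda q: links.get(q, []),
                         max_queries=100, max_rounds=10)
```

    Only reads first seen in the previous round are re-queried, which is what lets the walk terminate once no new matches appear.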

  11. Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics.

    Science.gov (United States)

    Maboudi Afkham, Heydar; Qiu, Xuanbin; The, Matthew; Käll, Lukas

    2017-02-15

    Liquid chromatography is frequently used as a means to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, Elude. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies. Contact: lukas.kall@scilifelab.se. Our software and the data used in our experiments are publicly available and can be downloaded from https://github.com/statisticalbiotechnology/GPTime. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
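    Once a model returns a predicted mean and standard deviation per peptide, the two uses described above (selecting a lower-error subset and adaptive windowing) reduce to a few lines. The sketch assumes a Gaussian predictive distribution, so mean ± 1.96·sd is roughly a 95% window. The function names, the z value, the threshold and the example predictions are hypothetical, and the Gaussian Process fit itself is not shown.

```python
def retention_windows(preds: dict[str, tuple[float, float]],
                      z: float = 1.96) -> dict[str, tuple[float, float]]:
    """Map peptide -> (low, high) retention-time window of mean +/- z * sd.

    Peptides with uncertain predictions get proportionally wider windows,
    instead of one fixed window width for every peptide.
    """
    return {pep: (mu - z * sd, mu + z * sd) for pep, (mu, sd) in preds.items()}


def confident_peptides(preds: dict[str, tuple[float, float]],
                       max_sd: float) -> list[str]:
    """Keep peptides whose predicted standard deviation is at most max_sd."""
    return [pep for pep, (_, sd) in preds.items() if sd <= max_sd]


# Hypothetical predictions: peptide -> (mean, sd), in minutes.
preds = {"pep1": (30.0, 2.0), "pep2": (42.5, 6.0)}
windows = retention_windows(preds)
subset = confident_peptides(preds, max_sd=3.0)
```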

  12. Genome sequences of six Phytophthora species associated with forests in New Zealand

    Science.gov (United States)

    Studholme, D.J.; McDougal, R.L.; Sambles, C.; Hansen, E.; Hardy, G.; Grant, M.; Ganley, R.J.; Williams, N.M.

    2015-01-01

    In New Zealand there has been a long association of Phytophthora diseases in forests, nurseries, remnant plantings and horticultural crops. However, new Phytophthora diseases of trees have recently emerged. Genome sequencing has been performed for 12 Phytophthora isolates, from six species: Phytophthora pluvialis, Phytophthora kernoviae, Phytophthora cinnamomi, Phytophthora agathidicida, Phytophthora multivora and Phytophthora taxon Totara. These sequences will enable comparative analyses to identify potential virulence strategies and ultimately facilitate better control strategies. These Whole Genome Shotgun data have been deposited in DDBJ/ENA/GenBank under the accession numbers LGTT00000000, LGTU00000000, JPWV00000000, JPWU00000000, LGSK00000000, LGSJ00000000, LGTR00000000, LGTS00000000, LGSM00000000, LGSL00000000, LGSO00000000, and LGSN00000000. PMID:26981359

  13. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia

    Directory of Open Access Journals (Sweden)

    Noraini Philip

    2016-09-01

    Full Text Available Mycobacterium tuberculosis (M. tuberculosis) is the causative agent of tuberculosis (TB), which causes millions of deaths every year. We have sequenced the genome of M. tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with tuberculous meningitis (TBM). The isolated strain was designated M. tuberculosis SB24. Genomic DNA of M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using the PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp, with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503.

  14. Accurate and High-Coverage Immune Repertoire Sequencing Reveals Characteristics of Antibody Repertoire Diversification in Young Children with Malaria

    Science.gov (United States)

    Jiang, Ning

    Accurately measuring the immune repertoire sequence composition, diversity, and abundance is important in studying repertoire response in infections, vaccinations, and cancer immunology. Using molecular identifiers (MIDs) to tag mRNA molecules is an effective method for improving the accuracy of immune repertoire sequencing (IR-seq). However, it is still difficult to apply IR-seq to small clinical samples and achieve high coverage of repertoire diversity. This is especially challenging in studying infections and vaccinations, where B cell subpopulations with fewer cells, such as memory B cells or plasmablasts, are often of great interest for studying somatic mutation patterns and diversity changes. Here, we describe an approach to IR-seq based on the use of MIDs in combination with a clustering method that can reveal more than 80% of the antibody diversity in a sample and can be applied to as few as 1,000 B cells. We applied this to study the antibody repertoires of young children before and during an acute malaria infection. We discovered unexpectedly high levels of somatic hypermutation (SHM) in infants and revealed characteristics of antibody repertoire development in young children that would have a profound impact on immunization in children.

  15. Low coverage sequencing of three echinoderm genomes: the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis.

    Science.gov (United States)

    Long, Kyle A; Nossa, Carlos W; Sewell, Mary A; Putnam, Nicholas H; Ryan, Joseph F

    2016-01-01

    There are five major extant groups of Echinodermata: Crinoidea (feather stars and sea lilies), Ophiuroidea (brittle stars and basket stars), Asteroidea (sea stars), Echinoidea (sea urchins, sea biscuits, and sand dollars), and Holothuroidea (sea cucumbers). These animals are known for their pentaradial symmetry as adults, unique water vascular system, mutable collagenous tissues, and endoskeletons of high magnesium calcite. To our knowledge, the only echinoderm species with a genome sequence available to date is Strongylocentrotus purpuratus (Echinoidea). The availability of additional echinoderm genome sequences is crucial for understanding the biology of these animals. Here we present assembled draft genomes of the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis from Illumina sequence data with coverages of 12.5x, 22.5x, and 21.4x, respectively. These data provide a resource for mining gene superfamilies, identifying non-coding RNAs, confirming gene losses, and designing experimental constructs. They will be important comparative resources for future genomic studies in echinoderms.
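    Coverage figures such as 12.5x are mean depths, conventionally computed as total sequenced bases divided by genome size. The sketch below shows that arithmetic together with the classic Lander-Waterman (Poisson) estimate of the fraction of the genome expected to be hit by at least one read, 1 − e^(−c). The read counts in the example are hypothetical. The Poisson model assumes uniformly random read placement, which real libraries with GC and repeat biases only approximate.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth c = (number of reads * read length) / genome size."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman / Poisson estimate: P(depth >= 1) = 1 - exp(-c)."""
    return 1.0 - math.exp(-c)


# Hypothetical run: 1,000 reads of 150 bp over a 15 kb target -> 10x.
c = mean_coverage(1000, 150, 15000)
frac = expected_fraction_covered(c)
```

    At the 12.5x quoted for the brittle star, the model predicts an uncovered fraction of about e^(−12.5) ≈ 3.7 × 10⁻⁶; observed gaps are usually larger because of the biases noted above.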

  16. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
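    HGAP's first step, using the longest reads as seeds, can be sketched as a length cutoff: sort reads from longest to shortest and stop once the seeds reach a target depth of the genome. The target seed depth and genome size are assumed user inputs here, and `seed_length_cutoff` is a hypothetical helper; the abstract does not state how HGAP chooses its cutoff.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Length cutoff such that reads at least this long together reach
    seed_coverage * genome_size bases (hypothetical helper)."""
    target = seed_coverage * genome_size
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    # Not enough data to reach the target: every read becomes a seed
    return min(read_lengths)

lengths = [12000, 9000, 8000, 6000, 3000, 2000, 1500]
cutoff = seed_length_cutoff(lengths, genome_size=5000, seed_coverage=5)
seeds = [n for n in lengths if n >= cutoff]
print(cutoff, seeds)  # 8000 [12000, 9000, 8000]
```

    In the workflow described above, the remaining shorter reads would then be recruited to these seeds to build the preassembled consensus reads.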

  17. Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity

    Science.gov (United States)

    Li, Chang-Lin; Li, Kai-Cheng; Wu, Dan; Chen, Yan; Luo, Hao; Zhao, Jing-Rong; Wang, Sa-Shuang; Sun, Ming-Ming; Lu, Ying-Jin; Zhong, Yan-Qing; Hu, Xu-Ye; Hou, Rui; Zhou, Bei-Bei; Bao, Lan; Xiao, Hua-Sheng; Zhang, Xu

    2016-01-01

    Sensory neurons are distinguished by distinct signaling networks and receptive characteristics. Thus, sensory neuron types can be defined by linking transcriptome-based neuron typing with the sensory phenotypes. Here we classify somatosensory neurons of the mouse dorsal root ganglion (DRG) by high-coverage single-cell RNA-sequencing (10 950 ± 1 218 genes per neuron) and neuron size-based hierarchical clustering. Moreover, single DRG neurons responding to cutaneous stimuli are recorded using an in vivo whole-cell patch clamp technique and classified by neuron-type genetic markers. Small diameter DRG neurons are classified into one type of low-threshold mechanoreceptor and five types of mechanoheat nociceptors (MHNs). Each of the MHN types is further categorized into two subtypes. Large DRG neurons are categorized into four types, including neurexophilin 1-expressing MHNs and mechanical nociceptors (MNs) expressing BAI1-associated protein 2-like 1 (Baiap2l1). Mechanoreceptors expressing trafficking protein particle complex 3-like and Baiap2l1-marked MNs are subdivided into two subtypes each. These results provide a new system for cataloging somatosensory neurons and their transcriptome databases. PMID:26691752

  18. Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.

    Science.gov (United States)

    Marine, Rachel; McCarren, Coleen; Vorrasane, Vansay; Nasko, Dan; Crowgey, Erin; Polson, Shawn W; Wommack, K Eric

    2014-01-30

    Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.
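    Coverage evenness of the kind compared in this study can be summarized with the coefficient of variation (CV) of per-window read depth. The minimal sketch below uses invented depths, not data from the paper; it also illustrates why pooling cannot help when replicate MDA reactions share the same priming bias: summing two uneven profiles with the same shape leaves the CV high.

```python
import statistics

def coverage_cv(depths):
    """Coefficient of variation of per-window read depth (0 means perfectly even)."""
    mean = statistics.mean(depths)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(depths) / mean

# Toy per-window depths (invented numbers, not data from the study)
unamplified = [48, 52, 50, 49, 51]
mda_rep1 = [5, 120, 30, 200, 15]
mda_rep2 = [9, 230, 62, 410, 28]  # same priming-bias pattern, different yield

# Pooling replicate reactions = summing their depths window by window
pooled = [a + b for a, b in zip(mda_rep1, mda_rep2)]

print(round(coverage_cv(unamplified), 3))
print(round(coverage_cv(mda_rep1), 3), round(coverage_cv(pooled), 3))
```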

  19. Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen

    Directory of Open Access Journals (Sweden)

    Navajeet Chakravartty

    2017-12-01

    Full Text Available The pathogen Sclerospora graminicola is the most important biotic production constraint of pearl millet in India, Africa and other parts of the world. We report a de novo whole genome assembly and analysis of pathotype 1, one of the most virulent pathotypes of S. graminicola from India. Whole genome sequencing yielded 7.38 Gb (73,889,924 paired-end reads) from the paired-end library and 1.15 Gb (3,851,788 reads) from the mate-pair library, generated on the Illumina HiSeq 2500 and Illumina MiSeq, respectively. A total of 597,293 filtered subreads with an average read length of 6.39 kb were generated on the PacBio RS II with P6-C4 chemistry. The assembled draft genome of S. graminicola pathotype 1 was 299,901,251 bp long, with an N50 of 17,909 bp and a minimum scaffold size of 1 kb. The assembly consisted of 26,786 scaffolds, the longest being 238,843 bp, with a GC content of 47.2%. The overall coverage was 40X. Gene prediction on the draft genome with AUGUSTUS, using Saccharomyces cerevisiae as the model, yielded 65,404 genes. A total of 52,285 predicted genes showed homology in BLASTX searches against the nr database, and 38,120 genes had a significant BLASTX match at an E-value cutoff of 1e-5 and 40% identity. Of the 38,120 annotated genes, 11,873 had UniProt entries, 7,248 had GO terms and 9,686 had KEGG IDs. Of these 7,248, 2,724 were associated with biological processes. The genome data of the downy mildew pathogen are available in the NCBI GenBank database. The Sclerospora graminicola whole genome shotgun (WGS) project has the project accession MIQA00000000. This version of the project (02) has the accession number MIQA02000000, and consists of sequences MIQA02000001-MIQA02026786, with BioProject ID PRJNA325098 and BioSample ID SAMN05219233. This study may help in understanding the evolutionary pattern of the pathogen and aid elucidation of effector evolution for...
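    Two figures in this abstract can be checked with simple arithmetic. The sketch below assumes that "overall coverage" means total sequenced bases (Illumina paired-end, Illumina mate-pair and PacBio subreads) divided by the assembly length; under that assumption the estimate comes out near 41x, close to the reported 40X. It also includes a standard N50 function, applied to toy lengths rather than the real scaffolds.

```python
def n50(lengths):
    """Length of the shortest sequence among the longest ones that together
    cover at least half of the total assembled bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0

print(n50([100, 80, 60, 40, 20]))  # 80 (toy lengths)

# Back-of-envelope depth, assuming depth = total sequenced bases / assembly length
illumina_pe = 7.38e9        # bases from the paired-end library
illumina_mp = 1.15e9        # bases from the mate-pair library
pacbio = 597_293 * 6.39e3   # subreads x average subread length
assembly = 299_901_251      # assembled length in bp
depth = (illumina_pe + illumina_mp + pacbio) / assembly
print(round(depth, 1))      # ~41.2
```

    Using only the Illumina data, the same formula gives roughly 28x, so the reported 40X appears to include the long reads.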

  20. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70 percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78 percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75 percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  1. A framework for intelligent data acquisition and real-time database searching for shotgun proteomics.

    Science.gov (United States)

    Graumann, Johannes; Scheltema, Richard A; Zhang, Yong; Cox, Jürgen; Mann, Matthias

    2012-03-01

    In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.
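    A topN acquisition rule, like the top10 method re-implemented in MaxQuant Real-Time, can be sketched as: from each survey scan, pick the N most intense precursors that are not already excluded, then add them to the exclusion list. This is a simplified illustration, not MaxQuant's API; matching m/z by rounding to two decimals stands in for the ppm tolerance windows real instruments use, and the precursor values are invented.

```python
def select_top_n(survey, n, excluded, decimals=2):
    """Return up to n most intense precursor m/z values from one survey scan,
    skipping any already on the exclusion list, then exclude the picks."""
    candidates = [(mz, inten) for mz, inten in survey
                  if round(mz, decimals) not in excluded]
    candidates.sort(key=lambda p: p[1], reverse=True)
    picks = [mz for mz, _ in candidates[:n]]
    excluded.update(round(mz, decimals) for mz in picks)
    return picks

# (m/z, intensity) pairs from a hypothetical survey scan
survey = [(445.12, 3.0e6), (512.30, 8.0e5), (623.77, 5.0e6), (701.41, 1.2e6)]
excluded = {445.12}  # e.g. a peptide identified in an earlier cycle
first = select_top_n(survey, 2, excluded)
second = select_top_n(survey, 2, excluded)
print(first, second)  # [623.77, 701.41] [512.3]
```

    An intelligent agent as described above would replace the simple intensity ranking with other priorities, such as peptides from a chosen GO term.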

  2. Vaccination coverage and out-of-sequence vaccinations in rural Guinea-Bissau

    DEFF Research Database (Denmark)

    Hornshøj, Linda; Benn, Christine Stabell; Fernandes, Manuel

    2012-01-01

    OBJECTIVE: The WHO aims for 90% coverage of the Expanded Program on Immunization (EPI), which in Guinea-Bissau included BCG vaccine at birth, three doses of diphtheria-tetanus-pertussis vaccine (DTP) and oral polio vaccine (OPV) at 6, 10 and 14 weeks and measles vaccine (MV) at 9 months when...

  3. High-content screening of yeast mutant libraries by shotgun lipidomics

    DEFF Research Database (Denmark)

    Tarasov, Kirill; Stefanko, Adam; Casanovas, Albert

    2014-01-01

    To identify proteins with a functional role in lipid metabolism and homeostasis, we designed a high-throughput platform for high-content lipidomic screening of yeast mutant libraries. To this end, we combined culturing and lipid extraction in 96-well format, automated direct infusion nanoelectrospray ionization, high-resolution Orbitrap mass spectrometry, and a dedicated data processing framework to support lipid phenotyping across hundreds of Saccharomyces cerevisiae mutants. Our novel approach revealed that the absence of genes with unknown function YBR141C and YJR015W, and the transcription factor KAR4 precipitated distinct lipid metabolic phenotypes. These results demonstrate that the high-throughput shotgun lipidomics platform is a valid and complementary proxy for high-content screening of yeast mutant libraries.

  4. Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

    Science.gov (United States)

    Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

    2012-08-01

    Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs), of which 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions, with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 in all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
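    Perfect SSRs of a given motif length can be located with a regular-expression backreference, and motifs can be grouped into rotation classes the way the abstract groups AT/TA and AAT/ATA/TAA. The minimum repeat count used below is a hypothetical choice, and the sequence is a constructed toy; the abstract does not give the study's SSR search criteria.

```python
import re

def find_ssrs(seq, motif_len, min_repeats):
    """Report perfect tandem repeats as (start, motif, repeat_count)."""
    pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(2)
        if len(set(motif)) == 1:  # skip homopolymer runs such as AAAA
            continue
        hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return hits

def motif_class(motif):
    """Group a motif with its rotations, e.g. AAT/ATA/TAA -> 'AAT'."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))

seq = "GG" + "AT" * 6 + "CCG" + "AAG" * 5 + "TT"
print(find_ssrs(seq, 2, 4))  # [(2, 'AT', 6)]
print(find_ssrs(seq, 3, 4))  # [(16, 'GAA', 5)]
print(motif_class("GAA"))    # AAG
```

    The trinucleotide hit is reported in the GAA phase because the regex matches leftmost; normalizing with `motif_class` puts it in the same class as AAG and AGA.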

  5. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes.

    Directory of Open Access Journals (Sweden)

    Martin Laurence

    Full Text Available Unbiased high-throughput sequencing of whole metagenome shotgun DNA libraries is a promising new approach to identifying microbes in clinical specimens, which, unlike other techniques, is not limited to known sequences. Unlike most sequencing applications, it is highly sensitive to laboratory contaminants as these will appear to originate from the clinical specimens. To assess the extent and diversity of sequence contaminants, we aligned 57 "1000 Genomes Project" sequencing runs from six centers against the four largest NCBI BLAST databases, detecting reads of diverse contaminant species in all runs and identifying the most common of these contaminant genera (Bradyrhizobium) in assembled genomes from the NCBI Genome database. Many of these microorganisms have been reported as contaminants of ultrapure water systems. Studies aiming to identify novel microbes in clinical specimens will greatly benefit from not only preventive measures such as extensive UV irradiation of water and cross-validation using independent techniques, but also a concerted effort to sequence the complete genomes of common contaminants so that they may be subtracted computationally.
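    Computational subtraction of known contaminants can be illustrated with exact k-mer matching. The authors aligned reads with BLAST; the sketch below swaps in a simpler technique, flagging reads whose k-mers largely occur in a contaminant index built from both strands. The value of k, the 50% threshold and the toy genome are assumptions for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(contaminant_genomes, k):
    """Set of k-mers from contaminant genomes on both strands."""
    index = set()
    for genome in contaminant_genomes:
        index |= kmers(genome, k) | kmers(revcomp(genome), k)
    return index

def filter_reads(reads, index, k, max_shared=0.5):
    """Drop reads whose share of contaminant k-mers reaches max_shared."""
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        shared = len(read_kmers & index) / len(read_kmers) if read_kmers else 0.0
        if shared < max_shared:
            kept.append(read)
    return kept

index = build_index(["ACGTTGCAAGGCTTAC"], k=5)  # toy "contaminant" genome
reads = ["GTTGCAAGG", "CCTTGCAAC", "TTTTCCCCGGGG"]  # forward, reverse strand, unrelated
print(filter_reads(reads, index, k=5))  # ['TTTTCCCCGGGG']
```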

  6. CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks

    Directory of Open Access Journals (Sweden)

    Jing Wang

    2017-01-01

    Full Text Available Many structural variation (SV) detection methods have been proposed following the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when run on low-coverage sequences. The union of results from these tools achieves fairly high sensitivity but still has low accuracy on low-coverage sequence data; that is, it contains many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of candidates. With labeled feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low-coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign the priority of SV features and effectively reduce false positives.
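    The candidate-gathering step can be sketched as a union of deletion calls from several callers, merging calls on the same chromosome that reciprocally overlap by at least 50%. The 50% threshold, the tool names and the coordinates are hypothetical; in CNNdel the merged candidates would then be converted into features and scored by the trained CNN, which is not reproduced here.

```python
def reciprocal_overlap(a, b):
    """Overlap of two (start, end) intervals divided by the longer length;
    >= r here means each interval is covered by at least fraction r."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])

def merge_candidates(calls_by_tool, min_ro=0.5):
    """Union deletion calls (chrom, start, end) from several tools, merging
    calls on the same chromosome that reciprocally overlap by >= min_ro."""
    merged = []  # each entry: [chrom, start, end, set_of_supporting_tools]
    for tool, calls in calls_by_tool.items():
        for chrom, start, end in calls:
            for entry in merged:
                if entry[0] == chrom and reciprocal_overlap(
                        (start, end), (entry[1], entry[2])) >= min_ro:
                    entry[3].add(tool)
                    break
            else:
                merged.append([chrom, start, end, {tool}])
    return merged

calls = {  # hypothetical callers and coordinates
    "toolA": [("1", 1000, 2000), ("1", 50000, 50400)],
    "toolB": [("1", 1100, 2050)],
    "toolC": [("2", 1000, 2000)],
}
merged = merge_candidates(calls)
print([(c, s, e, sorted(t)) for c, s, e, t in merged])
```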

  7. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    Full Text Available Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. On average, we obtained 45X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus, exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.
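    The filtering strategy in this abstract (drop common, non-coding and synonymous variants, then restrict to a predefined candidate gene list) can be sketched as follows. The 1% allele-frequency cutoff, the consequence labels and the variant records are made-up assumptions for illustration, and the gene set is a small stand-in for the study's 111 genes.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "nonsense", "synonymous", "intronic"
    population_af: float  # allele frequency in a reference population

# Assumed labels for protein-altering coding consequences
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}

def filter_variants(variants, candidate_genes, max_af=0.01):
    """Keep rare, protein-altering variants in the candidate gene list."""
    return [
        v for v in variants
        if v.population_af < max_af
        and v.consequence in PROTEIN_ALTERING
        and v.gene in candidate_genes
    ]

candidate_genes = {"HNF1A", "HNF4A", "GCK", "ABCC8", "PPARG"}  # stand-in subset
variants = [  # invented records, not the patients' variants
    Variant("ABCC8", "missense", 0.0002),
    Variant("HNF4A", "synonymous", 0.0001),
    Variant("GCK", "missense", 0.15),
    Variant("TTN", "missense", 0.0003),
    Variant("PPARG", "nonsense", 0.0),
]
hits = filter_variants(variants, candidate_genes)
print([v.gene for v in hits])  # ['ABCC8', 'PPARG']
```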

  8. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...... used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP...... identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J...

  9. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Ye, Jia; Li, Songgang

    2005-01-01

    in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences. Publication date: 2005-Sep...

  10. Plastic litter from shotgun ammunition on Danish coastlines - Amounts and provenance.

    Science.gov (United States)

    Kanstrup, Niels; Balsby, Thorsten J S

    2018-06-01

    Plastic litter in the marine environment is a major global issue. Discarded plastic shotgun ammunition shells and discharged wads are an unwelcome addition and feature among the top ten litter items found on reference beaches in Denmark. To understand this problem, its scale and origins, collections were made by volunteers along Danish coastal shorelines. In all, 3669 plastic ammunition items were collected at 68 sites along 44.6 km of shoreline. The collected items were scored for characteristic variables such as gauge and length, shot type, and the legibility of text, the erosion, and the presence of metallic components. Scores for characteristics were related to the site, area, and season and possible influences discussed. The prevalence of collected plastic shotgun litter ranges from zero to 41 items per 100 m with an average of 3.7 items per 100 m. Most ammunition litter on Danish coasts originates from hunting on Danish coastal waterbodies, but a small amount may come from further afield. North Sea coasts are the most distinctive, suggesting the possible contribution of long distance drift as well as the likelihood that such litter can persist in marine habitats for decades. The pathway from initial discard to eventual wash-up and collection depends on the physical properties of plastic components, marine tides and currents, coastal topography and shoreline vegetation. Judging from the disintegration of the cartridge and the wear and decomposition of components, we conclude that there is a substantial supply of polluting plastic ammunition materials that has and will accumulate. These plastic items pose a hazard to marine ecosystems and will wash up on coasts for many years to come. We recommend that responsible managers, hunters and ammunition manufacturers take action now to reduce the problem and, thereby, protect ecosystems, wildlife and the sustainability of hunting. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Implementation of Targeted Next Generation Sequencing in Clinical Diagnostics

    DEFF Research Database (Denmark)

    Larsen, Martin Jakob; Burton, Mark; Thomassen, Mads

    Accurate mutation detection is essential in clinical genetic diagnostics of monogenic hereditary diseases. Targeted next generation sequencing (NGS) provides a promising and cost-effective alternative to Sanger sequencing and MLPA analysis currently used in most diagnostic laboratories. One...... of mutation positive controls previously characterized by Sanger/MLPA analysis. Agilent SureSelect Target-Enrichment kits were used for capturing a set of genes associated with hereditary breast and ovarian cancer syndrome and a compilation of genes involved in multiple rare single gene disorders......, respectively. For diagnostics, sequencing coverage is essential, so a minimum coverage of 30x per nucleotide in the coding regions was used as our primary quality criterion. For the majority of the included genes, we obtained adequate gene coverage, in which we were able to detect 100% of the known...
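    A per-gene check against the 30x criterion can be sketched like this: report the fraction of positions at or above 30x and list genes with any coding position below it. Gene names and depths are invented placeholders; real per-position depths would come from an alignment-based coverage tool.

```python
def fraction_at_min_depth(depths, min_depth=30):
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)

def genes_below_threshold(per_gene_depths, min_depth=30):
    """Genes with at least one coding position under min_depth."""
    return sorted(gene for gene, depths in per_gene_depths.items()
                  if not depths or min(depths) < min_depth)

per_gene = {  # hypothetical genes with invented per-position depths
    "GENE_A": [45, 60, 38, 31],
    "GENE_B": [50, 29, 70, 33],
    "GENE_C": [90, 85, 88, 92],
}
print(genes_below_threshold(per_gene))            # ['GENE_B']
print(fraction_at_min_depth(per_gene["GENE_B"]))  # 0.75
```

    Genes flagged this way are the ones whose low-coverage positions would need a complementary method such as Sanger sequencing.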

  12. The temporal analysis of yeast exponential phase using shotgun proteomics as a fermentation monitoring technique.

    Science.gov (United States)

    Huang, Eric L; Orsat, Valérie; Shah, Manesh B; Hettich, Robert L; VerBerkmoes, Nathan C; Lefsrud, Mark G

    2012-09-18

    Systems biology and bioprocess technology can be better understood using shotgun proteomics as a monitoring system during fermentation. We demonstrated a shotgun proteomic method to monitor the temporal yeast proteome in early, middle and late exponential phases. Our study identified a total of 1389 proteins combining all 2D-LC-MS/MS runs. The temporal Saccharomyces cerevisiae proteome was enriched with proteolysis, radical detoxification, translation, one-carbon metabolism, glycolysis and TCA cycle. Heat shock proteins and proteins associated with oxidative stress response were found throughout the exponential phase. The most abundant proteins observed were translation elongation factors, ribosomal proteins, chaperones and glycolytic enzymes. The high abundance of the H-protein of the glycine decarboxylase complex (Gcv3p) indicated the availability of glycine in the environment. We observed differentially expressed proteins; those induced at mid-exponential phase were involved in ribosome biogenesis, mitochondrial DNA binding/replication and transcriptional activation. Induction of tryptophan synthase (Trp5p) indicated the abundance of tryptophan during the fermentation. As fermentation progressed toward late exponential phase, a decrease in cell proliferation was implied by the repression of ribosomal proteins, transcription coactivators, methionine aminopeptidase and translation-associated proteins. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
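    Converting per-gene read counts into comparable expression levels can be sketched with reads-per-million (RPM) scaling. The abstract does not specify the exact normalization used in the pipeline, so RPM is an assumption here, as is the input format: one gene assignment per read, with None for reads not assigned to any gene.

```python
from collections import Counter

def expression_rpm(read_to_gene):
    """Reads per million assigned reads for each gene.

    read_to_gene: one gene name per read, or None if the read was not
    assigned to any gene (an assumed input format).
    """
    counts = Counter(g for g in read_to_gene if g is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1e6 / total for gene, n in counts.items()}

# Invented assignments for eight reads (two unassigned)
assignments = ["GENE_A", "GENE_A", "GENE_B", None, "GENE_A", "GENE_B", "GENE_C", None]
rpm = expression_rpm(assignments)
print(rpm["GENE_A"], rpm["GENE_B"])
```

    Scaling by the number of assigned reads makes runs of different depth comparable, which matters when reproducibility is assessed across multiple sequencing runs as described above.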

  14. Bounds on the distribution of the number of gaps when circles and lines are covered by fragments: Theory and practical application to genomic and metagenomic projects

    Directory of Open Access Journals (Sweden)

    Marchesi Julian R

    2007-03-01

    Full Text Available Abstract Background The question of how a circle or line segment becomes covered when random arcs are marked off has arisen repeatedly in bioinformatics. The number of uncovered gaps is of particular interest. Approximate distributions for the number of gaps have been given in the literature, one motivation being ease of computation. Error bounds for these approximate distributions have not been given. Results We give bounds on the probability distribution of the number of gaps when a circle is covered by fragments of fixed size. The absolute error in the approximation is typically on the order of 0.1% at 10× coverage depth. The method can be applied to coverage problems on the interval, including edge effects, and applications are given to metagenomic libraries and shotgun sequencing.
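    For fixed-length fragments whose start points fall uniformly and independently on a circle, a gap follows a fragment exactly when no other fragment starts within its length. This gives an exact mean of N(1 - L/G)^(N-1) gaps for N fragments of length L on a circle of circumference G, approximately N*exp(-c) at coverage c = NL/G. The sketch below checks that mean by simulation; it covers only the expectation, not the distributional error bounds derived in the paper, and the parameters are arbitrary.

```python
import random

def expected_gaps(n, frag_len, circumference):
    """Exact mean number of uncovered gaps on a circle for n fragments of
    fixed length with independent uniform start points."""
    return n * (1 - frag_len / circumference) ** (n - 1)

def simulate_gaps(n, frag_len, circumference, trials, seed=0):
    """Monte Carlo estimate of the mean number of gaps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, circumference) for _ in range(n))
        for i, s in enumerate(starts):
            # Next start clockwise; wrap around the circle for the last fragment
            nxt = starts[(i + 1) % n] + (circumference if i == n - 1 else 0)
            if nxt - s > frag_len:  # uncovered stretch after this fragment
                total += 1
    return total / trials

# 30 fragments of length 100 on a circle of length 1000: 3x coverage
print(expected_gaps(30, 100, 1000), simulate_gaps(30, 100, 1000, trials=5000))
```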

  15. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    Full Text Available The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  16. COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.

    Science.gov (United States)

    Lu, Yang Young; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

    2017-03-15

    The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework that automatically bins contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated in both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering by sparsity regularization. In addition, the COCACOLA framework seamlessly incorporates customized knowledge to improve binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at https://github.com/younglululu/COCACOLA . Contact: fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
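    COCACOLA's own algorithm combines hard and soft clustering through sparsity regularization, which is beyond a short example. The sketch below swaps in a simpler technique, k-medians, only to show what assigning contigs by L1 rather than Euclidean distance looks like; the two features per contig and their values are invented, and real composition and coverage features would need scaling.

```python
import statistics

def l1(a, b):
    """L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medians(points, centers, iterations=10):
    """Cluster feature vectors under L1 distance; centroids are
    coordinate-wise medians (the L1-optimal center)."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: l1(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [statistics.median(col) for col in zip(*members)]
    return labels, centers

# Hypothetical contigs described by (GC fraction, mean coverage)
points = [(0.40, 10.0), (0.42, 11.0), (0.41, 9.5),
          (0.65, 50.0), (0.66, 48.0), (0.64, 52.0)]
labels, centers = k_medians(points, [points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```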

  17. The de novo assembly of mitochondrial genomes of the extinct passenger pigeon (Ectopistes migratorius) with next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Chih-Ming Hung

    Full Text Available The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species.

  18. The De Novo Assembly of Mitochondrial Genomes of the Extinct Passenger Pigeon (Ectopistes migratorius) with Next Generation Sequencing

    Science.gov (United States)

    Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien

    2013-01-01

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111

  19. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.

    Science.gov (United States)

    Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
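    The following gives a sense of what ~3.5-5.5x coverage means. The classic Lander-Waterman model treats per-base depth as Poisson with mean c = N·L/G (N reads of length L over a genome of size G), so the expected fraction of bases covered at least k times follows directly. This is an idealized sketch, not an analysis from the paper. Real Illumina coverage is uneven (for example, through GC bias), so observed breadth at a given mean depth is usually lower than these numbers.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_covered(c: float, min_depth: int = 1) -> float:
    """Expected fraction of bases with depth >= min_depth if depth ~ Poisson(c)."""
    # P(depth < k) = sum over i < k of e^{-c} * c^i / i!
    below = sum(math.exp(-c) * c ** i / math.factorial(i) for i in range(min_depth))
    return 1.0 - below


for c in (3.5, 5.5):
    print(f"{c}x: {fraction_covered(c):.3f} of bases covered at least once, "
          f"{fraction_covered(c, min_depth=3):.3f} at least three times")
```

    Under these assumptions, roughly 97% of bases are covered at least once at 3.5x, but only about 68% are covered by three or more reads. This is one way to see why such depths are considered low for de novo assembly.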

  20. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies

    Science.gov (United States)

    Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  1. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  2. Protein identification and quantification from riverbank grape, Vitis riparia: Comparing SDS-PAGE and FASP-GPF techniques for shotgun proteomic analysis.

    Science.gov (United States)

    George, Iniga S; Fennell, Anne Y; Haynes, Paul A

    2015-09-01

    Protein sample preparation optimisation is critical for establishing reproducible high throughput proteomic analysis. In this study, two different fractionation sample preparation techniques (in-gel digestion and in-solution digestion) for shotgun proteomics were used to quantitatively compare proteins identified in Vitis riparia leaf samples. The total number of proteins and peptides identified were compared between filter aided sample preparation (FASP) coupled with gas phase fractionation (GPF) and SDS-PAGE methods. There was a 24% increase in the total number of reproducibly identified proteins when FASP-GPF was used. FASP-GPF is more reproducible, less expensive and a better method than SDS-PAGE for shotgun proteomics of grapevine samples as it significantly increases protein identification across biological replicates. Total peptide and protein information from the two fractionation techniques is available in PRIDE with the identifier PXD001399 (http://proteomecentral.proteomexchange.org/dataset/PXD001399). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Sequence-based classification and identification of Fungi.

    Science.gov (United States)

    Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  4. PatternLab for proteomics: a tool for differential shotgun proteomics

    Directory of Open Access Journals (Sweden)

    Yates John R

    2008-07-01

    Full Text Available Abstract Background A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from the data analysis perspective, since replicate readings are not acquired. Results To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or having assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and confront them with existing strategies
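    Spectral counting, as described above, treats the number of tandem mass spectra assigned to a protein as a proxy for its abundance. The counts become comparable across runs only after some normalization. The sketch below uses simple total-count normalization plus a log2 fold change with a pseudocount. It is a generic illustration, not PatternLab's normalization and not the ACFold or nSVM methods. The protein names and counts are hypothetical.

```python
import math
from typing import Dict


def normalize_by_total(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each protein's spectral count by the run's total spectral count."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("run has no spectra")
    return {protein: c / total for protein, c in counts.items()}


def log2_fold_changes(counts_a: Dict[str, int], counts_b: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(B / A) of total-normalized spectral counts for every protein seen."""
    proteins = sorted(set(counts_a) | set(counts_b))
    # The pseudocount keeps proteins detected in only one state away from log(0).
    padded_a = {p: counts_a.get(p, 0) + pseudocount for p in proteins}
    padded_b = {p: counts_b.get(p, 0) + pseudocount for p in proteins}
    norm_a = normalize_by_total(padded_a)
    norm_b = normalize_by_total(padded_b)
    return {p: math.log2(norm_b[p] / norm_a[p]) for p in proteins}


# Hypothetical runs: state B yielded twice as many spectra overall.
state_a = {"P1": 9, "P2": 9}
state_b = {"P1": 9, "P2": 29}
print(log2_fold_changes(state_a, state_b))  # P1 -> -1.0, P2 -> ~0.585
```

    In this toy case, P1 has identical raw counts in both states. Because run B produced twice as many spectra in total, however, P1's relative share halves, giving a log2 fold change of -1. This is the kind of effect that normalization is meant to expose.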

  5. Shotgun metaproteomics of the human distal gut microbiota

    Energy Technology Data Exchange (ETDEWEB)

    VerBerkmoes, N.C.; Russell, A.L.; Shah, M.; Godzik, A.; Rosenquist, M.; Halfvarsson, J.; Lefsrud, M.G.; Apajalahti, J.; Tysk, C.; Hettich, R.L.; Jansson, Janet K.

    2008-10-15

    The human gut contains a dense, complex and diverse microbial community, comprising the gut microbiome. Metagenomics has recently revealed the composition of genes in the gut microbiome, but provides no direct information about which genes are expressed or functioning. Therefore, our goal was to develop a novel approach to directly identify microbial proteins in fecal samples to gain information about the genes expressed and about key microbial functions in the human gut. We used a non-targeted, shotgun mass spectrometry-based whole community proteomics, or metaproteomics, approach for the first deep proteome measurements of thousands of proteins in human fecal samples, thus demonstrating this approach on the most complex sample type to date. The resulting metaproteomes had a skewed distribution relative to the metagenome, with more proteins for translation, energy production and carbohydrate metabolism when compared to what was earlier predicted from metagenomics. Human proteins, including antimicrobial peptides, were also identified, providing a non-targeted glimpse of the host response to the microbiota. Several unknown proteins represented previously undescribed microbial pathways or host immune responses, revealing a novel complex interplay between the human host and its associated microbes.

  6. Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection.

    Science.gov (United States)

    Schlaberg, Robert; Chiu, Charles Y; Miller, Steve; Procop, Gary W; Weinstock, George

    2017-06-01

    - Metagenomic sequencing can be used for detection of any pathogen using unbiased, shotgun next-generation sequencing (NGS), without the need for sequence-specific amplification. Proof-of-concept has been demonstrated in infectious disease outbreaks of unknown causes and in patients with suspected infections but negative results for conventional tests. Metagenomic NGS tests hold great promise to improve infectious disease diagnostics, especially in immunocompromised and critically ill patients. - To discuss challenges and provide example solutions for validating metagenomic pathogen detection tests in clinical laboratories. A summary of current regulatory requirements, largely based on prior guidance for NGS testing in constitutional genetics and oncology, is provided. - Examples from 2 separate validation studies are provided for steps from assay design, and validation of wet bench and bioinformatics protocols, to quality control and assurance. - Although laboratory and data analysis workflows are still complex, metagenomic NGS tests for infectious diseases are increasingly being validated in clinical laboratories. Many parallels exist to NGS tests in other fields. Nevertheless, specimen preparation, rapidly evolving data analysis algorithms, and incomplete reference sequence databases are idiosyncratic to the field of microbiology and often overlooked.

  7. Deep whole-genome sequencing of 90 Han Chinese genomes.

    Science.gov (United States)

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000

  8. HOMICIDE BY CERVICAL SPINAL CORD GUNSHOT INJURY WITH SHOTGUN FIRE PELLETS: CASE REPORT

    Directory of Open Access Journals (Sweden)

    Dana Turliuc, Serban Turliuc, Iustin Mihailov, Andrei Cucu, Gabriel Dumitrescu, Claudia Costea

    2015-10-01

    Full Text Available This paper presents a rare forensic case of cervical spinal cord gunshot injury inflicted on a woman by her husband, a professional hunter, with shotgun pellets during a family fight. The shot completely destroyed the cervical spinal cord without injuring the neck vessels or organs, and the patient survived for seven days. We discuss notions of judicial ballistics, assessment of the patient with spinal cord gunshot injury, and therapeutic strategies. Although cervical spine gunshot injuries are lethal for most patients, survivors need the coordination of a multidisciplinary surgical team to ensure the best possible functional prognosis.

  9. Microsatellite Primers Identified by 454 Sequencing in the Floodplain Tree Species Eucalyptus victrix (Myrtaceae)

    Directory of Open Access Journals (Sweden)

    Paul G. Nevill

    2013-05-01

    Full Text Available Premise of the study: Microsatellite primers were developed for Eucalyptus victrix (Myrtaceae) to evaluate the population and spatial genetic structure of this widespread northwestern Australian riparian tree species, which may be impacted by hydrological changes associated with mining activity. Methods and Results: 454 GS-FLX shotgun sequencing was used to obtain 1895 sequences containing putative microsatellite motifs. Ten polymorphic microsatellite loci were identified and screened for variation in individuals from two populations in the Pilbara region. Observed heterozygosities ranged from 0.44 to 0.91 (mean: 0.66) and the number of alleles per locus ranged from five to 25 (average: 11). Conclusions: These microsatellite loci will be useful in future studies of population and spatial genetic structure in E. victrix, and inform the development of seed sourcing strategies for the species.
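    Observed heterozygosity at a microsatellite locus is the fraction of genotyped individuals that carry two different alleles, and the allele count is the number of distinct allele sizes seen. The following minimal sketch computes both. It assumes each genotype is a hypothetical pair of allele sizes in base pairs, with `None` for a failed call. The study's own scoring software is not described in the abstract.

```python
from typing import Optional, Sequence, Tuple

# A diploid genotype as two allele sizes (bp); None marks a missing call.
Genotype = Optional[Tuple[int, int]]


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of successfully genotyped individuals that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no individuals were genotyped at this locus")
    heterozygotes = sum(1 for a, b in called if a != b)
    return heterozygotes / len(called)


def allele_count(genotypes: Sequence[Genotype]) -> int:
    """Number of distinct alleles observed at the locus."""
    return len({allele for g in genotypes if g is not None for allele in g})


# Hypothetical locus scored in five individuals, one of which failed.
locus = [(150, 152), (150, 150), (152, 156), (156, 156), None]
print(observed_heterozygosity(locus), allele_count(locus))  # 0.5 3
```

    Missing calls are excluded from the denominator, so a locus with many failed amplifications is not mistakenly scored as less heterozygous.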

  10. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Full Text Available Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16-24× coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR)-associated proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S

  11. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of an Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  12. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
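    Two of the olive figures can be reproduced with short calculations. Scaffold N50 is the length L such that scaffolds of length L or longer contain at least half of the assembled bases. Dividing the 543 Gb of raw sequence by the 1.38 Gb estimated genome size gives a nominal raw depth of about 393x. This sketch assumes the raw total is taken before any read filtering. The scaffold lengths in the example are hypothetical.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding
            return length
    raise AssertionError("unreachable: running total always reaches the sum")


def raw_depth(raw_gb: float, genome_gb: float) -> float:
    """Nominal depth: total sequenced bases divided by genome size."""
    return raw_gb / genome_gb


print(n50([100, 200, 300, 400]))       # 300 (400 + 300 >= 1000 / 2)
print(f"{raw_depth(543, 1.38):.0f}x")  # 393x
```

    The nominal depth is an upper bound on usable coverage. It pools every library type, including the mate-pair data, and ignores reads discarded during quality control.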

  13. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, Carl B [Diversa Corporation]; Wyborski, Denise L. [Diversa Corporation]; Garcia, Joseph A. [Diversa Corporation]; Podar, Mircea [ORNL]; Chen, Wenqiong [Diversa Corporation]; Chang, Sherman H. [Diversa Corporation]; Chang, Hwai W. [Diversa Corporation]; Watson, David B [ORNL]; Brodie, Eoin L. [Lawrence Berkeley National Laboratory (LBNL)]; Hazen, Terry [Lawrence Berkeley National Laboratory (LBNL)]; Keller, Martin [ORNL]

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.
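    A genome coverage figure such as 99.2% is a breadth-of-coverage measurement: the fraction of reference positions touched by at least one read. The following minimal sketch merges overlapping alignments and counts the covered positions. It assumes reads have already been aligned and are given as hypothetical half-open `(start, end)` intervals on a single reference. It is not the procedure used in the study.

```python
from typing import Iterable, Optional, Tuple


def breadth_of_coverage(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of positions in [0, genome_length) covered by >= 1 half-open interval."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        # Clip alignments that overhang the reference ends.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: bank it and start a new one.
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


reads = [(0, 40), (30, 60), (80, 100)]
print(breadth_of_coverage(reads, 100))  # 0.8
```

    Merging before counting ensures that positions hit by many reads, which are common after MDA because of its amplification bias, are counted only once.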

  14. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  15. High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    Directory of Open Access Journals (Sweden)

    Allard Marc W

    2012-01-01

    Full Text Available Abstract Background Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shotgun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data. Conclusions Implementation of a validated pipeline for NGS data acquisition and

  16. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    Directory of Open Access Journals (Sweden)

    Chang Su

    2014-12-01

    Full Text Available DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single-base-pair resolution for Betula platyphylla (white birch) by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the sequence composition, distribution and function of DNA methylation in the genome of B. platyphylla. Of the 34,460 genes in the whole birch genome, 31,297 are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for the largest proportion (48.86%). Combined transcriptome and methylation analysis showed that genes with moderate methylation levels had higher expression levels than genes with high or low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees.
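    The two summary figures above (the share of cytosines that are methylated, and the share of methylation sites in each sequence context) can be illustrated with a minimal sketch. It assumes an uppercase forward-strand reference, a set of positions already called as methylated (for example, Cs that still read as C after bisulfite conversion), and toy data; the function names `c_context` and `methylation_summary` are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter


def c_context(seq, i):
    """Classify the cytosine at seq[i] as 'CG', 'CHG' or 'CHH' (H = A, C or T).

    Forward strand only; returns None when the context runs off the end.
    """
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    if i + 1 < len(seq) and seq[i + 1] == "G":
        return "CG"
    if i + 2 < len(seq):
        return "CHG" if seq[i + 2] == "G" else "CHH"
    return None  # context cannot be determined


def methylation_summary(seq, methylated_positions):
    """Return (fraction of classifiable Cs that are methylated,
    share of methylated sites falling in each context)."""
    contexts = {}
    for i, base in enumerate(seq):
        if base == "C":
            ctx = c_context(seq, i)
            if ctx is not None:
                contexts[i] = ctx
    methylated = [contexts[i] for i in methylated_positions if i in contexts]
    fraction = len(methylated) / len(contexts)
    counts = Counter(methylated)
    shares = {ctx: n / len(methylated) for ctx, n in counts.items()} if methylated else {}
    return fraction, shares


# Toy reference: Cs at positions 1 (CG), 4 (CHG), 7 (CHH) and 10 (CHH).
ref = "ACGTCAGCTTCAA"
fraction, shares = methylation_summary(ref, {1, 7})
print(fraction, shares)
```

    Real bisulfite pipelines also score the reverse strand and call methylation per site from read counts; the sketch skips both to keep the context logic visible.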

  18. Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in understanding genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high-throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥4-fold, and 97.9% concordant in regions with coverage ≥15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20-fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
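    The depth-dependent concordance figures above follow a simple pattern: keep only sites at or above a coverage threshold, then count how often the sequence genotype matches the reference genotype. A minimal sketch, assuming unphased genotypes written as two-letter strings and toy site data (not the NA12872 calls); `concordance` and `normalize` are hypothetical names.

```python
def normalize(genotype):
    """Make unphased genotypes comparable, so 'GA' and 'AG' match."""
    return "".join(sorted(genotype))


def concordance(sites, min_depth):
    """Fraction of sites with depth >= min_depth whose called genotype
    matches the reference (e.g. HapMap) genotype.

    sites: iterable of (depth, called_genotype, reference_genotype).
    """
    kept = [(called, truth) for depth, called, truth in sites if depth >= min_depth]
    if not kept:
        return float("nan")
    matches = sum(normalize(c) == normalize(t) for c, t in kept)
    return matches / len(kept)


# Toy data, not from the study.
sites = [
    (3, "AG", "AG"),
    (5, "AA", "AG"),
    (8, "GA", "AG"),
    (20, "GG", "GG"),
    (16, "AG", "AG"),
]
for threshold in (4, 15):
    print(threshold, concordance(sites, threshold))
```

    Raising the threshold trades the number of sites assessed for accuracy, which is the same trade-off visible in the 4-fold and 15-fold figures.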

  19. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    Directory of Open Access Journals (Sweden)

    Ritland Carol

    2009-08-01

    Full Text Available Abstract Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4), from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene

  20. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    Science.gov (United States)

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4), from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The

  1. Measles seroprevalence, outbreaks, and vaccine coverage in Rwanda.

    Science.gov (United States)

    Seruyange, Eric; Gahutu, Jean-Bosco; Mambo Muvunyi, Claude; Uwimana, Zena G; Gatera, Maurice; Twagirumugabe, Theogene; Katare, Swaibu; Karenzi, Ben; Bergström, Tomas

    2016-01-01

    Measles outbreaks are reported after insufficient vaccine coverage, especially in countries recovering from natural disaster or conflict. We compared seroprevalence to measles in blood donors in Rwanda and Sweden and explored distribution of active cases of measles and vaccine coverage in Rwanda. 516 Rwandan and 215 Swedish blood donors were assayed for measles-specific immunoglobulin G (IgG) by enzyme-linked immunosorbent assay (ELISA). Data on vaccine coverage and acute cases in Rwanda from 1980 to 2014 were collected, and IgM on serum samples and polymerase chain reaction (PCR) on nasopharyngeal (NPH) swabs from suspected measles cases during 2010-2011 were analysed. The seroprevalence of measles IgG was significantly higher in Swedish blood donors (92.6%; 95% CI: 89.1-96.1%) compared to Rwandan subjects (71.5%; 95% CI: 67.6-75.4%) and more pronounced Rwanda, with the exception of an outbreak in 1995 following the 1994 genocide. 76/544 serum samples were IgM positive and 21/31 NPH swabs were PCR positive for measles, determined by sequencing to be of genotype B3. Measles seroprevalence was lower in Rwandan blood donors compared to Swedish subjects. Despite this, the number of reported measles cases in Rwanda rapidly decreased during the study period, concomitant with increased vaccine coverage. Taken together, the circulation of measles was limited in Rwanda and vaccine coverage was favourable, but seroprevalence and IgG levels were low especially in younger age groups.
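    The 95% confidence intervals quoted above happen to agree, after rounding, with a normal-approximation (Wald) interval computed from the reported proportions and the number of donors assayed in each country. The abstract does not name the method, so treating it as a Wald interval is an assumption; the sketch below only shows that arithmetic, with `wald_ci` as a hypothetical helper.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Reported seroprevalence and the number of donors assayed per country.
for label, p, n in [("Sweden", 0.926, 215), ("Rwanda", 0.715, 516)]:
    lo, hi = wald_ci(p, n)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

    The Wald interval is the simplest choice; for proportions near 0 or 1 or for small samples, Wilson or exact intervals are usually preferred.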

  2. Data on genome sequencing, analysis and annotation of a pathogenic Bacillus cereus 062011msu

    Directory of Open Access Journals (Sweden)

    Rashmi Rathy

    2018-04-01

    Full Text Available Bacillus species 062011 msu is a harmful pathogenic strain that causes abscessation in sheep and goat populations, as studied by Mariappan et al. (2012) [1]. The organism specifically targets female sheep and goats and reduces milk and meat production. In the present study, we performed whole genome sequencing of the pathogenic isolate using the Ion Torrent sequencing platform and generated 458,944 raw reads with an average length of 198.2 bp. The genome sequence was assembled, annotated and analysed for genomic islands, metabolic pathways, orthologous groups, virulence factors and antibiotic resistance genes associated with the pathogen. In parallel, 16S rRNA sequencing and genome sequence comparison confirmed that the strain belongs to the species Bacillus cereus and exhibits 99% sequence homology with the genomes of B. cereus ATCC 10987 and B. cereus FRI-35. Hence, we have renamed the organism Bacillus cereus 062011msu. The Whole Genome Shotgun (WGS) project has been deposited at DDBJ/ENA/GenBank under the accession NTMF00000000 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA404036) (SAMN07629099). Keywords: Bacillus cereus, Genome sequencing, Abscessation, Virulence factors
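    The read count and mean read length above determine the raw sequence yield, and dividing that yield by the genome size gives the expected mean depth (C = N × L / G). The abstract does not state the assembled genome size, so the 5.4 Mb used below is a hypothetical value for illustration only; the helper names are likewise hypothetical.

```python
def total_bases(n_reads, mean_read_length):
    """Raw sequence yield in bases."""
    return n_reads * mean_read_length


def mean_coverage(n_reads, mean_read_length, genome_size):
    """Expected mean depth C = N * L / G."""
    return total_bases(n_reads, mean_read_length) / genome_size


ASSUMED_GENOME_SIZE = 5.4e6  # hypothetical, for illustration only
yield_bp = total_bases(458_944, 198.2)
depth = mean_coverage(458_944, 198.2, ASSUMED_GENOME_SIZE)
print(f"{yield_bp / 1e6:.1f} Mb of raw sequence, ~{depth:.1f}x mean depth")
```

    Under that assumption the reported reads amount to roughly 91 Mb of raw sequence, or about 17x mean depth before any quality trimming.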

  3. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, such as the codon adaptation index and RBS score, related to gene expression? We used whole transcriptome shotgun sequencing of the bacterial pathogen Bacillus anthracis to assess the correlation of gene expression level with promoter, terminator and RBS scores and the codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessing the accuracy of the current annotation of protein-coding genes in the B. anthracis genome, we have shown that the transcriptome data indicate the existence of more than a hundred genes missing from the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  4. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    Science.gov (United States)

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
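    One way to see why around 50x can suffice for variant calling is an idealized Lander-Waterman-style model in which read depth at each site is Poisson-distributed around the mean. The sketch below computes the probability that a site receives at least k reads; the minimum depth of 10 is a hypothetical calling threshold, not one taken from the study, and real coverage is more uneven (over-dispersed) than the Poisson model assumes.

```python
import math


def prob_depth_at_least(k, mean_depth):
    """P(X >= k) for X ~ Poisson(mean_depth): the chance a site gets at least
    k reads under idealized, uniformly random read placement."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean_depth)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean_depth / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return 1.0 - cdf


# Hypothetical minimum depth of 10 reads for a confident call.
for mean_depth in (10, 30, 50):
    print(mean_depth, prob_depth_at_least(10, mean_depth))
```

    At very high mean depths such as 2000x, `math.exp(-mean_depth)` underflows to zero and the function simply returns 1.0, which is the correct limit for small k.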

  5. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Full Text Available Abstract Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  6. Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses

    Science.gov (United States)

    The aim of this work was to study the effects of Fe and Mn deficiencies on the xylem sap proteome of tomato using a shotgun proteomic approach, with the final goal of elucidating plant response mechanisms to these stresses. This approach yielded 643 proteins reliably identified and quantified with 7...

  7. Draft genome sequence of the Algerian bee Apis mellifera intermissa

    Directory of Open Access Journals (Sweden)

    Nizar Jamal Haddad

    2015-06-01

    Full Text Available Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important because of its high ability to adapt to great variations in climatic conditions and its desirable cleaning behavior. Here we report the draft genome sequence of this honey bee; its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genomes of other Apis mellifera subspecies promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation.

  8. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of chloroplast genome.
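    The percentages of the chloroplast genome "covered" above are breadth-of-coverage figures: the fraction of reference positions overlapped by at least one aligned read. A minimal sketch of that calculation merges overlapping alignment intervals before summing their lengths. It assumes 0-based, half-open (start, end) intervals and toy values; `breadth_of_coverage` is a hypothetical name, not part of any assembler used in the study.

```python
def breadth_of_coverage(intervals, ref_length):
    """Fraction of reference positions covered by at least one interval.

    intervals: iterable of (start, end) alignments, 0-based and half-open.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip to the reference and skip empty intervals.
        start, end = max(0, start), min(ref_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


# Toy alignments on a 100-bp reference: positions 0-49 and 60-89 are covered.
print(breadth_of_coverage([(0, 30), (20, 50), (60, 90)], 100))
```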

  9. A bumpy ride on the diagnostic bench of massive parallel sequencing, the case of the mitochondrial genome.

    Directory of Open Access Journals (Sweden)

    Kim Vancampenhout

    Full Text Available The advent of massive parallel sequencing (MPS) has revolutionized the field of human molecular genetics, including the diagnostic study of mitochondrial DNA (mtDNA) dysfunction. The analysis of the complete mitochondrial genome using MPS platforms is now common and will soon outrun conventional sequencing. However, the development of a robust and reliable protocol is rather challenging. A previous pilot study for the re-sequencing of human mtDNA revealed uneven coverage, predominantly affecting part of the plus strand. In an attempt to address this problem, we undertook a comparative study of standard and modified protocols for the Ion Torrent PGM system. We could not improve strand representation by altering the recommended shearing methodology of the standard workflow or by omitting the DNA polymerase amplification step from the library construction process. However, we were able to associate coverage bias of the plus strand with a specific sequence motif. Additionally, we compared coverage and variant calling across technologies. The same samples were also sequenced on a MiSeq device, which showed that coverage and heteroplasmic variant calling were much improved.

  10. Nonoperative Management of Multiple Penetrating Cardiac and Colon Wounds from a Shotgun: A Case Report and Literature Review

    OpenAIRE

    Jaramillo, Paula M.; Montoya, Jaime A.; Mejia, David A.; Pereira Warr, Salin

    2018-01-01

    Introduction. Surgery is normally considered for cardiac trauma, which is potentially fatal, and for wounds of the colon, because of the associated risk of sepsis; however, conservative management of many traumatic lesions of different injured organs has progressed over the years. Presentation of the Case. A 65-year-old male patient presented with multiple shotgun wounds on the left upper limb, thorax, and abdomen. On evaluation, he was hemodynamically stable with normal sinus rhythm and normal blood pressure, no dyspnea,...

  11. Sequencing of 50 human exomes reveals adaptation to high altitude

    DEFF Research Database (Denmark)

    Yi, Xin; Liang, Yu; Huerta-Sanchez, Emilia

    2010-01-01

    Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency ... in genetic adaptation to high altitude.

  12. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more rational vaccines to prevent tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and sequencing constructed clones, we closed all the gaps. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems and strategies encountered during the finishing step of BCG Tice sequencing are described here, in the hope of offering guidance to those involved in the finishing step of genome sequencing. The microarray data were verified by our results.
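    The GC content quoted above is the fraction of G and C among the bases of the finished sequence. A minimal sketch of that calculation, assuming ambiguous symbols such as N are excluded from the denominator (a common convention, but an assumption here); `gc_content` is a hypothetical helper.

```python
def gc_content(seq):
    """Fraction of G + C among unambiguous bases (A, C, G, T).

    Other symbols, such as N, are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return gc / acgt


print(f"{gc_content('ggcCATnn'):.2%}")
```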

  13. Comparison of whole genome amplification techniques for human single cell exome sequencing.

    Science.gov (United States)

    Borgström, Erik; Paterlini, Marta; Mold, Jeff E; Frisen, Jonas; Lundeberg, Joakim

    2017-01-01

    Whole genome amplification (WGA) is currently a prerequisite for single-cell whole genome or exome sequencing. Depending on the method used, the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had an impact on the downstream variant calling. In conclusion, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells. In this study, four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data were evaluated in terms of genomic coverage, allelic dropout and SNP calling.
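    Differences in "target coverage and read depth distribution" between amplification methods are often summarized by the mean depth, its coefficient of variation, and the fraction of target positions reaching a given depth. The sketch below computes those three summaries. The thresholds and the two toy depth profiles are hypothetical and are not the metrics or data of the study.

```python
import statistics


def depth_summary(depths, thresholds=(1, 10, 20)):
    """Summarize per-base read depth over target positions.

    Returns mean depth, coefficient of variation (lower = more uniform),
    and the fraction of positions at or above each depth threshold.
    """
    mean = statistics.fmean(depths)
    cv = statistics.pstdev(depths) / mean
    fractions = {t: sum(d >= t for d in depths) / len(depths) for t in thresholds}
    return mean, cv, fractions


# Toy per-base depths for two hypothetical amplified cells with the same mean.
uniform = [18, 22, 20, 19, 21, 20]
skewed = [0, 2, 55, 60, 1, 2]
for name, d in [("uniform", uniform), ("skewed", skewed)]:
    print(name, depth_summary(d))
```

    Both toy profiles have a mean depth of 20x, but the skewed one leaves most positions below 10x; this is the kind of unevenness that drives allelic dropout and missed variant calls.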

  14. A Targeted Enrichment Strategy for Massively Parallel Sequencing of Angiosperm Plastid Genomes

    Directory of Open Access Journals (Sweden)

    Gregory W. Stull

    2013-02-01

    Full Text Available Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS) of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots), which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×), even for the two monocots. Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving 50× mean coverage). However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96) available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms.
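    The estimate of "more than 300" genomes per lane is consistent with a simple back-of-envelope scaling: 24 samples at a mean of 717× represent the lane's on-target yield, and dividing that yield by a 50× target gives the number of samples. The abstract does not spell out the calculation, so the sketch below assumes a fixed on-target yield per lane that is shared evenly; `samples_per_lane` is a hypothetical helper.

```python
def samples_per_lane(n_pilot, pilot_mean_cov, target_cov, barcode_limit=None):
    """Scale a pilot multiplex to a lower target coverage, assuming the lane's
    on-target yield stays fixed and is shared evenly among samples."""
    n = n_pilot * pilot_mean_cov / target_cov
    return n if barcode_limit is None else min(n, barcode_limit)


print(samples_per_lane(24, 717, 50))      # theoretical capacity
print(samples_per_lane(24, 717, 50, 96))  # capped by available barcodes
```

    The uncapped result is about 344 samples; the barcode cap brings it down to the 96 recommended above.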

  15. Whole genome sequence of the emerging oomycete pathogen Pythium insidiosum strain CDC-B5653 isolated from an infected human in the USA

    Directory of Open Access Journals (Sweden)

    Marina S. Ascunce

    2016-03-01

    Full Text Available Pythium insidiosum ATCC 200269 strain CDC-B5653, an isolate from necrotizing lesions on the mouth and eye of a 2-year-old boy in Memphis, Tennessee, USA, was sequenced using a combination of Illumina MiSeq (300 bp paired-end, 14 million reads) and PacBio (10 kb fragment library, 356,001 reads). The sequencing data were assembled using SPAdes version 3.1.0, yielding a total genome size of 45.6 Mb contained in 8992 contigs, an N50 of 13 kb, 57% G + C content, and 17,867 putative protein-coding genes. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JRHR00000000. Keywords: Oomycete, Pythium insidiosum, Pythiosis, Human emerging pathogen, Genome sequencing
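    The N50 reported above is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that standard definition. The contig lengths are toy values and do not come from the CDC-B5653 assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Toy contig lengths (kb): total 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```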

  16. Development of Microsatellite Loci for the Riparian Tree Species Melaleuca argentea (Myrtaceae) Using 454 Sequencing

    Directory of Open Access Journals (Sweden)

    Paul G. Nevill

    2013-05-01

    Full Text Available Premise of the study: Microsatellite primers were developed for Melaleuca argentea (Myrtaceae) to evaluate genetic diversity and population genetic structure of this broadly distributed northern Australian riparian tree species. Methods and Results: 454 GS-FLX shotgun sequencing was used to obtain 5860 sequences containing putative microsatellite motifs. Two multiplex PCRs were optimized to genotype 11 polymorphic microsatellite loci. These loci were screened for variation in individuals from two populations in the Pilbara region, northwestern Western Australia. Overall, observed heterozygosities ranged from 0.27 to 0.86 (mean: 0.52) and the number of alleles per locus ranged from two to 13 (average: 4.3). Conclusions: These microsatellite loci will be useful in future studies of the evolutionary history and population and spatial genetic structure in M. argentea, and inform the development of seed sourcing strategies for the species.
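    The two per-locus statistics above can be computed directly from diploid genotypes. Observed heterozygosity is the fraction of scored individuals carrying two different alleles, and the allele count is the number of distinct allele sizes seen. Both are then averaged across loci. The sketch below assumes genotypes stored as pairs of allele sizes with missing data already removed. The loci and sizes are toy values, not the M. argentea data.

```python
def locus_stats(genotypes):
    """Observed heterozygosity and allele count for one locus.

    genotypes: list of (allele1, allele2) sizes for individuals scored at the locus.
    """
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    n_alleles = len({allele for pair in genotypes for allele in pair})
    return ho, n_alleles


# Toy microsatellite data (allele sizes in bp) for two hypothetical loci.
loci = {
    "locus_1": [(150, 152), (150, 150), (152, 154), (150, 152)],
    "locus_2": [(200, 200), (200, 204), (204, 204), (200, 200)],
}
stats = {name: locus_stats(g) for name, g in loci.items()}
mean_ho = sum(ho for ho, _ in stats.values()) / len(stats)
mean_alleles = sum(na for _, na in stats.values()) / len(stats)
print(stats, mean_ho, mean_alleles)
```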

  17. The study of human Y chromosome variation through ancient DNA.

    Science.gov (United States)

    Kivisild, Toomas

    2017-05-01

    High-throughput sequencing methods have completely transformed the study of human Y chromosome variation by offering a genome-scale view of genetic variation retrieved from ancient human remains, in the context of a growing number of high-coverage whole Y chromosome sequences from living populations across the world. Ancient Y chromosome sequences are providing the first exciting glimpses into past variation of the male-specific compartment of the genome, and the opportunity to evaluate models based on inferences previously drawn from patterns of genetic variation in living populations. Analyses of ancient Y chromosome sequences are challenging not only because of issues generally related to ancient DNA work, such as DNA damage-induced mutations and the low content of endogenous DNA in most human remains, but also because of specific properties of the Y chromosome, such as its highly repetitive nature and high homology with the X chromosome. Shotgun sequencing of uniquely mapping regions of the Y chromosome to sufficiently high coverage is still challenging and costly in poorly preserved samples. To increase the coverage of specific target SNPs, capture-based methods have been developed and used in recent years to generate Y chromosome sequence data from hundreds of prehistoric skeletal remains. Besides offering the prospect of testing directly how much genetic change in a given time period has accompanied changes in material culture, the sequencing of ancient Y chromosomes also allows us to better understand the rate at which mutations accumulate and become fixed over time. This review considers genome-scale evidence on ancient Y chromosome diversity that has recently started to accumulate in geographic areas favourable to DNA preservation. More specifically, the review focuses on examples of regional continuity and change of Y chromosome haplogroups in North Eurasia and in the New World.

  18. Medicare Coverage Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Coverage Database (MCD) contains all National Coverage Determinations (NCDs) and Local Coverage Determinations (LCDs), local articles, and proposed NCD...

  19. Quantification of massively parallel sequencing libraries - a comparative study of eight methods

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2018-01-01

    Quantification of massively parallel sequencing libraries is important for acquisition of monoclonal beads or clusters prior to clonal amplification and to avoid large variations in library coverage when multiple samples are included in one sequencing analysis. No gold standard for quantification...... estimates followed by Qubit and electrophoresis-based instruments (Bioanalyzer, TapeStation, GX Touch, and Fragment Analyzer), while SYBR Green and TaqMan based qPCR assays gave the lowest estimates. qPCR gave more accurate predictions of sequencing coverage than Qubit and TapeStation did. Costs, time......-consumption, workflow simplicity, and ability to quantify multiple samples are discussed. Technical specifications, advantages, and disadvantages of the various methods are pointed out....
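
    Fluorometric methods such as Qubit report mass concentration, whereas clustering depends on the molar concentration of library molecules. A common conversion, sketched here under the assumption of an average mass of about 660 g/mol per base pair of double-stranded DNA and a mean fragment length read from an electrophoresis trace, is nM = (ng/µL × 10^6) / (660 × length in bp). The function name and example values are hypothetical.

```python
def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA library concentration from ng/uL to nM.

    Assumes ~660 g/mol per base pair; mean_fragment_bp would come from a
    fragment-size instrument (e.g. a Bioanalyzer or TapeStation trace).
    """
    # 1 ng/uL = 1e-3 g/L; dividing by the molar mass (g/mol) gives mol/L,
    # and multiplying by 1e9 converts mol/L to nmol/L: net factor 1e6.
    molar_mass = g_per_mol_per_bp * mean_fragment_bp
    return conc_ng_per_ul * 1e6 / molar_mass


# A 2 ng/uL library with a mean fragment length of 400 bp
print(round(ng_per_ul_to_nm(2.0, 400), 2))  # 7.58
```

    Because this conversion counts all DNA mass, including molecules without adapters, it can overstate the amplifiable fraction; qPCR targets adapter-bearing molecules directly, which may help explain why qPCR predicted coverage better in the comparison above.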

  20. Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

    Science.gov (United States)

    Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur

    2011-04-18

    Second-generation technologies have advantages over Sanger sequencing; however, they have created new challenges for the genome construction process, mainly because of the short read length, despite the high degree of coverage. Regardless of the program chosen for genome construction, DNA sequences are superimposed on the basis of identity to extend the reads and generate contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and on the coverage estimated after applying the quality filter, while still providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in constructing genomes, as it reduces the frequency of incorrect alignments caused by measurement errors, which, given the short read length, can arise during construction and provoke misassemblies. Applying quality filters to sequence data with the Quality Assessment software, together with graphical analysis, allowed cutoff parameters to be defined more precisely, which increased the accuracy of genome construction.
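
    The filtering idea described above can be sketched as follows: decode Phred scores from FASTQ quality strings, discard reads whose mean quality falls below a cutoff, and re-estimate coverage as retained bases divided by genome size. The Phred+33 encoding, the mean-quality criterion and the cutoff of 20 are assumptions for illustration; the Quality Assessment software may apply different criteria.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def filter_reads(reads, min_mean_q=20):
    """Keep (sequence, quality) pairs whose mean Phred score is >= min_mean_q."""
    kept = []
    for seq, qual in reads:
        scores = phred_scores(qual)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            kept.append((seq, qual))
    return kept


def estimated_coverage(reads, genome_size):
    """Coverage = total retained bases / genome size."""
    return sum(len(seq) for seq, _ in reads) / genome_size


reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "########")]  # Q40 vs Q2
kept = filter_reads(reads)
print(len(kept), estimated_coverage(kept, genome_size=16))  # 1 0.5
```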

  1. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products

    NARCIS (Netherlands)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J.; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-01-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however,

  2. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    ADP

    2012-04-10

    Apr 10, 2012 ... useful for investigating genetic diversity and differentiation in gerbera. Key words: ... However, this method had a disadvantage: it could not .... PCR product. PCR was ..... advantages, SSR markers had not been developed or ...

  3. Source-pathway-receptor investigation of the fate of trace elements derived from shotgun pellets discharged in terrestrial ecosystems managed for game shooting

    International Nuclear Information System (INIS)

    Sneddon, Jennifer; Clemente, Rafael; Riby, Philip; Lepp, Nicholas W.

    2009-01-01

    Spent shotgun pellets may contaminate terrestrial ecosystems. We examined the fate of elements originating from shotgun pellets in pasture and woodland ecosystems. Two source-receptor pathways: i) soil-soil pore water-plant and ii) whole earthworm/worm gut contents - washed and unwashed small mammal hair were investigated. Concentrations of Pb and associated contaminants were higher in soils from shot areas than controls. Arsenic and lead concentrations were positively correlated in soils, soil pore water and associated biota. Element concentrations in biota were below statutory levels in all locations. Bioavailability of lead to small mammals, based on concentrations in washed body hair was low. Lead movement from soil water to higher trophic levels was minor compared to lead adsorbed onto body surfaces. Lead was concentrated in earthworm gut and some plants. Results indicate that managed game shooting presents minimal risk in terms of element transfer to soils and their associated biota. - Source-receptor pathway analysis of a managed game shooting site showed no environmental risk of trace element transfer.

  4. Source-pathway-receptor investigation of the fate of trace elements derived from shotgun pellets discharged in terrestrial ecosystems managed for game shooting

    Energy Technology Data Exchange (ETDEWEB)

    Sneddon, Jennifer [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom); Clemente, Rafael, E-mail: rclemente@cebas.csic.e [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom); Riby, Philip [School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool L3 3AF (United Kingdom); Lepp, Nicholas W., E-mail: n.w.lepp@ljmu.ac.u [School of Natural Sciences and Psychology, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF (United Kingdom)

    2009-10-15

    Spent shotgun pellets may contaminate terrestrial ecosystems. We examined the fate of elements originating from shotgun pellets in pasture and woodland ecosystems. Two source-receptor pathways: i) soil-soil pore water-plant and ii) whole earthworm/worm gut contents - washed and unwashed small mammal hair were investigated. Concentrations of Pb and associated contaminants were higher in soils from shot areas than controls. Arsenic and lead concentrations were positively correlated in soils, soil pore water and associated biota. Element concentrations in biota were below statutory levels in all locations. Bioavailability of lead to small mammals, based on concentrations in washed body hair was low. Lead movement from soil water to higher trophic levels was minor compared to lead adsorbed onto body surfaces. Lead was concentrated in earthworm gut and some plants. Results indicate that managed game shooting presents minimal risk in terms of element transfer to soils and their associated biota. - Source-receptor pathway analysis of a managed game shooting site showed no environmental risk of trace element transfer.

  5. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    Science.gov (United States)

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties, such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained by the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplications. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
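
    Gap properties like those examined in the study can be measured once gap positions are known. This sketch assumes gaps are encoded as runs of N in a scaffold (a common convention, not something the abstract specifies) and reports each gap's position and length plus the GC fraction of its flanking sequence; the flank size and toy sequence are hypothetical.

```python
import re


def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T); 0.0 if none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(1 for b in bases if b in "GC") / len(bases)


def find_gaps(scaffold, flank=50):
    """Yield (start, length, left-flank GC, right-flank GC) for each run of N."""
    for m in re.finditer(r"[Nn]+", scaffold):
        left = scaffold[max(0, m.start() - flank):m.start()]
        right = scaffold[m.end():m.end() + flank]
        yield m.start(), m.end() - m.start(), gc_fraction(left), gc_fraction(right)


print(list(find_gaps("GGCCNNNNNATAT", flank=4)))  # [(4, 5, 1.0, 0.0)]
```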

  6. Exome sequencing for prenatal diagnosis of fetuses with sonographic abnormalities.

    Science.gov (United States)

    Drury, Suzanne; Williams, Hywel; Trump, Natalie; Boustred, Christopher; Lench, Nicholas; Scott, Richard H; Chitty, Lyn S

    2015-10-01

    In the absence of aneuploidy or other pathogenic cytogenetic abnormality, fetuses with increased nuchal translucency (NT ≥ 3.5 mm) and/or other sonographic abnormalities have a greater incidence of genetic syndromes, but defining the underlying pathology can be challenging. Here, we investigate the value of whole exome sequencing in fetuses with sonographic abnormalities but normal microarray analysis. Whole exome sequencing was performed on DNA extracted from chorionic villi or amniocytes in 24 fetuses with unexplained ultrasound findings. In the first 14 cases, sequencing was initially performed on fetal DNA only. For the remaining 10, the trio of fetus, mother and father was sequenced simultaneously. In 21% (5/24) of cases, exome sequencing provided definitive diagnoses (Milroy disease, hypophosphatasia, achondrogenesis type 2, Freeman-Sheldon syndrome and Baraitser-Winter syndrome). In a further case, a plausible diagnosis of orofaciodigital syndrome type 6 was made. In two others, a single mutation in an autosomal recessive gene was identified, but incomplete sequencing coverage precluded exclusion of a second mutation. Whole exome sequencing improves prenatal diagnosis in euploid fetuses with abnormal ultrasound scans. To expedite interpretation of results, trio sequencing should be employed, but interpretation can still be compromised by incomplete coverage of relevant genes. © 2015 John Wiley & Sons, Ltd.
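
    Whether coverage is adequate to exclude a second variant comes down to how many target bases reach a minimum read depth. A minimal sketch, assuming per-base depths are already available for each gene and using a 20x threshold chosen only for illustration; the gene names and depths are hypothetical.

```python
def covered_fraction(depths, min_depth=20):
    """Fraction of positions with read depth >= min_depth."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def incompletely_covered(gene_depths, min_depth=20):
    """Names of genes in which some position falls below min_depth."""
    return sorted(
        gene for gene, depths in gene_depths.items()
        if covered_fraction(depths, min_depth) < 1.0
    )


gene_depths = {
    "GENE_A": [35, 40, 22, 31, 28, 45],  # hypothetical: all positions >= 20x
    "GENE_B": [35, 40, 22, 18, 0, 31],   # hypothetical: two positions < 20x
}
print(incompletely_covered(gene_depths))                  # ['GENE_B']
print(round(covered_fraction(gene_depths["GENE_B"]), 3))  # 0.667
```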

  7. Exploring the potential of second-generation sequencing in diverse biological contexts

    DEFF Research Database (Denmark)

    Fordyce, Sarah Louise

    Second-generation sequencing (SGS) has revolutionized the study of DNA, allowing massively parallel sequencing of nucleic acids with unprecedented depths of coverage. The research undertaken in this thesis occurred in parallel with the increased accessibility of SGS platforms for routine genetic...

  8. Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data

    Science.gov (United States)

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...

  9. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Full Text Available Abstract Background Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome is sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures such as chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is currently used to amplify the chloroplast genome from purified chloroplast DNA, and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal, high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1–1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of the strawberry and peach plastomes was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to the tobacco plastome, thus demonstrating the utility of this method for structural and
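
    The percentage coverage reported per species amounts to the length of the union of overlapping amplicons divided by the length of the target region. A sketch under the assumption that amplicon positions are known as half-open (start, end) coordinates on a reference; the coordinates below are invented to mimic overlapping ~1-1.2 kb products.

```python
def percent_covered(amplicons, region_start, region_end):
    """Percent of [region_start, region_end) covered by the union of amplicons."""
    # Clip amplicons to the region and drop those entirely outside it.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in amplicons
        if e > region_start and s < region_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / (region_end - region_start)


# Hypothetical amplicons over a 5 kb region, with one 400 bp hole
print(percent_covered([(0, 1100), (1000, 2150), (2100, 3200), (3600, 4700)], 0, 5000))  # 86.0
```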

  10. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.
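
    A back-of-the-envelope check on what 11-fold coverage implies: under the idealized Lander-Waterman (Poisson) model, with N reads of length L sampled uniformly from a genome of size G, the expected fraction of bases with zero reads is e^{-c}. The ~3.1 Gb haploid human genome size used below is an approximate figure supplied for illustration; real libraries have coverage bias, so actual uncovered regions will exceed this estimate.

```latex
P(\text{base not covered}) = e^{-c}, \qquad c = \frac{NL}{G}

c = 11:\quad e^{-11} \approx 1.67 \times 10^{-5}, \qquad
(1.67 \times 10^{-5}) \times (3.1 \times 10^{9}\ \text{bp}) \approx 5.2 \times 10^{4}\ \text{bp}
```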

  11. Genomic sequencing: assessing the health care system, policy, and big-data implications.

    Science.gov (United States)

    Phillips, Kathryn A; Trosman, Julia R; Kelley, Robin K; Pletcher, Mark J; Douglas, Michael P; Weldon, Christine B

    2014-07-01

    New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. Project HOPE—The People-to-People Health Foundation, Inc.

  12. Divide and conquer: enriching environmental sequencing data.

    Directory of Open Access Journals (Sweden)

    Anne Bergeron

    2007-09-01

    Full Text Available In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species. Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities. Given the fact that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.
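
    Under the Lander-Waterman model invoked above, a read ends a contig on the right when no other read starts within the next L(1 - θ) bases, which has probability e^{-cσ} with c = NL/G and σ = 1 - θ (θ being the minimum detectable overlap as a fraction of read length); a singleton needs this on both sides, giving e^{-2cσ}. The sketch evaluates that textbook expression for one genome and for a community in which each species receives a share of the reads. It is not the authors' own analysis, and the abundances and genome sizes are hypothetical.

```python
import math


def expected_singleton_fraction(n_reads, read_len, genome_size, min_overlap=0):
    """Lander-Waterman expected fraction of reads that form single-read contigs."""
    c = n_reads * read_len / genome_size     # fold coverage
    sigma = 1 - min_overlap / read_len       # 1 - theta
    return math.exp(-2 * c * sigma)


def community_singleton_fraction(abundances, total_reads, read_len, genome_sizes,
                                 min_overlap=0):
    """Read-weighted singleton fraction when species i receives abundances[i]
    of all reads (abundances assumed to sum to 1)."""
    singletons = 0.0
    for p, g in zip(abundances, genome_sizes):
        n_i = p * total_reads
        singletons += n_i * expected_singleton_fraction(n_i, read_len, g, min_overlap)
    return singletons / total_reads


# One 1 Mb genome sampled at 1x with 1 kb reads: e^-2
print(round(expected_singleton_fraction(1_000, 1_000, 1_000_000), 3))  # 0.135
```

    Setting e^{-2cσ} = 0.25 gives cσ = ln 2 ≈ 0.69, so under this simple model the 25% singleton level corresponds to an effective coverage of roughly 0.7x.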

  13. Analysis of secretome of breast cancer cell line with an optimized semi-shotgun method

    International Nuclear Information System (INIS)

    Tang Xiaorong; Yao Ling; Chen Keying; Hu Xiaofang; Xu Lisa; Fan Chunhai

    2009-01-01

    Secretome, the totality of secreted proteins, is viewed as a promising pool of candidate cancer biomarkers. Simple and reliable methods for identifying secreted proteins are highly desired. We used an optimized semi-shotgun liquid chromatography followed by tandem mass spectrometry (LC-MS/MS) method to analyze the secretome of breast cancer cell line MDA-MB-231. A total of 464 proteins were identified. About 63% of the proteins were classified as secreted proteins, including many promising breast cancer biomarkers, which were thought to be correlated with tumorigenesis, tumor development and metastasis. These results suggest that the optimized method may be a powerful strategy for cell line secretome profiling, and can be used to find potential cancer biomarkers with great clinical significance. (authors)

  14. Coverage and Rate of Downlink Sequence Transmissions with Reliability Guarantees

    DEFF Research Database (Denmark)

    Park, Jihong; Popovski, Petar

    2017-01-01

    Real-time distributed control is a promising application of 5G in which communication links should satisfy certain reliability guarantees. In this letter, we derive closed-form maximum average rate when a device (e.g. industrial machine) downloads a sequence of n operational commands through cell...

  15. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    Science.gov (United States)

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics, this grouping typically needs to be achieved on the basis of relatively short sequences that contain different types of errors, making a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior, and we use conjugate priors for the Markov chain parameters, which yields an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach that combines the EM algorithm with a greedy search. This approach is shown to be faster than clustering methods previously suggested for the metagenomics application and to yield highly accurate results compared with them. Our model is fairly generic and could also be used to cluster other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequencing data from an Escherichia coli strain.
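
    The conjugacy mentioned above is what makes partitions cheap to compare: with a Dirichlet prior on each context's transition probabilities, those probabilities integrate out in closed form (a Dirichlet-multinomial). The sketch counts order-k transitions and scores whether two groups of sequences are better modelled by one shared chain or two. The symmetric Dirichlet(alpha) prior with alpha = 1 and the toy sequences are assumptions; the Dirichlet process prior over partitions, the initial-state term and the EM/greedy search are omitted.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def transition_counts(seqs, order=1):
    """Count (context, next base) pairs for an order-k Markov chain."""
    counts = Counter()
    for s in seqs:
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[(ctx, nxt)] += 1
    return counts


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of transition counts, with each context's
    transition probabilities integrated out under a symmetric Dirichlet(alpha).
    Unobserved contexts and transitions contribute zero, so they are skipped."""
    rows = {}
    for (ctx, nxt), n in counts.items():
        rows.setdefault(ctx, Counter())[nxt] += n
    k = len(ALPHABET)
    total = 0.0
    for row in rows.values():
        n_ctx = sum(row.values())
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n_ctx)
        total += sum(math.lgamma(alpha + n) - math.lgamma(alpha) for n in row.values())
    return total


# Do these two (toy) groups look like one chain or two?
a = transition_counts(["ACACACACAC"])
b = transition_counts(["AGAGAGAGAG"])
log_bf_split = log_marginal(a) + log_marginal(b) - log_marginal(a + b)
print(log_bf_split > 0)  # True: two separate chains are favoured
```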

  16. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly from these short reads is computationally very intensive. Owing to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available on the impact of high sequencing depth on genome assembly using real data sets and multiple assembly algorithms. Some recent studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of simulated datasets is that variables known to impact genome assembly, such as error rates, read length and coverage, are carefully controlled. Hence, this study was undertaken to identify the minimum sequencing depth required for de novo assembly of different-sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes with all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM, depending on the genome size and assembly algorithm used. We believe this information can be extremely valuable for researchers designing experiments and multiplexing strategies, enabling optimum use of both sequencing and analysis resources.
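
    Translating a target depth into a read budget is simple arithmetic: reads = depth × genome size / bases per read (or per pair). The sketch applies it to the 50X optimum for E. coli. The 2 × 100 bp read length is an assumed value, since the abstract does not state read lengths, and the subsampling helper is a hypothetical convenience for downsampling a deeper dataset.

```python
import math


def reads_for_depth(target_depth, genome_size_bp, read_len_bp, paired=False):
    """Reads (or read pairs, if paired=True) needed to reach target_depth."""
    bases_per_unit = read_len_bp * (2 if paired else 1)
    return math.ceil(target_depth * genome_size_bp / bases_per_unit)


def subsample_fraction(current_depth, target_depth):
    """Fraction of reads to keep when downsampling to target_depth."""
    return min(1.0, target_depth / current_depth)


# E. coli (~4.6 Mb) at 50X with assumed 2 x 100 bp reads
print(reads_for_depth(50, 4_600_000, 100, paired=True))  # 1150000
print(subsample_fraction(200, 50))                       # 0.25
```

    With single-end 100 bp reads the same target would need 2,300,000 reads.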

  17. A sampling and metagenomic sequencing-based methodology for monitoring antimicrobial resistance in swine herds

    DEFF Research Database (Denmark)

    Munk, Patrick; Dalhoff Andersen, Vibe; de Knegt, Leonardo

    2016-01-01

    Objectives Reliable methods for monitoring antimicrobial resistance (AMR) in livestock and other reservoirs are essential to understand the trends, transmission and importance of agricultural resistance. Quantification of AMR is mostly done using culture-based techniques, but metagenomic read...... mapping shows promise for quantitative resistance monitoring. Methods We evaluated the ability of: (i) MIC determination for Escherichia coli; (ii) cfu counting of E. coli; (iii) cfu counting of aerobic bacteria; and (iv) metagenomic shotgun sequencing to predict expected tetracycline resistance based...... cultivation-based techniques in terms of predicting expected tetracycline resistance based on antimicrobial consumption. Our metagenomic approach had sufficient resolution to detect antimicrobial-induced changes to individual resistance gene abundances. Pen floor manure samples were found to represent rectal...
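
    Read-mapping-based resistance quantification usually normalizes mapped-read counts for gene length and sequencing depth before samples are compared. The sketch uses an RPKM-style normalization (reads per kilobase of gene per million reads); this is a common choice assumed for illustration, not necessarily the normalization used in the study, and the gene names, lengths and counts are hypothetical.

```python
def rpkm(read_counts, gene_lengths_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads, for each gene."""
    per_million = total_reads / 1_000_000
    return {
        gene: count / (gene_lengths_bp[gene] / 1_000) / per_million
        for gene, count in read_counts.items()
    }


counts = {"tet_gene_1": 300, "tet_gene_2": 45}        # hypothetical mapped reads
lengths = {"tet_gene_1": 1_500, "tet_gene_2": 1_200}  # hypothetical lengths (bp)
print(rpkm(counts, lengths, total_reads=20_000_000))
```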

  18. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  19. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji

    2015-10-22

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although it is still difficult to reconstruct shotgun sequences from organisms with large genomes into perfect contigs that each represent a full chromosome, shotgun sequences from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline that assembles small viral genomes reliably using a resampling method applied to shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and will thus meet the increasing need for viral genome comparisons in metagenomic studies.

  20. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji; Yasuike, Motoshige; Nishiki, Issei; Iwasaki, Yuki; Fujiwara, Atushi; Kawato, Yasuhiko; Nakai, Toshihiro; Nagai, Satoshi; Kobayashi, Takanori; Gojobori, Takashi; Ototake, Mitsuru

    2015-01-01

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although it is still difficult to reconstruct shotgun sequences from organisms with large genomes into perfect contigs that each represent a full chromosome, shotgun sequences from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline that assembles small viral genomes reliably using a resampling method applied to shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and will thus meet the increasing need for viral genome comparisons in metagenomic studies.
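
    The abstract does not describe how V-GAP draws its resamples. One standard way to take a fixed number of reads uniformly at random in a single pass over a large read file is reservoir sampling (Algorithm R), sketched here as an assumption; repeating the draw with different sizes or seeds would give the kind of read-count control described above.

```python
import random


def reservoir_sample(reads, k, seed=None):
    """Return k items drawn uniformly without replacement from an iterable
    of unknown length (all items if there are fewer than k)."""
    rng = random.Random(seed)
    sample = []
    for i, read in enumerate(reads):
        if i < k:
            sample.append(read)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = read
    return sample


# Hypothetical: draw subsets of increasing size from the same reads
reads = [f"read_{i}" for i in range(10_000)]
subsets = {k: reservoir_sample(reads, k, seed=k) for k in (500, 1_000, 2_000)}
print({k: len(v) for k, v in subsets.items()})  # {500: 500, 1000: 1000, 2000: 2000}
```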

  1. 454 sequencing of pooled BAC clones on chromosome 3H of barley

    Directory of Open Access Journals (Sweden)

    Yamaji Nami

    2011-05-01

    Full Text Available Abstract Background Genome sequencing of barley has been delayed due to its large genome size (ca. 5,000 Mbp). Among the fast sequencing systems, 454 liquid-phase pyrosequencing provides the longest reads and is the most promising method for BAC clones. Here we report the results of pooled sequencing of BAC clones selected with ESTs genetically mapped to chromosome 3H. Results We sequenced pooled barley BAC clones using a 454 parallel genome sequencer. A PCR screening system based on primer sets derived from genetically mapped ESTs on chromosome 3H was used for clone selection in a BAC library developed from cultivar "Haruna Nijo". The DNA samples of 10 or 20 BAC clones were pooled and used for shotgun library development. The homology between contig sequences generated in each pooled library and mapped EST sequences was studied. The number of contigs assigned to chromosome 3H was 372. Their lengths ranged from 1,230 bp to 58,322 bp, with an average of 14,891 bp. Of these contigs, 240 showed homology and colinearity with the genome sequence of rice chromosome 1. A contig annotation browser supplemented with query search by unique sequence or genetic map position was developed. The identified contigs can be annotated with barley cDNAs and reference sequences on the browser. Homology analysis of these contigs with rice genes indicated that 1,239 rice genes can be assigned to barley contigs by a simple comparison of sequence lengths in both species. Of these genes, 492 are assigned to rice chromosome 1. Conclusions We demonstrate the efficiency of sequencing gene-rich regions from barley chromosome 3H, with special reference to syntenic relationships with rice chromosome 1.

  2. NeSSM: a Next-generation Sequencing Simulator for Metagenomics.

    Directory of Open Access Journals (Sweden)

    Ben Jia

    Full Text Available BACKGROUND: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. RESULTS: We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining currently available complete genomes, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. To improve the fidelity of simulation, NeSSM provides tools to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has been developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than an order of magnitude faster than MetaSim. CONCLUSIONS: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can help in developing tools and evaluating strategies for metagenomics analysis, and it is freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
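    NeSSM's own error and bias models are estimated from real data and are not reproduced here. The sketch below, with hypothetical names, only illustrates how the three inputs the abstract lists fit together: complete genomes, a community composition table, and a per-position substitution error model. As an assumption, the composition table is read as the fraction of reads drawn from each genome rather than as cell abundance. Start positions are uniform, so coverage bias is deliberately left out.

```python
import random
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def simulate_reads(genomes: Dict[str, str], abundances: Dict[str, float],
                   read_length: int, n_reads: int,
                   error_rates: Sequence[float], seed: int = 0) -> List[Tuple[str, str]]:
    """Simulate single-end reads from a mixed community.

    genomes: genome name -> sequence (each at least read_length long)
    abundances: genome name -> share of reads drawn from that genome
    error_rates: substitution probability at each read position (len == read_length)
    """
    if len(error_rates) != read_length:
        raise ValueError("need one error rate per read position")
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances.get(name, 0.0) for name in names]
    reads = []
    for _ in range(n_reads):
        # Pick the source genome in proportion to its share in the composition table.
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_length + 1)  # uniform start: no coverage bias
        read = list(genome[start:start + read_length])
        for i, p in enumerate(error_rates):
            if rng.random() < p:
                # Position-specific substitution to one of the other bases.
                read[i] = rng.choice([b for b in BASES if b != read[i]])
        reads.append((name, "".join(read)))
    return reads
```

    A fuller simulator would weight genomes by abundance times genome length, add insertion and deletion errors, and draw start positions from an estimated coverage-bias profile.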

  3. Switch from oral to inactivated poliovirus vaccine in Yogyakarta Province, Indonesia: summary of coverage, immunity, and environmental surveillance.

    Science.gov (United States)

    Wahjuhono, Gendro; Revolusiana; Widhiastuti, Dyah; Sundoro, Julitasari; Mardani, Tri; Ratih, Woro Umi; Sutomo, Retno; Safitri, Ida; Sampurno, Ondri Dwi; Rana, Bardan; Roivainen, Merja; Kahn, Anna-Lea; Mach, Ondrej; Pallansch, Mark A; Sutter, Roland W

    2014-11-01

    Inactivated poliovirus vaccine (IPV) is rarely used in tropical developing countries. To generate additional scientific information, especially on the possible emergence of vaccine-derived polioviruses (VDPVs) in an IPV-only environment, we initiated an IPV introduction project in Yogyakarta, an Indonesian province. In this report, we present the coverage, immunity, and VDPV surveillance results. In Yogyakarta, we established environmental surveillance in 2004 and conducted routine immunization coverage and seroprevalence surveys before and after a September 2007 switch from oral poliovirus vaccine (OPV) to IPV, using standard coverage and serosurvey methods. Rates and types of polioviruses found in sewage samples were analyzed, and all poliovirus isolates obtained after the switch were sequenced. Vaccination coverage (>95%) and immunity (approximately 100%) did not change substantially before and after the IPV switch. No VDPVs were detected. Before the switch, 58% of environmental samples contained Sabin poliovirus; starting 6 weeks after the switch, Sabin polioviruses were rarely isolated, and when they were, genetic sequencing suggested recent introductions. This project demonstrated that under almost ideal conditions (good hygiene, maintenance of universally high IPV coverage, and correspondingly high immunity against polioviruses), no emergence or circulation of VDPVs could be detected in a tropical developing country setting. © The Author 2014. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved.

  4. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Skarshewski, Adam

    2013-01-01

    Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition–independent approach to recover high-quality microbial genomes from deeply sequenced...
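    The truncated abstract names the idea (differential coverage across multiple metagenomes) but not its implementation. The sketch below is a deliberately crude illustration, not the authors' procedure. It relies on the premise that contigs from one genome share roughly the same coverage within each sample, and groups contigs by a coarse grid over their log coverage in two samples. The grid width, the +1 pseudocount and the function name are assumptions. A fixed grid can also split a genome whose contigs sit on a cell boundary, which is one reason a clustering step would normally replace it.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def bin_by_differential_coverage(contig_coverage: Dict[str, Tuple[float, float]],
                                 bin_width: float = 0.5) -> Dict[Tuple[int, int], List[str]]:
    """Group contigs whose coverage in two metagenomes falls in the same cell of a
    grid laid over log2(coverage + 1) space."""
    bins: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, (cov1, cov2) in contig_coverage.items():
        key = (math.floor(math.log2(cov1 + 1) / bin_width),
               math.floor(math.log2(cov2 + 1) / bin_width))
        bins[key].append(name)
    return dict(bins)


print(bin_by_differential_coverage({"c1": (100, 5), "c2": (105, 5.2), "c3": (4, 80)}))
# → {(13, 5): ['c1', 'c2'], (4, 12): ['c3']}
```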

  5. The draft genome sequence of Mangrovibacter sp. strain MP23, an endophyte isolated from the roots of Phragmites karka

    Directory of Open Access Journals (Sweden)

    Pratiksha Behera

    2016-09-01

    Full Text Available To date, only one draft genome has been reported within the genus Mangrovibacter. Here, we report the second draft genome shotgun sequence of a Mangrovibacter sp., strain MP23, which was isolated from the roots of Phragmites karka (P. karka), an invasive weed growing in the Chilika Lagoon, Odisha, India. Strain MP23 is a facultatively anaerobic, nitrogen-fixing endophytic bacterium that grows optimally at 37 °C, pH 7.0, and 1% NaCl. The draft genome sequence of strain MP23 contains 4,947,475 bp, with an estimated G + C content of 49.9% and a total of 4,392 protein-coding genes. The genome sequence provides information on putative genes encoding proteins involved in oxidative stress, nutrient uptake, and nitrogen fixation that might confer niche-specific ecological fitness and help explain the invasive success of P. karka in Chilika Lagoon. The draft genome sequence and annotation have been deposited at DDBJ/EMBL/GenBank under the accession number LYRP00000000.
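    The 49.9% G + C figure above is a genome-wide ratio. A minimal sketch of how such a value can be computed from draft-genome contigs, assuming ambiguous bases such as N are excluded from the denominator (annotation pipelines may handle them differently); the function name is illustrative:

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Percent G+C over all unambiguous bases (A, C, G, T) in a set of contigs."""
    gc = acgt = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += s.count("A") + s.count("C") + s.count("G") + s.count("T")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


print(round(gc_content(["ATGC", "ggccNN"]), 1))  # → 75.0
```

    Pooling counts across contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted.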

  6. Whole-Genome Shotgun Sequence of the Keratinolytic Bacterium Lysobacter sp. A03, Isolated from the Antarctic Environment

    OpenAIRE

    Pereira, Jamile Queiroz; Ambrosini, Adriana; Sant'Anna, Fernando Hayashi; Tadra-Sfeir, Michele; Faoro, Helisson; Pedrosa, Fábio Oliveira; Souza, Emanuel Maltempi; Brandelli, Adriano; Passaglia, Luciane M. P.

    2015-01-01

    Lysobacter sp. strain A03 is a protease-producing bacterium isolated from decomposing penguin feathers collected in the Antarctic environment. This strain can degrade keratin at low temperatures. The A03 genome sequence makes it possible to search for new genes with biotechnological potential and to better understand the strain's cold-adaptation mechanisms and survival in cold environments.

  7. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein-coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  8. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Science.gov (United States)

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein-coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  9. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data

    Directory of Open Access Journals (Sweden)

    Martens-Uzunova Elena S

    2010-10-01

    Full Text Available Abstract Background The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory depend to a large degree on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP), and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted, and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. Results A majority-rule-based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal-peptide-containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified as secreted proteins. Concordant changes in the secretome state were observed in response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. Conclusions We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli, thereby avoiding prediction conflicts associated with inaccurate gene models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed the in silico predictions.
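    The abstract does not give the exact voting rule. A minimal sketch, assuming one vote from the A. niger protein's own signal-peptide call plus one vote per best homolog in each of the three neighbouring species, with missing homologs abstaining and ties resolved by the protein's own prediction (both of these choices are assumptions, and the function name is hypothetical):

```python
from typing import Optional, Sequence


def majority_signal_peptide(own_call: bool,
                            homolog_calls: Sequence[Optional[bool]]) -> bool:
    """Return True if the protein is classified as carrying a signal peptide.

    own_call: SP prediction for the protein itself.
    homolog_calls: SP predictions for its best homolog in each neighbouring
        species; None means no homolog was found and that species abstains.
    """
    votes = [own_call] + [call for call in homolog_calls if call is not None]
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return own_call  # tie: fall back to the protein's own prediction
    return yes > no
```

    Under this rule a protein whose own gene model yields no signal peptide, for example because of a gene-model error, but whose three homologs all carry one is reclassified: majority_signal_peptide(False, [True, True, True]) returns True. A protein with no homologs reduces to the single-predictor case.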

  10. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data.

    Science.gov (United States)

    Braaksma, Machtelt; Martens-Uzunova, Elena S; Punt, Peter J; Schaap, Peter J

    2010-10-19

    The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory depend to a large degree on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP), and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted, and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. A majority-rule-based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal-peptide-containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified as secreted proteins. Concordant changes in the secretome state were observed in response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli, thereby avoiding prediction conflicts associated with inaccurate gene models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed the in silico predictions.

  11. Finding the right coverage: The impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates

    NARCIS (Netherlands)

    Fountain, Emily D.; Pauli, Jonathan N.; Reid, Brendan N.; Palsboll, Per J.; Peery, M. Zachariah

    Restriction-enzyme-based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction-enzyme-based methods remain largely unknown.
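    The record is truncated and does not describe the authors' error model. As a hedged illustration of why depth matters for SNP genotyping, assume that each read at a heterozygous site carries either allele with probability 0.5 and that reads are error-free. The chance that all d reads show the same allele, so that the site looks homozygous, is then 2 × 0.5^d = 0.5^(d−1). The function names below are hypothetical:

```python
def het_miscall_probability(depth: int) -> float:
    """P(all reads at a heterozygous site show one allele), assuming 50:50 allele
    sampling and error-free reads."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)


def min_depth_for_error(max_error: float) -> int:
    """Smallest depth at which het_miscall_probability(depth) <= max_error."""
    if not 0 < max_error <= 1:
        raise ValueError("max_error must be in (0, 1]")
    depth = 1
    while het_miscall_probability(depth) > max_error:
        depth += 1
    return depth


print(het_miscall_probability(5))  # → 0.0625
print(min_depth_for_error(0.01))   # → 8
```

    Real genotypers also model base-call errors and allelic imbalance, so these figures isolate the sampling effect only and are not a recommended depth threshold.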

  12. Automated Testing with Targeted Event Sequence Generation

    DEFF Research Database (Denmark)

    Jensen, Casper Svenning; Prasad, Mukul R.; Møller, Anders

    2013-01-01

    Automated software testing aims to detect errors by producing test inputs that cover as much of the application source code as possible. Applications for mobile devices are typically event-driven, which raises the challenge of automatically producing event sequences that result in high coverage...

  13. Whole-Genome Shotgun Sequence of the Keratinolytic Bacterium Lysobacter sp. A03, Isolated from the Antarctic Environment.

    Science.gov (United States)

    Pereira, Jamile Queiroz; Ambrosini, Adriana; Sant'Anna, Fernando Hayashi; Tadra-Sfeir, Michele; Faoro, Helisson; Pedrosa, Fábio Oliveira; Souza, Emanuel Maltempi; Brandelli, Adriano; Passaglia, Luciane M P

    2015-04-02

    Lysobacter sp. strain A03 is a protease-producing bacterium isolated from decomposing penguin feathers collected in the Antarctic environment. This strain can degrade keratin at low temperatures. The A03 genome sequence makes it possible to search for new genes with biotechnological potential and to better understand the strain's cold-adaptation mechanisms and survival in cold environments. Copyright © 2015 Pereira et al.

  14. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny

    Science.gov (United States)

    Scaglione, Davide; Reyes-Chin-Wo, Sebastian; Acquadro, Alberto; Froenicke, Lutz; Portis, Ezio; Beitel, Christopher; Tirone, Matteo; Mauro, Rosario; Lo Monaco, Antonino; Mauromicale, Giovanni; Faccioli, Primetta; Cattivelli, Luigi; Rieseberg, Loren; Michelmore, Richard; Lanteri, Sergio

    2016-01-01

    Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species. PMID:26786968
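    The low-pass design (0.5 to 1× per F1 individual) leaves many sites with no read at all. Under the idealized Lander–Waterman model, where reads land uniformly at random, per-base depth is Poisson with mean equal to the average coverage c, so the expected fraction of bases seen at least k times is 1 − Σ_{j<k} e^(−c) c^j / j!. A minimal sketch (the function name is illustrative, and real libraries show coverage bias, so treat the numbers as approximate):

```python
import math


def fraction_covered(mean_coverage: float, min_reads: int = 1) -> float:
    """Expected fraction of bases covered by >= min_reads reads, assuming
    Poisson-distributed depth (uniformly random read placement)."""
    c = mean_coverage
    p_below = sum(math.exp(-c) * c ** j / math.factorial(j) for j in range(min_reads))
    return 1.0 - p_below


for c in (0.5, 1.0, 30.0):
    print(c, round(fraction_covered(c), 4), round(fraction_covered(c, min_reads=2), 4))
```

    At 0.5× only about 39% of bases are expected to carry any read, and at 1× about 63%. Seeing both alleles of a heterozygous site needs at least two reads, which this model puts at roughly 9% and 26% of bases respectively. That sparsity helps explain why the approach works with heterozygous regions and genetic bins rather than calling each site on its own.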

  15. Women's Health Insurance Coverage

    Science.gov (United States)

    ... Women’s Health Insurance Coverage. Published: Oct 31, 2017 ... that many women continue to face. Sources of Health Insurance Coverage Employer-Sponsored Insurance: Approximately 57.9 million ...

  16. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads.

    Directory of Open Access Journals (Sweden)

    Christopher A Miller

    2011-01-01

    Full Text Available Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and it uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired-end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and Mac OS X, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.
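    readDepth itself is an R package with its own overdispersed statistical model and segmentation, none of which is reproduced here. The core read-depth idea can still be sketched in a few lines, assuming reads have already been aligned and reduced to start positions: count reads in fixed windows, then scale each window by the genome-wide median so that a typical window maps to the normal ploidy. Function names are hypothetical, and GC and mappability corrections are omitted.

```python
from statistics import median
from typing import Iterable, List


def window_counts(read_starts: Iterable[int], genome_length: int, window: int) -> List[int]:
    """Number of aligned reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def copy_number_estimates(counts: List[int], ploidy: int = 2) -> List[float]:
    """Scale window counts so the median window corresponds to `ploidy` copies."""
    m = median(counts)
    if m == 0:
        raise ValueError("median window count is zero; use larger windows")
    return [ploidy * c / m for c in counts]


print(copy_number_estimates([100, 100, 100, 150, 50]))  # → [2.0, 2.0, 2.0, 3.0, 1.0]
```

    A run of windows near 3.0 would suggest a single-copy gain and a run near 1.0 a single-copy loss; a real caller would segment these noisy values and test them against its count model rather than read them off window by window.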

  17. Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics

    Directory of Open Access Journals (Sweden)

    Rama R Gullapalli

    2012-01-01

    Full Text Available The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized. [7] We also review the issue of physician education, which is also an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.

  18. Characterization of the Burkholderia thailandensis SOS response by using whole-transcriptome shotgun sequencing.

    Science.gov (United States)

    Ulrich, Ricky L; Deshazer, David; Kenny, Tara A; Ulrich, Melanie P; Moravusova, Anna; Opperman, Timothy; Bavari, Sina; Bowlin, Terry L; Moir, Donald T; Panchal, Rekha G

    2013-10-01

    The bacterial SOS response is a well-characterized regulatory network encoded by most bacterial species and is involved in DNA repair. In addition to nucleic acid repair, the SOS response is involved in pathogenicity, stress-induced mutagenesis, and the emergence and dissemination of antibiotic resistance. Using high-throughput sequencing technology (SOLiD RNA-Seq), we analyzed the Burkholderia thailandensis global SOS response to the fluoroquinolone antibiotic ciprofloxacin (CIP) and the DNA-damaging chemical mitomycin C (MMC). We demonstrate that a B. thailandensis recA mutant (RU0643) is ∼4-fold more sensitive to CIP than the parental strain B. thailandensis DW503. Our RNA-Seq results show that CIP and MMC treatment (P SOS response were induced and include lexA, uvrA, dnaE, dinB, recX, and recA. At the genome-wide level, we found an overall decrease in gene expression, especially for genes involved in amino acid and carbohydrate transport and metabolism, following both CIP and MMC exposure. Interestingly, we observed the upregulation of several genes involved in bacterial motility and enhanced transcription of a B. thailandensis genomic island encoding a Siphoviridae bacteriophage designated E264. Using B. thailandensis plaque assays and PCR with B. mallei ATCC 23344 as the host, we demonstrate that CIP and MMC exposure in B. thailandensis DW503 induces the transcription and translation of viable bacteriophage in a RecA-dependent manner. This is the first report of the SOS response in Burkholderia spp. to DNA-damaging agents. We have identified both common and unique adaptive responses of B. thailandensis to chemical stress and DNA damage.

  19. Shotgun proteomic analytical approach for studying proteins adsorbed onto liposome surface

    KAUST Repository

    Capriotti, Anna Laura

    2011-07-02

    Knowledge of the interaction between plasma proteins and nanocarriers employed for in vivo delivery is fundamental to understanding their biodistribution. Protein adsorption onto the nanoparticle surface (protein corona) is strongly affected by vector surface characteristics. In general, the primary interaction is thought to be electrostatic, so the surface charge of the carrier is expected to play a central role in protein adsorption. Because protein corona composition can be critical in modifying the interactive surface that is recognized by cells, characterizing its formation onto lipid particles may serve as a fundamental predictive model for the in vivo efficiency of a lipidic vector. In the present work, protein coronas adsorbed onto three differently charged cationic liposome formulations were compared by a shotgun proteomic approach based on nano-liquid chromatography-high-resolution mass spectrometry. About 130 proteins were identified in each corona, with only small differences between the different cationic liposome formulations. However, this study could be useful for the future controlled design of colloidal drug carriers and possibly for the controlled creation of biocompatible surfaces of other devices that come into contact with proteins in body fluids. © 2011 Springer-Verlag.

  20. Molecular diagnosis of Usher syndrome: application of two different next generation sequencing-based procedures.

    Directory of Open Access Journals (Sweden)

    Danilo Licastro

    Full Text Available Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses, with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of whom were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS platforms were used: SOLiD for whole exome sequencing, and Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient, with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall, this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need for high coverage and to the correct interpretation of sequence variations with a non-obvious pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH, which may also be caused by the presence of additional causative genes yet to be identified.
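    The comparison in this record rests on the share of USH gene regions whose coverage exceeds 25×. The abstract does not define "overall coverage" precisely; assuming it means mean per-base depth over each target region, a minimal sketch of the metric follows (the function name, region names and depth values are made up for illustration):

```python
from typing import Dict, Sequence


def fraction_regions_above(depth_by_region: Dict[str, Sequence[int]],
                           threshold: float = 25.0) -> float:
    """Fraction of target regions whose mean per-base depth is above `threshold`.
    Regions with no depth values count as failing."""
    if not depth_by_region:
        raise ValueError("no regions given")
    passing = sum(
        1 for depths in depth_by_region.values()
        if depths and sum(depths) / len(depths) > threshold
    )
    return passing / len(depth_by_region)


example = {"exon1": [30, 30, 30], "exon2": [10, 20, 30], "exon3": [26] * 4, "exon4": []}
print(fraction_regions_above(example))  # → 0.5
```

    A stricter variant would require every base, not just the mean, to exceed the threshold; that can only lower the fraction for the same data.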

  1. Molecular Diagnosis of Usher Syndrome: Application of Two Different Next Generation Sequencing-Based Procedures

    Science.gov (United States)

    Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo

    2012-01-01

    Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses, with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of whom were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS platforms were used: SOLiD for whole exome sequencing, and Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient, with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall, this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need for high coverage and to the correct interpretation of sequence variations with a non-obvious pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH, which may also be caused by the presence of additional causative genes yet to be identified. PMID:22952768

  2. IdentiPy: an extensible search engine for protein identification in shotgun proteomics.

    Science.gov (United States)

    Levitsky, Lev I; Ivanov, Mark V; Lobas, Anna A; Bubis, Julia A; Tarasova, Irina A; Solovyeva, Elizaveta M; Pridatchenko, Marina L; Gorshkov, Mikhail V

    2018-04-23

    We present an open-source, extensible search engine for shotgun proteomics. Implemented in the Python programming language, IdentiPy shows processing speed and sensitivity competitive with state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, which allows a single server installation to be accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel "auto-tune" feature, IdentiPy outperforms popular alternatives on high-resolution data sets. Auto-tune adjusts the search parameters to the particular data set, improving search efficiency and simplifying the user experience. With auto-tune enabled, IdentiPy shows higher sensitivity than the other search engines evaluated. IdentiPy Server has built-in post-processing and protein inference procedures and provides graphical visualization of the statistical properties of the data set and the search results. IdentiPy is open-source, can be freely extended with third-party scoring functions or processing algorithms, and allows the search workflow to be customized for specialized applications.
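
To make "X!Tandem-style scoring" concrete, here is a hedged sketch of a hyperscore-like function: it generates singly charged b and y ions from monoisotopic residue masses, matches them against peaks within a fixed m/z tolerance, and returns log10 of (summed matched intensity × Nb! × Ny!). This is a generic simplification written for illustration, not IdentiPy's code; the function names, the 0.02 tolerance and the absence of intensity normalization are all assumptions.

```python
from math import factorial, log10
from typing import List, Tuple

# Monoisotopic residue masses (Da), standard amino acids, unmodified Cys.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly charged b- and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y

def hyperscore(peptide: str, spectrum: List[Tuple[float, float]], tol: float = 0.02) -> float:
    """log10(sum of matched intensities * Nb! * Ny!); 0.0 if nothing matches.

    spectrum is a list of (m/z, intensity) peaks.
    """
    b_ions, y_ions = fragment_ions(peptide)

    def match(ions: List[float]) -> Tuple[int, float]:
        n_matched, intensity = 0, 0.0
        for ion in ions:
            peaks = [inten for mz, inten in spectrum if abs(mz - ion) <= tol]
            if peaks:
                n_matched += 1
                intensity += max(peaks)  # take the strongest peak in the window
        return n_matched, intensity

    nb, ib = match(b_ions)
    ny, iy = match(y_ions)
    if ib + iy == 0:
        return 0.0
    return log10((ib + iy) * factorial(nb) * factorial(ny))
```

The factorial terms reward matching many consecutive fragment ions, which is the main idea behind hyperscore-type functions; an auto-tune step like the one described would adjust parameters such as `tol` per data set.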

  3. A microfluidic DNA library preparation platform for next-generation sequencing.

    Science.gov (United States)

    Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D

    2013-01-01

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly of a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain, BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our library preparation device will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  4. A microfluidic DNA library preparation platform for next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Hanyoup Kim

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly of a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain, BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our library preparation device will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  5. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
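
Two assembly statistics quoted here, N50 and a k-mer-based genome size estimate, can be sketched in a few lines of Python. N50 is the length L such that sequences of length ≥ L contain at least half of the assembled bases. The genome size estimate below uses the common "total k-mer occurrences divided by the depth of the main histogram peak" approach; the abstract does not name the k-mer tool or its settings, so the `min_depth` error cutoff and all function names are hypothetical, and the estimate ignores heterozygosity and repeat peaks.

```python
from collections import Counter
from typing import Dict, Iterable

def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty input")

def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Map multiplicity -> number of distinct k-mers observed that many times."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return dict(Counter(counts.values()))

def genome_size_from_histogram(hist: Dict[int, int], min_depth: int) -> float:
    """Genome size ~ total k-mer occurrences / depth of the main peak.

    Multiplicities below min_depth are treated as sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak = max(usable, key=usable.get)  # most frequent multiplicity above the cutoff
    total = sum(depth * n for depth, n in usable.items())
    return total / peak
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because 6 + 5 = 11 already exceeds half of the 20 total bases.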

  6. In-Depth Analysis of Exoproteomes from Marine Bacteria by Shotgun Liquid Chromatography-Tandem Mass Spectrometry: the Ruegeria pomeroyi DSS-3 Case-Study

    Directory of Open Access Journals (Sweden)

    Jean Armengaud

    2010-07-01

    Microorganisms secrete numerous compounds required for their survival into their extracellular environment. Many of these compounds could be of great interest for biotechnology applications, and their genes could be used in synthetic biology design. The secreted proteins and the components of the translocation systems themselves can be scrutinized in depth with the most recent proteomic tools. While the secretomes of pathogens are well documented, those of non-pathogens remain largely to be established. Here, we present the analysis of the exoproteome of the marine bacterium Ruegeria pomeroyi DSS-3 grown under standard laboratory conditions. We used a shotgun approach consisting of trypsin digestion of the exoproteome and identification of the resulting peptides by liquid chromatography coupled to tandem mass spectrometry. Three different proteins with domains homologous to those observed in RTX toxins were uncovered and semi-quantified as the most abundantly secreted proteins. One of these proteins clearly stands out from the catalogue, representing over half of the total exoproteome. We also listed many soluble proteins related to ABC and TRAP transporters involved in nutrient uptake. The Ruegeria pomeroyi DSS-3 case study illustrates the power of the shotgun nano-LC-MS/MS strategy to decipher the exoproteome of marine bacteria and to contribute to environmental proteomics.
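
The abstract says the proteins were "semi-quantified" but does not name the method. One common spectral-counting choice is the normalized spectral abundance factor (NSAF), where each protein's spectral count is divided by its length and then normalized so that all values sum to 1; under that assumption, "over half of the total exoproteome" would correspond to an NSAF above 0.5. The sketch below is illustrative only, and the protein names and counts in the usage note are hypothetical.

```python
from typing import Dict, Tuple

def nsaf(spectral_counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Normalized spectral abundance factor.

    spectral_counts maps protein -> (spectral count, protein length in residues).
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), so the values sum to 1.
    """
    saf = {protein: count / length for protein, (count, length) in spectral_counts.items()}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}
```

With two hypothetical proteins, A (100 spectra, 100 residues) and B (50 spectra, 200 residues), `nsaf` assigns 0.8 to A and 0.2 to B; the length correction keeps long proteins from looking more abundant simply because they yield more tryptic peptides.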

  7. Structural characterization of ether lipids from the archaeon Sulfolobus islandicus by high-resolution shotgun lipidomics

    DEFF Research Database (Denmark)

    Jensen, Sara Munk; Brandl, Martin; Treusch, Alexander H

    2015-01-01

    The molecular structures, biosynthetic pathways and physiological functions of membrane lipids produced by organisms in the domain Archaea are poorly characterized compared with those of their counterparts in Bacteria and Eukaryota. Here we report on the use of high-resolution shotgun lipidomics......-resolution Fourier transform mass spectrometry using an ion trap-orbitrap mass spectrometer. This analysis identified five clusters of molecular ions that matched ether lipids in the database with sub-ppm mass accuracy. To structurally characterize and validate the identities of the potential lipid species, we...... performed structural analysis using multistage activation on the ion trap-orbitrap instrument as well as tandem mass analysis using a quadrupole time-of-flight machine. Our analysis identified four ether lipid species previously reported in Archaea, and one ether lipid species that had not been described...
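
"Sub-ppm mass accuracy" refers to the relative mass error, (observed − theoretical) / theoretical × 10⁶. The sketch below shows that calculation and a simple tolerance-based database lookup. The database entries in the usage note are placeholders, not real ether lipid masses, and the 1 ppm default tolerance is an assumption rather than the study's setting.

```python
from typing import Dict, List, Tuple

def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_database(observed_mz: float, database: Dict[str, float],
                   tol_ppm: float = 1.0) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for every database entry within tol_ppm, best match first."""
    hits = [(name, ppm_error(observed_mz, mz)) for name, mz in database.items()]
    hits = [(name, err) for name, err in hits if abs(err) <= tol_ppm]
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For instance, an ion observed at m/z 1000.0005 lies 0.5 ppm from a hypothetical entry at 1000.0 and about 9.5 ppm from one at 1000.01, so only the first survives a 1 ppm window. Because several lipids can fall inside such a window, the abstract's follow-up MSn and tandem MS validation is still needed.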

  8. A high-coverage Neandertal genome from Vindija Cave in Croatia.

    Science.gov (United States)

    Prüfer, Kay; de Filippo, Cesare; Grote, Steffi; Mafessoni, Fabrizio; Korlević, Petra; Hajdinjak, Mateja; Vernot, Benjamin; Skov, Laurits; Hsieh, Pinghsun; Peyrégne, Stéphane; Reher, David; Hopfe, Charlotte; Nagel, Sarah; Maricic, Tomislav; Fu, Qiaomei; Theunert, Christoph; Rogers, Rebekah; Skoglund, Pontus; Chintalapati, Manjusha; Dannemann, Michael; Nelson, Bradley J; Key, Felix M; Rudan, Pavao; Kućan, Željko; Gušić, Ivan; Golovanova, Liubov V; Doronichev, Vladimir B; Patterson, Nick; Reich, David; Eichler, Evan E; Slatkin, Montgomery; Schierup, Mikkel H; Andrés, Aida M; Kelso, Janet; Meyer, Matthias; Pääbo, Svante

    2017-11-03

    To date, the only Neandertal genome that has been sequenced to high quality is from an individual found in Southern Siberia. We sequenced the genome of a female Neandertal from ~50,000 years ago from Vindija Cave, Croatia, to ~30-fold genomic coverage. She carried 1.6 differences per 10,000 base pairs between the two copies of her genome, fewer than present-day humans, suggesting that Neandertal populations were of small size. Our analyses indicate that she was more closely related to the Neandertals that mixed with the ancestors of present-day humans living outside of sub-Saharan Africa than the previously sequenced Neandertal from Siberia, allowing 10 to 20% more Neandertal DNA to be identified in present-day humans, including variants involved in low-density lipoprotein cholesterol concentrations, schizophrenia, and other diseases. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  9. Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes.

    Science.gov (United States)

    Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo

    2016-04-07

    Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of the genomic coverage achieved by GBS with an individual restriction enzyme or a pair of restriction enzymes on a sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed by applying IgCoverage to 22 plant, animal, and fungal species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of the 22 organisms revealed up to eight times more genome coverage for the new combinations, which pair four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than in monocots. Also, SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be used to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
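
An in silico double-digest evaluation of the kind described can be sketched as follows. This is not IgCoverage. It is a minimal model that assumes a fragment is sequenced only if it is flanked by one site of each enzyme (as in two-enzyme GBS with distinct adapters) and falls within a size window; the 100-400 bp window and all function names are hypothetical. Recognition sites and cut offsets are the standard ones (PstI CTGCA^G, MspI C^CGG, HinfI G^ANTC, HpyCH4IV A^CGT). All four sites are palindromic, so scanning one strand finds every site.

```python
import re
from typing import List, Tuple

# Recognition site and top-strand cut offset (N = any base).
ENZYMES = {
    "PstI": ("CTGCAG", 5),      # CTGCA^G
    "MspI": ("CCGG", 1),        # C^CGG
    "HinfI": ("GANTC", 1),      # G^ANTC
    "HpyCH4IV": ("ACGT", 1),    # A^CGT
}

def cut_sites(seq: str, enzyme: str) -> List[Tuple[int, str]]:
    """(cut position, enzyme) for every site; the lookahead allows overlapping matches."""
    site, offset = ENZYMES[enzyme]
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [(m.start() + offset, enzyme) for m in re.finditer(pattern, seq.upper())]

def gbs_fragments(seq: str, enz_a: str, enz_b: str,
                  min_len: int = 100, max_len: int = 400) -> List[Tuple[int, int]]:
    """Fragments flanked by one cut of each enzyme and within the size window."""
    cuts = sorted(cut_sites(seq, enz_a) + cut_sites(seq, enz_b))
    return [(start, end) for (start, e1), (end, e2) in zip(cuts, cuts[1:])
            if e1 != e2 and min_len <= end - start <= max_len]

def fraction_sampled(seq: str, enz_a: str, enz_b: str,
                     min_len: int = 100, max_len: int = 400) -> float:
    """Share of the sequence covered by size-selected, mixed-end fragments."""
    frags = gbs_fragments(seq, enz_a, enz_b, min_len, max_len)
    return sum(end - start for start, end in frags) / len(seq)
```

Running `fraction_sampled` on the same genome for `("PstI", "MspI")` and `("HinfI", "HpyCH4IV")` gives the kind of relative comparison the abstract reports. Frequent-cutting four- and five-base sites produce many more fragments in the size window than the rare six-base PstI site.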

  10. Disk-based compression of data from genome sequencing.

    Science.gov (United States)

    Grabowski, Szymon; Deorowicz, Sebastian; Roguski, Łukasz

    2015-05-01

    High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. The more promising solutions to this problem are disk-based; the better of the two, from Cox et al. (2012), is based on the Burrows-Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome sequencing collection with almost 45-fold coverage. We propose overlapping reads compression with minimizers (ORCOM), a compression algorithm dedicated to sequencing reads (DNA only). Our method uses the conceptually simple and easily parallelizable idea of minimizers to obtain a compression ratio of 0.317 bits per base, allowing the 134.0 Gbp dataset to fit into only 5.31 GB of space. Availability: http://sun.aei.polsl.pl/orcom under a free license. Contact: sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
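
The minimizer idea can be illustrated by assigning each read a short signature, here simply its lexicographically smallest m-mer, and grouping reads with the same signature into one bin. Overlapping reads tend to share that m-mer, so they end up together, where a compressor can exploit their redundancy. This is a simplification: ORCOM's actual signature selection and bin encoding differ, and the function names are hypothetical. The last function checks the quoted storage figure, assuming GB means 10⁹ bytes, which is consistent with the abstract's 5.31 GB.

```python
from collections import defaultdict
from typing import Dict, List

def read_minimizer(read: str, m: int) -> str:
    """Lexicographically smallest m-mer of the read (a simplified signature)."""
    return min(read[i:i + m] for i in range(len(read) - m + 1))

def bin_reads(reads: List[str], m: int) -> Dict[str, List[str]]:
    """Group reads by minimizer; overlapping reads tend to share one and land together."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        bins[read_minimizer(read, m)].append(read)
    return dict(bins)

def storage_gb(bases: float, bits_per_base: float) -> float:
    """Compressed size in gigabytes (10**9 bytes)."""
    return bases * bits_per_base / 8 / 1e9

print(round(storage_gb(134.0e9, 0.317), 2))  # 5.31, the figure quoted for ORCOM
print(round(storage_gb(134.0e9, 0.518), 2))  # 8.68 at the BWT-based rate
```

Because each bin can be processed independently, this binning step parallelizes easily and needs only one bin in memory at a time, which fits the abstract's disk-based framing.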

  11. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    Science.gov (United States)

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition

  12. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    Science.gov (United States)

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmland. The genome sequence of ramie was therefore determined to explore the molecular basis of its fibre quality, protein content and phytoremediation capacity. To characterize the ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequence using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, an effective tool for assembling highly heterozygous genome sequences. The final length of the draft genome was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on the ramie genome annotation, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated with Benchmarking Universal Single-Copy Orthologs (BUSCO): 90.5% of the 1,440 expected embryophyte genes were identified as complete, and 4.9% as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. The ramie genome will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and trait evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.

  13. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

    Science.gov (United States)

    Anvar, Seyed Yahya; Allard, Guy; Tseng, Elizabeth; Sheynkman, Gloria M; de Klerk, Eleonora; Vermaat, Martijn; Yin, Raymund H; Johansson, Hans E; Ariyurek, Yavuz; den Dunnen, Johan T; Turner, Stephen W; 't Hoen, Peter A C

    2018-03-29

    The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2,700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, in both proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis for revealing largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.

  14. First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.

    Science.gov (United States)

    Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini

    2018-03-01

    Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
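
A sequence type (ST) is defined by the combination of allele numbers at seven housekeeping loci; an isolate whose allelic profile matches no existing ST is submitted to the database curators for a new ST number, which is how novel types such as ST12777 and ST12778 arise. The sketch below shows only the lookup step. The locus names are the standard N. meningitidis MLST loci, but the allele numbers and ST labels are hypothetical placeholders, not real PubMLST data, and the function name is an assumption.

```python
from typing import Dict, Optional, Tuple

# The seven N. meningitidis MLST housekeeping loci, in the order used by PubMLST.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

def assign_st(alleles: Dict[str, int],
              profiles: Dict[Tuple[int, ...], int]) -> Optional[int]:
    """Return the ST whose allelic profile matches exactly, or None if the combination is new."""
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)

# Hypothetical allele numbers and ST labels, for illustration only.
profiles = {(1, 1, 1, 1, 1, 1, 1): 101, (1, 2, 1, 1, 1, 1, 1): 102}
isolate = dict(zip(LOCI, (1, 2, 1, 1, 1, 1, 1)))
novel = dict(zip(LOCI, (1, 2, 3, 1, 1, 1, 1)))
print(assign_st(isolate, profiles))  # 102
print(assign_st(novel, profiles))    # None: candidate for a new ST
```

In practice, allele numbers are first called from the assembled contigs (for example by searching the PubMLST allele sequences), and a previously unseen allele at any locus also results in a novel ST.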

  15. Genomic Variance Estimation Based on Genotyping-by-Sequencing with Different Coverage in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Fé, Dario; Jensen, Just

    2014-01-01

    at each SNP in family pools or polyploids. There are, however, several statistical challenges associated with this method, including low sequencing depth and missing values. Low sequencing depth results in inaccuracies in estimates of allele frequencies for each SNP. In this work we have focused...

  16. Mariner Mars 1971 television picture catalog. Volume 2: Sequence design and picture coverage

    Science.gov (United States)

    Koskela, P. E.; Helton, M. R.; Seeley, L. N.; Zawacki, S. J.

    1972-01-01

    A collection of data relating to the Mariner 9 TV pictures is presented. The data are arranged to allow speedy identification of what took place during entire science cycles, on individual revolutions, and during individual science links or sequences. Summary tables present the nominal design for each of the major picture-taking cycles, along with the sequences actually taken on each revolution. These tables permit at-a-glance identification of all TV sequences and the corresponding individual pictures for the first 262 revolutions (primary mission). A list of TV pictures, categorized by latitude and longitude, is also provided. Orthographic and/or Mercator plots for all pictures, along with pertinent numerical data for their center points, are presented. Other tables and plots of interest are also included. This document is based on data contained in the Supplementary Experiment Data Record (SEDR) files as of 21 August 1972.

  17. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source for hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting the role of optical mapping in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
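
Validating an assembly against an optical map starts by predicting, from the assembled sequence, the ordered fragment sizes each enzyme should produce. The sketch below computes that in silico ordered restriction map for the three enzymes named here (Xba I T^CTAGA, Nhe I G^CTAGC, Hind III A^AGCTT), for either a linear contig or a circular chromosome. The alignment of these sizes to the measured optical map is not shown, and the function name is hypothetical. A region where the predicted and optical fragment patterns disagree would point to a possible assembly error of the kind the abstract describes.

```python
import re
from typing import List

# Recognition sites and top-strand cut offsets; all three sites are palindromic.
SITES = {"XbaI": ("TCTAGA", 1), "NheI": ("GCTAGC", 1), "HindIII": ("AAGCTT", 1)}

def restriction_fragments(seq: str, enzyme: str, circular: bool = False) -> List[int]:
    """Ordered fragment lengths (bp) from cutting seq with one enzyme.

    For circular=True, sites spanning the origin of the string are not detected.
    """
    site, offset = SITES[enzyme]
    cuts = [m.start() + offset for m in re.finditer("(?=" + site + ")", seq.upper())]
    if not cuts:
        return [len(seq)]
    if circular:
        # The piece across the origin joins the tail after the last cut to the head before the first.
        inner = [b - a for a, b in zip(cuts, cuts[1:])]
        return inner + [len(seq) - cuts[-1] + cuts[0]]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

Fragments smaller than the optical system's resolution limit would typically be dropped or merged before comparison, since they cannot be sized reliably on the optical map.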

  18. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    Science.gov (United States)

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive pest of agricultural and ornamental crops. It damages host plants both through phloem feeding and by vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rate, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole-genome shotgun sequencing, and 13 BAC pooling libraries, were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb and a total length of 658 Mb. Annotation of repetitive elements and coding regions identified 265.0 Mb of TEs (40.3%) and 20,786 protein-coding genes with putative gene family expansions. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q clusters in a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated with the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%, respectively. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in turn, will open new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.

  19. HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

    Science.gov (United States)

    Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

    2012-01-01

    Next Generation Sequencing (NGS) applied to human papillomaviruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. Currently, a genotyping system with a comprehensive collection of updated HPV reference sequences and the capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, the Perl scripting language, and MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types, against which BLAST-searched query sequences are genotyped. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats; they display the e-value, BLAST score, and local and coverage identities, provide the genus, species, type, infection site and risk for the best-matched reference HPV sequence, and are ready for additional analyses.

  20. Comparative metagenomic analysis of soil microbial communities across three hexachlorocyclohexane contamination levels.

    Directory of Open Access Journals (Sweden)

    Naseer Sangwan

    This paper presents the characterization of the microbial community responsible for the in situ bioremediation of hexachlorocyclohexane (HCH). Microbial community structure and function were analyzed using 16S rRNA amplicon and shotgun metagenomic sequencing of three sets of soil samples. The three samples were collected from an HCH dumpsite (450 mg HCH/g soil) and had HCH/soil ratios of 0.45, 0.0007, and 0.00003, respectively. Certain bacterial (Chromohalobacter, Marinimicrobium, Idiomarina, Salinosphaera, Halomonas, Sphingopyxis, Novosphingobium, Sphingomonas and Pseudomonas), archaeal (Halobacterium, Haloarcula and Halorhabdus) and fungal (Fusarium) genera were found to be more abundant in the soil sample from the HCH dumpsite. Consistent with this phylogenetic shift, the dumpsite also exhibited a relatively higher abundance of genes coding for chemotaxis/motility, chloroaromatic degradation and HCH degradation (lin genes). Reassembly of a draft pangenome of Chromohalobacter salexigens (~8X coverage) and of 3 plasmids (pISP3, pISP4 and pLB1; 13X coverage) containing lin genes/clusters also provides evidence for the horizontal transfer of HCH catabolism genes.

  1. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Background: We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results: The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion: NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance

  2. Front-End Electron Transfer Dissociation Coupled to a 21 Tesla FT-ICR Mass Spectrometer for Intact Protein Sequence Analysis

    Science.gov (United States)

    Weisbrod, Chad R.; Kaiser, Nathan K.; Syka, John E. P.; Early, Lee; Mullen, Christopher; Dunyach, Jean-Jacques; English, A. Michelle; Anderson, Lissa C.; Blakney, Greg T.; Shabanowitz, Jeffrey; Hendrickson, Christopher L.; Marshall, Alan G.; Hunt, Donald F.

    2017-09-01

    High resolution mass spectrometry is a key technology for in-depth protein characterization. High-field Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) enables high-level interrogation of intact proteins in the most detail to date. However, an appropriate complement of fragmentation technologies must be paired with FTMS to provide comprehensive sequence coverage, as well as characterization of sequence variants and post-translational modifications. Here we describe the integration of front-end electron transfer dissociation (FETD) with a custom-built 21 tesla FT-ICR mass spectrometer, which yields unprecedented sequence coverage for proteins ranging from 2.8 to 29 kDa, without the need for extensive spectral averaging (e.g., 60% sequence coverage for apo-myoglobin with four averaged acquisitions). The system is equipped with a multipole storage device separate from the ETD reaction device, which allows accumulation of multiple ETD fragment ion fills. Consequently, an optimally large product ion population is accumulated prior to transfer to the ICR cell for mass analysis, which improves mass spectral signal-to-noise ratio, dynamic range, and scan rate. We find a linear relationship between protein molecular weight and the minimum number of ETD reaction fills needed to achieve optimum sequence coverage, thereby enabling more efficient use of instrument data acquisition time. Finally, real-time scaling of the number of ETD reaction fills during method-based acquisition is shown, and the implications for LC-MS/MS top-down analysis are discussed.

  3. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.

    Science.gov (United States)

    Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew

    2017-11-06

    Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood, but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.

  4. Iatrogenic artefacts attributable to traditional cupping therapy in a shotgun fatality.

    Science.gov (United States)

    Cavlak, Mehmet; Özkök, Alper; Sarı, Serhat; Dursun, Ahmet; Akar, Taner; Karapirli, Mustafa; Demirel, Birol

    2015-10-01

    Cupping is a traditional treatment method that has been used for thousands of years to diminish pain, restore appetite, improve digestion, remove a tendency to faint, or remove 'bad blood' from the body. The suction of the cup is created by fire or by mechanical devices. The procedure may result in circular erythema, petechiae, purpura, ecchymosis, and burns, which may be mistaken for trauma-related ecchymosis or livor mortis. A 40-year-old man died of shotgun injuries on the day he was wounded. Circular ecchymoses were observed on the forehead, within the scalp of the occipital region, on the back of the neck, and on the back. They were described as ecchymoses in the first examination, made by a general practitioner. In the external examination during the legal autopsy, superficial incisions were observed on the circular ecchymoses. Based on their shape, localization, and color, and on the characteristics of the incisions, the circular lesions were concluded to have been caused by dry and wet cupping therapy procedures. These lesions and their formation mechanisms should be well known to forensic medical examiners and other medical personnel involved in forensic medical examinations. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  5. Next-Generation Mitogenomics: A Comparison of Approaches Applied to Caecilian Amphibian Phylogeny.

    Science.gov (United States)

    Maddock, Simon T; Briscoe, Andrew G; Wilkinson, Mark; Waeschenbach, Andrea; San Mauro, Diego; Day, Julia J; Littlewood, D Tim J; Foster, Peter G; Nussbaum, Ronald A; Gower, David J

    2016-01-01

    Mitochondrial genome (mitogenome) sequences are being generated with increasing speed due to the advances of next-generation sequencing (NGS) technology and associated analytical tools. However, detailed comparisons to explore the utility of alternative NGS approaches applied to the same taxa have not been undertaken. We compared a 'traditional' Sanger sequencing method with two NGS approaches (shotgun sequencing and non-indexed, multiplex amplicon sequencing) on four different sequencing platforms (Illumina's HiSeq and MiSeq, Roche's 454 GS FLX, and Life Technologies' Ion Torrent) to produce seven (near-) complete mitogenomes from six species that form a small radiation of caecilian amphibians from the Seychelles. The fastest, most accurate method of obtaining mitogenome sequences that we tested was direct sequencing of genomic DNA (shotgun sequencing) using the MiSeq platform. Bayesian inference and maximum likelihood analyses using seven different partitioning strategies were unable to resolve compellingly all phylogenetic relationships among the Seychelles caecilian species, indicating the need for additional data in this case.

  6. Next-Generation Mitogenomics: A Comparison of Approaches Applied to Caecilian Amphibian Phylogeny.

    Directory of Open Access Journals (Sweden)

    Simon T Maddock

    Full Text Available Mitochondrial genome (mitogenome) sequences are being generated with increasing speed due to the advances of next-generation sequencing (NGS) technology and associated analytical tools. However, detailed comparisons to explore the utility of alternative NGS approaches applied to the same taxa have not been undertaken. We compared a 'traditional' Sanger sequencing method with two NGS approaches (shotgun sequencing and non-indexed, multiplex amplicon sequencing) on four different sequencing platforms (Illumina's HiSeq and MiSeq, Roche's 454 GS FLX, and Life Technologies' Ion Torrent) to produce seven (near-) complete mitogenomes from six species that form a small radiation of caecilian amphibians from the Seychelles. The fastest, most accurate method of obtaining mitogenome sequences that we tested was direct sequencing of genomic DNA (shotgun sequencing) using the MiSeq platform. Bayesian inference and maximum likelihood analyses using seven different partitioning strategies were unable to resolve compellingly all phylogenetic relationships among the Seychelles caecilian species, indicating the need for additional data in this case.

  7. Coverage Metrics for Model Checking

    Science.gov (United States)

    Penix, John; Visser, Willem; Norvig, Peter (Technical Monitor)

    2001-01-01

    When using model checking to verify programs in practice, it is not usually possible to achieve complete coverage of the system. In this position paper we describe ongoing research within the Automated Software Engineering group at NASA Ames on the use of test coverage metrics to measure partial coverage and provide heuristic guidance for program model checking. We are specifically interested in applying and developing coverage metrics for concurrent programs that might be used to support certification of next generation avionics software.

  8. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk uses the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on its performance for microRNAs (miRNAs) and piwi-interacting RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for predicting both known and novel piRNAs.

  9. Increased sequence diversity coverage improves detection of HIV-Specific T cell responses

    DEFF Research Database (Denmark)

    Frahm, N.; Kaufmann, D.E.; Yusim, K.

    2007-01-01

    The accurate identification of HIV-specific T cell responses is important for determining the relationship between immune response, viral control, and disease progression. HIV-specific immune responses are usually measured using peptide sets based on consensus sequences, which frequently miss res...

  10. Influence of biocrusts coverage on microbial communities from underlying arid lands soils

    Science.gov (United States)

    Anguita-Maeso, Manuel; Miralles, Isabel; van Wesemael, Bas; Lázaro, Roberto; Ortega, Raúl; García-Salcedo, José Antonio; Soriano, Miguel

    2017-04-01

    In regions where water availability limits plant cover, biological soil crusts (biocrusts) are especially important because they form an almost continuous living skin that mediates inputs and outputs across the soil surface boundary. However, the area is not covered uniformly, and microbial communities in the underlying soils might be influenced by the biocrust type and the percentage of biocrust coverage. To address this question, we collected soils underlying biocrust samples dominated by i) incipient colonization by cyanobacteria, ii) cyanobacteria, and biocrusts formed by the lichens iii) Diploschistes diacapsis and Squamarina lentigera and iv) Lepraria issidiata from the Tabernas desert (southeast Spain), in order to determine the differences in the microbial communities of these underlying soils at two extremes of the spatial distribution range: one with a high percentage of biocrust coverage and little degradation, and the other with severe degradation and a lower percentage of biocrust coverage. DNA was isolated from these samples using a commercial kit and used as the template for metagenomic analysis. We sequenced the V4-V5 region of the 16S rRNA gene on the Illumina MiSeq next-generation sequencing (NGS) platform, and the relative quantities of bacteria and fungi were determined by qPCR of the 16S rRNA gene and the ITS1-5.8S region, respectively. The position with high biocrust coverage showed the highest number of bacteria per gram of soil (1.64E+09 in L. issidiata, 1.89E+09 in D. diacapsis and S. lentigera, 1.63E+09 in cyanobacteria, and 2.08E+09 in incipient colonization by cyanobacteria), whereas the less favourable position, with lower biocrust coverage, showed fewer (1.16E+09 in L. issidiata, 6.98E+08 in D. diacapsis and S. lentigera, 1.46E+09 in cyanobacteria, and 7.92E+08 in incipient cyanobacteria biocrust). Similarly, the amount of fungi per gram of soil showed the same pattern, ranging from the favourable
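
    To make the comparison between the two positions easier to read, the sketch below divides the bacterial count per gram of soil at the high-coverage position by the count at the low-coverage position for each biocrust type. The numbers are copied from the abstract; the dictionary keys and the helper name `fold_change` are illustrative, and the ratios are simple descriptive quotients, not a statistical test.

```python
# Bacteria per gram of soil (qPCR of the 16S rRNA gene), as given in the abstract.
HIGH_COVERAGE = {
    "L. issidiata": 1.64e9,
    "D. diacapsis + S. lentigera": 1.89e9,
    "cyanobacteria": 1.63e9,
    "incipient cyanobacteria": 2.08e9,
}
LOW_COVERAGE = {
    "L. issidiata": 1.16e9,
    "D. diacapsis + S. lentigera": 6.98e8,
    "cyanobacteria": 1.46e9,
    "incipient cyanobacteria": 7.92e8,
}


def fold_change(high: dict[str, float], low: dict[str, float]) -> dict[str, float]:
    """Ratio of high-coverage to low-coverage counts for each shared biocrust type."""
    return {k: high[k] / low[k] for k in high if k in low}


if __name__ == "__main__":
    for crust, ratio in fold_change(HIGH_COVERAGE, LOW_COVERAGE).items():
        print(f"{crust:28s} {ratio:.2f}x")
```

    On these figures the high-coverage position holds roughly 1.1 to 2.7 times as many bacteria per gram of soil, depending on the biocrust type.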

  11. Quantifying population genetic differentiation from next-generation sequencing data

    DEFF Research Database (Denmark)

    Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand

    2013-01-01

    Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data... method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based ... on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled...

  12. 5 CFR 890.1106 - Coverage.

    Science.gov (United States)

    2010-01-01

    ... family member is an individual whose relationship to the enrollee meets the requirements of 5 U.S.C. 8901... EMPLOYEES HEALTH BENEFITS PROGRAM Temporary Continuation of Coverage § 890.1106 Coverage. (a) Type of enrollment. An individual who enrolls under this subpart may elect coverage for self alone or self and family...

  13. Detecting false positive sequence homology: a machine learning approach.

    Science.gov (United States)

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships among biological sequences (DNA or amino acid) across organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. Many existing heuristic tools, most commonly based on bidirectional BLAST searches, are used to identify homologous genes and sort them into two fundamentally distinct classes: orthologs and paralogs. Because these tools rely only on heuristic filtering based on significance-score cutoffs and have no cluster post-processing tools available, they can often produce clusters that contain unrelated (non-homologous) sequences. Therefore, sequence data extracted from incomplete genome or transcriptome assemblies originating from low-coverage sequencing, or produced by de novo processes without a reference genome, are susceptible to high false-positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further used in the context of guided experimentation to verify false-positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully identifies apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low-quality clusters of putative homologous genes recovered by heuristic-based approaches.

  14. Shot-gun proteome and transcriptome mapping of the jujube floral organ and identification of a pollen-specific S-locus F-box gene

    Directory of Open Access Journals (Sweden)

    Ruihong Chen

    2017-07-01

    Full Text Available The flower is a plant reproductive organ that forms part of the fruit produced as the flowering season ends. While the number and identity of proteins expressed in a jujube (Ziziphus jujuba Mill.) flower is currently unknown, integrative proteomic and transcriptomic analyses provide a systematic strategy for characterizing the floral biology of plants. We conducted a shotgun proteomic analysis on jujube flowers using filter-aided sample preparation tryptic digestion, followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). In addition, transcriptomic analyses were performed on HiSeq2000 sequencers. In total, 7,853 proteins were identified, accounting for nearly 30% of the ‘Junzao’ gene models (27,443). Genes identified in the proteome generally showed higher RPKM (reads per kilobase per million mapped reads) values than undetected genes. Gene ontology categories showed that ribosomes and intracellular organelles were the most dominant classes and accounted for 17.0% and 14.0% of the proteome mass, respectively. The top-ranking proteins with iBAQ > 10^10 included non-specific lipid transfer proteins, histones, actin-related proteins, fructose-bisphosphate aldolase, Bet v I type allergens, etc. In addition, we identified one pollen-specific S-locus F-box-like gene located on the same chromosome as the S-RNase gene. Both of these genes may activate gametophytic self-incompatibility in jujube. These results reflect the protein profile features of jujube flowers and contribute new information important to the jujube breeding system.

  15. 29 CFR 801.3 - Coverage.

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 3 2010-07-01 2010-07-01 false Coverage. 801.3 Section 801.3 Labor Regulations Relating to Labor (Continued) WAGE AND HOUR DIVISION, DEPARTMENT OF LABOR OTHER LAWS APPLICATION OF THE EMPLOYEE POLYGRAPH PROTECTION ACT OF 1988 General § 801.3 Coverage. (a) The coverage of the Act extends to “any...

  16. Coverage-based constraints for IMRT optimization

    Science.gov (United States)

    Mescher, H.; Ulrich, S.; Bangert, M.

    2017-09-01

    Radiation therapy treatment planning requires incorporating uncertainties in order to guarantee adequate irradiation of the tumor volumes. In current clinical practice, uncertainties are accounted for implicitly by expanding the target volume according to generic margin recipes. Alternatively, it is possible to account for uncertainties by explicit minimization of objectives that describe worst-case treatment scenarios, the expectation value of the treatment, or the coverage probability of the target volumes during treatment planning. In this note we show that approaches relying on objectives to induce a specific coverage of the clinical target volumes are inevitably sensitive to variation of the relative weighting of the objectives. To address this issue, we introduce coverage-based constraints for intensity-modulated radiation therapy (IMRT) treatment planning. Our implementation follows the concept of coverage-optimized planning, which considers explicit error scenarios to calculate and optimize patient-specific probabilities q(\hat{d}, \hat{v}) of covering a specific target volume fraction \hat{v} with a certain dose \hat{d}. Using a constraint-based reformulation of coverage-based objectives, we eliminate the trade-off between coverage and competing objectives during treatment planning. In-depth convergence tests including 324 treatment plan optimizations demonstrate the reliability of coverage-based constraints for varying levels of probability, dose and volume. General clinical applicability of coverage-based constraints is demonstrated for two cases. A sensitivity analysis regarding penalty variations within this planning study, based on IMRT treatment planning using (1) coverage-based constraints, (2) coverage-based objectives, (3) probabilistic optimization, (4) robust optimization and (5) conventional margins, illustrates the potential benefit of coverage-based constraints, which do not require tedious adjustment of target volume objectives.
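
    The coverage probability q(\hat{d}, \hat{v}) described in this abstract can be estimated from a finite set of error scenarios. The sketch below is a minimal illustration under stated assumptions: each scenario supplies one dose value per target voxel, scenarios are equally weighted unless weights are given, and q is the total weight of scenarios in which at least a fraction \hat{v} of the voxels receives a dose of at least \hat{d}. It only evaluates q for fixed dose distributions and does not perform the constrained optimization; the function names and toy doses are hypothetical, not the authors' implementation.

```python
from typing import Optional, Sequence


def covered_fraction(dose: Sequence[float], d_hat: float) -> float:
    """Fraction of target voxels receiving at least d_hat in one scenario."""
    return sum(1 for d in dose if d >= d_hat) / len(dose)


def coverage_probability(
    scenario_doses: Sequence[Sequence[float]],
    d_hat: float,
    v_hat: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Estimate q(d_hat, v_hat) as the summed weight of covering scenarios.

    Assumption: scenarios are equally likely unless explicit weights are passed.
    """
    n = len(scenario_doses)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(
        w
        for dose, w in zip(scenario_doses, weights)
        if covered_fraction(dose, d_hat) >= v_hat
    )


if __name__ == "__main__":
    # Hypothetical toy data: 4 error scenarios, 5 target voxels each (dose in Gy).
    scenarios = [
        [60.0, 60.5, 59.8, 61.0, 60.2],
        [60.1, 58.0, 59.9, 60.3, 60.0],
        [57.0, 56.5, 60.0, 60.1, 59.0],
        [60.4, 60.2, 60.0, 60.6, 60.1],
    ]
    print(coverage_probability(scenarios, d_hat=59.5, v_hat=0.95))  # 0.5
```

    Two of the four toy scenarios deliver at least 59.5 Gy to every voxel, so q is 0.5; relaxing \hat{v} to 0.8 also admits the second scenario and raises q to 0.75.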

  17. Genomic and proteomic identification of Late Holocene remains

    DEFF Research Database (Denmark)

    Biard, Vincent; Gol'din, Pavel; Gladilina, Elena

    2017-01-01

    A critical challenge of the 21st century is to understand and minimise the effects of human activities on biodiversity. Cetaceans are a prime concern in biodiversity research, as many species still suffer from human impacts despite decades of management and conservation efforts. Zooarchaeology ... sequencing approach. In addition, shotgun sequencing produced several complete ancient odontocete mitogenomes and auxiliary nuclear genomic data for further exploration in a population genetic context. In contrast, both morphological identification and Sanger sequencing lacked taxonomic resolution and/or resulted in misclassification of samples. We found that the combination of ZooMS and shotgun sequencing provides a powerful tool in zooarchaeology, and here allowed for a deeper understanding of past marine resource use and its implication for current management and conservation of Black Sea odontocetes...

  18. Monitoring intervention coverage in the context of universal health coverage.

    Directory of Open Access Journals (Sweden)

    Ties Boerma

    2014-09-01

    Full Text Available Monitoring universal health coverage (UHC) focuses on information on health intervention coverage and financial protection. This paper addresses monitoring intervention coverage across the full spectrum of UHC, including health promotion and disease prevention, treatment, rehabilitation, and palliation. A comprehensive core set of indicators most relevant to the country situation should be monitored on a regular basis as part of health progress and systems performance assessment for all countries. UHC monitoring should be embedded in a broad results framework for the country health system, but should focus on indicators related to the coverage of interventions that most directly reflect the results of UHC investments and strategies in each country. A set of tracer coverage indicators can be selected, divided into two groups, promotion/prevention and treatment/care, as illustrated in this paper. Disaggregation of the indicators by the main equity stratifiers is critical to monitoring progress in all population groups. Targets need to be set in accordance with baselines, historical rates of progress, and measurement considerations. Critical measurement gaps also exist, especially for treatment indicators, covering issues such as mental health, injuries, chronic conditions, surgical interventions, rehabilitation, and palliation. Consequently, further research is needed, and proxy indicators will have to be used in the interim. Ideally, indicators should include a quality-of-intervention dimension. For some interventions, such as management of hypertension, a single indicator is feasible; but in many areas additional indicators are needed to capture the quality of service provision. The monitoring of UHC has significant implications for health information systems. Major data gaps will need to be filled. At a minimum, countries will need to administer regular household health surveys with biological and clinical data collection. Countries will also need to improve the

  19. [Quantification of acetabular coverage in normal adult].

    Science.gov (United States)

    Lin, R M; Yang, C Y; Yu, C Y; Yang, C R; Chang, G L; Chou, Y L

    1991-03-01

    Quantification of acetabular coverage is important and can be expressed by superimposing cartilage tracings on the maximum cross-sectional area of the femoral head. We developed a practical AutoLISP program in PC AutoCAD to quantify acetabular coverage through numerical expression of computed tomography images. Thirty adults (60 hips) with a normal center-edge angle and acetabular index on plain X-ray were randomly selected for serial drops. These slices were prepared with fixed coordinates in continuous sections 5 mm thick. The contours of the cartilage in each section were digitized on a PC and processed by AutoCAD programs to quantify and characterize the acetabular coverage of normal and dysplastic adult hips. We found that hips with a total coverage ratio greater than 80%, an anterior coverage ratio greater than 75%, and a posterior coverage ratio greater than 80% can be categorized in the normal group. Polar edge distance is a good indicator for evaluating preoperative and postoperative coverage conditions. For standardization and evaluation of acetabular coverage, the most suitable parameters are the total coverage ratio, anterior coverage ratio, posterior coverage ratio and polar edge distance. However, medial and lateral coverage ratios are indispensable in cases of dysplastic hip, because the variations between them are so great that acetabuloplasty may be impossible. This program can also be used to classify precisely the type of dysplastic hip.
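
    The normal-group criteria stated in this abstract can be written as a simple check. This is an illustrative sketch only: the class and function names are hypothetical, the ratios are taken as percentages, and the strict "greater than" comparisons follow the abstract's wording. It is not a validated clinical classifier, and it ignores the polar edge distance and the medial/lateral ratios that the authors also discuss.

```python
from dataclasses import dataclass


@dataclass
class AcetabularCoverage:
    """Coverage ratios of the femoral head, in percent."""
    total: float
    anterior: float
    posterior: float


# Thresholds for the normal group as stated in the abstract (percent).
NORMAL_THRESHOLDS = {"total": 80.0, "anterior": 75.0, "posterior": 80.0}


def in_normal_range(c: AcetabularCoverage) -> bool:
    """True only if every ratio strictly exceeds its normal-group threshold."""
    return (
        c.total > NORMAL_THRESHOLDS["total"]
        and c.anterior > NORMAL_THRESHOLDS["anterior"]
        and c.posterior > NORMAL_THRESHOLDS["posterior"]
    )


if __name__ == "__main__":
    print(in_normal_range(AcetabularCoverage(total=85.0, anterior=78.0, posterior=83.0)))  # True
    print(in_normal_range(AcetabularCoverage(total=85.0, anterior=70.0, posterior=83.0)))  # False
```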

  20. Comparative shotgun proteomic analysis of wild and domesticated Opuntia spp. species shows a metabolic adaptation through domestication.

    Science.gov (United States)

    Pichereaux, Carole; Hernández-Domínguez, Eric E; Santos-Diaz, Maria Del Socorro; Reyes-Agüero, Antonio; Astello-García, Marizel; Guéraud, Françoise; Negre-Salvayre, Anne; Schiltz, Odile; Rossignol, Michel; Barba de la Rosa, Ana Paulina

    2016-06-30

    The Opuntia genus is widely distributed in America, but the highest richness of wild species is found in Mexico, which is also home to Opuntia ficus-indica, the most domesticated species and an important crop in the agricultural economies of arid and semiarid areas worldwide. During the domestication process, Opuntia morphological characteristics such as fewer and smaller spines on cladodes and fewer seeds in fruits were favored, but changes at the molecular level are almost unknown. To gain more insight into the molecular changes in Opuntia through domestication, a shotgun proteomic analysis with database-dependent searches by homology was carried out. More than 1000 protein species were identified, and, using a label-free quantitation method, the Opuntia proteomes were compared in order to identify differentially accumulated proteins among wild and domesticated species. Most of the changes were observed in glucose, secondary, and 1C metabolism, which correlate with the observed accumulation of protein, fiber and phenolic compounds in Opuntia cladodes. Regulatory proteins, ribosomal proteins, and proteins related to stress response also showed differential accumulation. These results provide valuable new data that will help in understanding the molecular changes of Opuntia species through domestication. Opuntia species are well adapted to dry and warm conditions in arid and semiarid regions worldwide, and they are highly productive plants showing considerable promise as an alternative food source. However, there is a knowledge gap regarding the Opuntia molecular mechanisms that enable these plants to grow in extreme environmental conditions and how domestication has changed them. In the present study, a shotgun analysis was carried out to characterize the proteomes of five Opuntia species selected by their degree of domestication. Our results will contribute to a better understanding of the proteomic features underlying selection and specialization under

  1. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Full Text Available Abstract Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements.
This study also provides new evidence that sequencing a pine genome using a WGS approach is

  2. Cooperative Cloud Service Aware Mobile Internet Coverage Connectivity Guarantee Protocol Based on Sensor Opportunistic Coverage Mechanism

    Directory of Open Access Journals (Sweden)

    Qin Qin

    2015-01-01

    Full Text Available To improve the Internet coverage ratio and provide a connectivity guarantee, we proposed a coverage connectivity guarantee protocol for the mobile Internet based on a sensor opportunistic coverage mechanism and cooperative cloud service. In this scheme, following opportunistic covering rules, a network coverage algorithm with high reliability and real-time security was achieved by exploiting the opportunities offered by sensor nodes and mobile Internet nodes. Next, a cloud service business support platform, which forms the architecture of the cloud support layer, was built on the Internet's application service management capabilities and the communication service capabilities of the wireless sensor network. A cooperative cloud service aware model was then proposed. Finally, we proposed the mobile Internet coverage connectivity guarantee protocol. Experimental results demonstrate that the proposed algorithm has excellent performance in terms of Internet security, stability, and coverage connectivity.

  3. Lessons learned from microsatellite development for nonmodel organisms using 454 pyrosequencing

    Czech Academy of Sciences Publication Activity Database

    Schoebel, C. N.; Brodbeck, S.; Buehler, D.; Cornejo, C.; Gajurel, J.; Hartikainen, H.; Keller, D.; Leys, M.; Říčanová, Štěpánka; Segelbacher, G.; Werth, S.; Csencsics, D.

    2013-01-01

    Roč. 26, č. 3 (2013), s. 600-611 ISSN 1010-061X Institutional support: RVO:68081766 Keywords : comparative studies * conservation genetics * massively parallel sequencing * next generation sequencing technology * population genetics * shotgun sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.483, year: 2013

  4. Eu-Detect: An algorithm for detecting eukaryotic sequences in ...

    Indian Academy of Sciences (India)

    Supplementary figure 1. Plots depicting the classification accuracy of Eu-Detect with various combinations of 'cumulative sequence count' (40K, 50K, 60K, 70K, 80K) and 'coverage threshold' (20%, 30%, 40%, 50%, 60%, 70%, 80%). While blue bars represent Eu-Detect's average classification accuracy with eukaryotic ...

  5. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy.

    Science.gov (United States)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for the potential applications of the derived knowledge. In this study, we dissect the biome involved in anaerobic digestion by means of high-throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties, using a network representation, was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome resembles a funnel, in which the microbial consortium shows progressive functional specialization toward the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrate utilization, fatty acid degradation, amino acid fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but, surprisingly, has a methylotrophic methanogenic pathway. This study is pioneering research on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the

  6. Short-read reading-frame predictors are not created equal: sequence error causes loss of signal

    Directory of Open Access Journals (Sweden)

    Trimble William L

    2012-07-01

    Full Text Available Abstract Background Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms (FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia, and Prodigal) for accuracy on short (75–1000 bp) fragments containing sequence error from previously published artificial data and “real” metagenomic datasets. Results While gene prediction tools have similar accuracies predicting genes on error-free fragments, in the presence of sequencing errors considerable differences between tools become evident. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than do MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions. Conclusions Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs.
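
Choosing a reading frame for a short read can be illustrated with the simplest possible heuristic: pick the frame (of six) with the longest stretch free of stop codons. This Python sketch is not FragGeneScan's hidden Markov model or any of the other evaluated tools. It only shows what "identifying the most likely reading frame" means, and it hints at why sequencing errors hurt: a single indel shifts the frame, and a single substitution can create a spurious stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq: str, offset: int) -> int:
    """Longest run of consecutive non-stop codons (counted in codons) in one frame."""
    best = run = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def pick_frame(read: str):
    """Return (strand, offset, length_in_codons) of the frame with the longest stop-free run."""
    read = read.upper()
    candidates = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            candidates.append((longest_open_stretch(seq, offset), strand, offset))
    # Ties are broken by tuple ordering, which is arbitrary but deterministic.
    length, strand, offset = max(candidates)
    return strand, offset, length
```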

  7. A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

    Science.gov (United States)

    Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

    2014-01-01

    The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch Syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes, including PMS2, for which several pseudogenes exist. The amplicons were pooled at different ratios to obtain uniform coverage and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and depended on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
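
The heterozygote detection range quoted above is a range of variant allele fractions (VAF), i.e. variant-supporting reads divided by total depth at the position. A minimal Python sketch of how such a fraction might be used to triage calls; the 30-read depth floor is a hypothetical value, and reusing the observed 19-63% range as a calling window is an assumption for illustration, not the authors' validated rule:

```python
def variant_allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that support the variant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def classify_call(alt_reads: int, depth: int,
                  min_depth: int = 30,
                  het_range: tuple = (0.19, 0.63)) -> str:
    """Hypothetical triage of one call; thresholds are illustrative, not validated."""
    if depth < min_depth:
        return "low coverage"
    vaf = variant_allele_fraction(alt_reads, depth)
    low, high = het_range
    if vaf < low:
        return "low VAF"    # possible artefact, e.g. in a homopolymer
    if vaf > high:
        return "high VAF"   # possible homozygous call or allele dropout
    return "heterozygous"
```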

  8. Proteomic analysis of protein interactions between Eimeria maxima sporozoites and chicken jejunal epithelial cells by shotgun LC-MS/MS.

    Science.gov (United States)

    Huang, Jingwei; Liu, Tingqi; Li, Ke; Song, Xiaokai; Yan, Ruofeng; Xu, Lixin; Li, Xiangrui

    2018-04-04

    Eimeria maxima initiates infection by invading the jejunal epithelial cells of chickens. However, the proteins involved in invasion remain unknown. Research on the molecules that participate in the interactions between E. maxima sporozoites and host target cells will fill a gap in our understanding of the invasion system of this parasitic pathogen. In the present study, chicken jejunal epithelial cells were isolated and cultured in vitro. Western blot was employed to analyze the soluble proteins of E. maxima sporozoites that bound to chicken jejunal epithelial cells. A co-immunoprecipitation (co-IP) assay was used to separate the E. maxima proteins that bound to chicken jejunal epithelial cells. The shotgun LC-MS/MS technique was used for proteomic identification, and Gene Ontology was employed for the bioinformatics analysis. Western blot analysis showed that four protein bands from jejunal epithelial cells co-cultured with soluble proteins of E. maxima sporozoites were recognized by the positive sera, with molecular weights of 70, 90, 95 and 130 kDa. The co-IP dilutions were analyzed by shotgun LC-MS/MS. A total of 204 proteins were identified in the E. maxima protein database using the MASCOT search engine. Thirty-five proteins, including microneme proteins 3 and 7, had more than two unique peptide counts and were annotated using Gene Ontology for molecular function, biological process and cellular localization. Of the 35 annotated peptides, 22 (62.86%) were associated with binding activity and 15 (42.86%) were involved in catalytic activity. Our findings provide insight into the interaction between E. maxima and its host cells and are important for understanding the molecular mechanisms underlying E. maxima invasion.

  9. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison]; Zhou, S. [Univ. Wisc.-Madison]; Place, M. [Univ. Wisc.-Madison]; Zhang, Y. [Univ. Wisc.-Madison]; Briska, A. [Univ. Wisc.-Madison]; Goldstein, S. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Forrest, D. [Univ. Wisc.-Madison]; Lim, A. [Univ. Wisc.-Madison]; Lapidus, A. [Univ. Wisc.-Madison]; Han, C. S. [Univ. Wisc.-Madison]; Roberts, G. P. [Univ. Wisc.-Madison]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
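
An optical map is compared against sequence by predicting, from the assembly, where each enzyme should cut. A minimal Python sketch of such an in silico digest for the three enzymes named above, using their standard recognition sites (XbaI T^CTAGA, NheI G^CTAGC, HindIII A^AGCTT). All three sites are palindromic, so scanning one strand finds every site. The `circular` flag is an assumption for a circular replicon, and a real comparison with optical maps works at kilobase resolution rather than exact base positions:

```python
import re

# Recognition site and top-strand cut offset for each enzyme (standard sites).
ENZYMES = {
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "NheI": ("GCTAGC", 1),     # G^CTAGC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(seq: str, site: str, offset: int) -> list:
    """0-based cut coordinates; the lookahead also finds overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_sizes(seq: str, enzyme: str, circular: bool = False) -> list:
    """Predicted fragment lengths (bp), in the order they occur along the sequence."""
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the origin
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```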

  10. Draft Genome of Scalindua rubra, Obtained from the Interface Above the Discovery Deep Brine in the Red Sea, Sheds Light on Potential Salt Adaptation Strategies in Anammox Bacteria.

    Science.gov (United States)

    Speth, Daan R; Lagkouvardos, Ilias; Wang, Yong; Qian, Pei-Yuan; Dutilh, Bas E; Jetten, Mike S M

    2017-07-01

    Several recent studies have indicated that members of the phylum Planctomycetes are abundantly present at the brine-seawater interface (BSI) above multiple brine pools in the Red Sea. Planctomycetes include bacteria capable of anaerobic ammonium oxidation (anammox). Here, we investigated the possibility of anammox at BSI sites using metagenomic shotgun sequencing of DNA obtained from the BSI above the Discovery Deep brine pool. Analysis of sequencing reads matching the 16S rRNA and hzsA genes confirmed the presence of anammox bacteria of the genus Scalindua. Phylogenetic analysis of the 16S rRNA gene indicated that this Scalindua sp. belongs to a distinct group, separate from the anammox bacteria in the seawater column, that contains mostly sequences retrieved from high-salt environments. Using coverage- and composition-based binning, we extracted and assembled the draft genome of the dominant anammox bacterium. Comparative genomic analysis indicated that this Scalindua species uses compatible solutes for osmoadaptation, in contrast to other marine anammox bacteria that likely use a salt-in strategy. We propose the name Candidatus Scalindua rubra for this novel species, alluding to its discovery in the Red Sea.
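
Coverage- and composition-based binning groups contigs that share both a sequence signature and an abundance level. The Python sketch below is a toy version under stated assumptions: tetranucleotide frequencies (without merging reverse-complement k-mers) stand in for composition, log-scaled mean coverage stands in for abundance, and a greedy distance threshold stands in for the clustering step. It is not the tool or parameter set used in the study.

```python
import math
from itertools import product

# All 256 tetranucleotides in a fixed order (reverse complements not merged here).
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRAMER_INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetranucleotide_profile(seq: str) -> list:
    """Relative frequency of each tetranucleotide; windows containing N etc. are skipped."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = TETRAMER_INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def contig_features(seq: str, mean_coverage: float) -> list:
    """Composition profile plus one log-scaled abundance dimension."""
    return tetranucleotide_profile(seq) + [math.log1p(mean_coverage)]


def greedy_bin(contigs, max_dist: float):
    """Assign each (name, seq, coverage) contig to the first bin whose seed is within max_dist."""
    bins = []  # list of (seed feature vector, member names)
    for name, seq, coverage in contigs:
        features = contig_features(seq, coverage)
        for seed, members in bins:
            if math.dist(seed, features) <= max_dist:
                members.append(name)
                break
        else:
            bins.append((features, [name]))
    return [members for _, members in bins]
```

With these raw features the coverage dimension tends to dominate the distance, because each of the 256 composition values is small; practical binners normalise or weight the two signals.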

  11. Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification.

    Science.gov (United States)

    Ziesemer, Kirsten A; Mann, Allison E; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T; Brandt, Bernd W; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A; MacDonald, Sandy J; Thomas, Gavin H; Collins, Matthew J; Lewis, Cecil M; Hofman, Corinne; Warinner, Christina

    2015-11-13

    To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341-534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions.

  12. Draft genome sequence of Streptomyces coelicoflavus ZG0656 reveals the putative biosynthetic gene cluster of acarviostatin family α-amylase inhibitors.

    Science.gov (United States)

    Guo, X; Geng, P; Bai, F; Bai, G; Sun, T; Li, X; Shi, L; Zhong, Q

    2012-08-01

    The aims of this study are to obtain the draft genome sequence of Streptomyces coelicoflavus ZG0656, which produces novel acarviostatin family α-amylase inhibitors, and then to reveal the putative acarviostatin-related gene cluster and the biosynthetic pathway. The draft genome sequence of S. coelicoflavus ZG0656 was generated using a shotgun approach employing a combination of 454 and Solexa sequencing technologies. Genome analysis revealed a putative gene cluster for acarviostatin biosynthesis, termed sct-cluster. The cluster contains 13 acarviostatin synthetic genes, six transporter genes, four starch degrading or transglycosylation enzyme genes and two regulator genes. On the basis of bioinformatic analysis, we proposed a putative biosynthetic pathway of acarviostatins. The intracellular steps produce a structural core, acarviostatin I00-7-P, and the extracellular assemblies lead to diverse acarviostatin end products. The draft genome sequence of S. coelicoflavus ZG0656 revealed the putative biosynthetic gene cluster of acarviostatins and a putative pathway of acarviostatin production. To our knowledge, S. coelicoflavus ZG0656 is the first strain in this species for which a genome sequence has been reported. The analysis of sct-cluster provided important insights into the biosynthesis of acarviostatins. This work will be a platform for producing novel variants and yield improvement. © 2012 The Authors. Letters in Applied Microbiology © 2012 The Society for Applied Microbiology.

  13. Predicted Strain Coverage of a New Meningococcal Multicomponent Vaccine (4CMenB) in Spain: Analysis of the Differences with Other European Countries.

    Directory of Open Access Journals (Sweden)

    Raquel Abad

    Full Text Available A novel meningococcal multicomponent vaccine, 4CMenB (Bexsero®), has been approved in Europe, Canada, Australia and the US. The potential impact of 4CMenB on strain coverage is being estimated using the Meningococcal Antigen Typing System (MATS), an ELISA assay that measures vaccine antigen expression and diversity in each strain. Here we show the genetic characterization and the 4CMenB potential coverage of Spanish invasive strains (collected during one epidemiological year) compared to other European countries, and discuss the potential reasons for the lower estimate of coverage in Spain. A panel of 300 strains, a representative sample of all serogroup B Neisseria meningitidis notified cases in Spain from 2009 to 2010, was characterized by multilocus sequence typing (MLST) and FetA variable region determination. 4CMenB vaccine antigens, PorA, factor H binding protein (fHbp), Neisseria Heparin Binding Antigen (NHBA) and Neisserial adhesin A (NadA) were molecularly typed by sequencing. PorA coverage was assigned to strains with VR2 = 4. The levels of expression and cross-reactivity of fHbp, NHBA and NadA were analyzed using MATS ELISA. Global estimated strain coverage by MATS was 68.67% (95% CI: 47.77-84.59%), with 51.33%, 15.33% and 2% of strains covered by one, two and three vaccine antigens, respectively. The predicted strain coverage by individual antigens was: 42% NHBA, 36.33% fHbp, 8.33% PorA and 1.33% NadA. Coverage within the most prevalent clonal complexes (cc) was 70.37% for cc 269, 30.19% for cc 213 and 95.83% for cc 32. Clonal complex (cc) distribution accounts for variations in strain coverage, so country-by-country investigations of strain coverage and cc prevalence are important. 
Because the cc distribution could also vary over time, which in turn could lead to changes in strain coverage, continuous detailed surveillance and monitoring of vaccine antigen expression is needed in those countries where the multicomponent vaccine is introduced.
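
The MATS percentages above are strain counts over the panel of 300. A minimal Python sketch of the tally, assuming each strain is represented by the set of antigens predicted to cover it (any antigen labels in example data are placeholders, not the study's strain-level results):

```python
from collections import Counter


def coverage_summary(strains):
    """Summarise predicted coverage from per-strain sets of covering antigens.

    `strains` is a list of sets such as {"NHBA", "fHbp"}; an empty set means
    the strain is predicted not to be covered by any vaccine antigen.
    """
    n = len(strains)

    def pct(count):
        return round(100 * count / n, 2)

    covered = sum(1 for s in strains if s)
    by_count = Counter(len(s) for s in strains if s)
    per_antigen = Counter(antigen for s in strains for antigen in s)
    return {
        "overall": pct(covered),
        "by_number_of_antigens": {k: pct(v) for k, v in sorted(by_count.items())},
        "per_antigen": {a: pct(v) for a, v in per_antigen.items()},
    }
```

With 154, 46 and 6 of 300 strains covered by one, two and three antigens (counts back-calculated from the reported percentages), the summary reproduces 51.33%, 15.33%, 2% and the overall 68.67%. The per-antigen figures (42% + 36.33% + 8.33% + 1.33%) sum to more than the overall coverage because a strain covered by several antigens is counted once per antigen.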

  14. Photoelectric UBVRI sequences in the Magellanic Cloud clusters Lindsay 1, NGC 339, NGC 361, and NGC 1466

    International Nuclear Information System (INIS)

    Alcaino, G.; Alvarado, F.; Wenderoth, E.; Liller, W.

    1990-01-01

    UBVRI sequences in three Small Magellanic Cloud (SMC) clusters Lindsay 1, NGC 339, NGC 361, and in NGC 1466, which lies between the two Magellanic Clouds, are presented. These sequences are appropriate for charge-coupled device (CCD) coverage. Only BV standards have been published in NGC 339 and UBV in NGC 1466; no sequences exist for the two other clusters. 15 refs

  15. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

    KAUST Repository

    Allam, Amin

    2015-07-14

    Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
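
Karect corrects reads using multiple alignment. As a much simpler illustration of the underlying idea that overlapping reads vote on the true base, here is a Python sketch of column-wise majority correction over reads assumed to be already aligned to a common coordinate system with no indels. Unlike Karect, it handles substitutions only, and the `min_support`/`min_fraction` thresholds are hypothetical:

```python
from collections import Counter


def correct_by_majority(reads, min_support=3, min_fraction=0.8):
    """Replace each base with the column consensus when the consensus is well supported.

    `reads` are equal-length strings on a shared coordinate system; '-' marks
    positions a read does not cover. Substitution errors only.
    """
    consensus = []
    for column in zip(*reads):
        bases = Counter(b for b in column if b != "-")
        chosen = None  # None means "no confident consensus; leave bases unchanged"
        if bases:
            base, count = bases.most_common(1)[0]
            if count >= min_support and count / sum(bases.values()) >= min_fraction:
                chosen = base
        consensus.append(chosen)
    return [
        "".join(c if c == "-" or cons is None else cons for c, cons in zip(read, consensus))
        for read in reads
    ]
```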

  16. Biofilm-Growing Bacteria Involved in the Corrosion of Concrete Wastewater Pipes: Protocols for Comparative Metagenomic Analyses

    Science.gov (United States)

    Advances in high-throughput next-generation sequencing (NGS) technology for direct sequencing of environmental DNA (i.e. shotgun metagenomics) are transforming the field of microbiology. NGS technologies are now regularly being applied in comparative metagenomic studies, which pr...

  17. Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data

    DEFF Research Database (Denmark)

    Korneliussen, Thorfinn Sand; Moltke, Ida; Albrechtsen, Anders

    2013-01-01

    A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, … estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology-dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions....
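
For reference, the classic Tajima's D computed from known genotypes compares two estimators of the population mutation rate: mean pairwise differences (π) and the number of segregating sites scaled by a harmonic number (S/a1). The Python sketch below implements that standard formula. It does not include the low-coverage treatment that is the subject of the paper, where the inputs would instead be estimated from genotype likelihoods rather than counted from called haplotypes.

```python
import math
from itertools import combinations


def summary_stats(seqs):
    """n, segregating sites S and mean pairwise differences (pi) for aligned haplotypes."""
    n = len(seqs)
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return n, S, sum(diffs) / len(diffs)


def tajimas_d(n: int, S: int, pi: float) -> float:
    """Tajima's (1989) D from sample size, segregating sites and mean pairwise differences."""
    if n < 4:
        raise ValueError("this sketch requires at least 4 sequences")
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

D is zero when π equals S/a1, positive when pairwise diversity exceeds what the segregating-site count predicts, and negative otherwise.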

  18. 29 CFR 2.13 - Audiovisual coverage prohibited.

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 1 2010-07-01 2010-07-01 true Audiovisual coverage prohibited. 2.13 Section 2.13 Labor Office of the Secretary of Labor GENERAL REGULATIONS Audiovisual Coverage of Administrative Hearings § 2.13 Audiovisual coverage prohibited. The Department shall not permit audiovisual coverage of the...

  19. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
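
The variant filtering criteria listed above can be expressed as a simple predicate over call records. This Python sketch is not Alpheus's API. The `VariantCall` fields and all default thresholds are hypothetical, and the "consistency" criterion is omitted because its definition is not given here:

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical call record; not Alpheus's data model."""
    chrom: str
    pos: int
    kind: str                 # e.g. "SNP", "indel"
    depth: int                # reads covering the position
    alt_reads: int            # reads supporting the variant
    mean_base_quality: float  # Phred-scaled


@dataclass(frozen=True)
class FilterParams:
    """Illustrative thresholds only."""
    min_depth: int = 10
    min_quality: float = 20.0
    min_allele_fraction: float = 0.2
    allowed_kinds: tuple = ("SNP", "indel")


def passes(call: VariantCall, params: FilterParams = FilterParams()) -> bool:
    """Keep a call only if it clears every filter (type, coverage, quality, frequency)."""
    if call.kind not in params.allowed_kinds:
        return False
    if call.depth < params.min_depth:
        return False
    if call.mean_base_quality < params.min_quality:
        return False
    return call.alt_reads / call.depth >= params.min_allele_fraction
```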

  20. A map of human genome variation from population-scale sequencing.

    Science.gov (United States)

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
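
The per-generation rate follows from a simple ratio: a child inherits two haploid genome copies, so the number of transmitted bases is twice the number of callable sites. A minimal Python sketch; example inputs such as 49 mutations, 2.45 Gb of callable sites or a 3 Gb haploid genome are illustrative assumptions, not figures from the study:

```python
def de_novo_rate(n_mutations: int, callable_sites: float) -> float:
    """Per-bp, per-generation rate; a diploid child receives 2 * callable_sites transmitted bases."""
    return n_mutations / (2 * callable_sites)


def expected_de_novo(rate: float, haploid_genome_bp: float) -> float:
    """Expected new germline mutations in one diploid offspring genome."""
    return rate * 2 * haploid_genome_bp
```

At a rate of 10^-8 and an assumed haploid genome of about 3 Gb, `expected_de_novo` gives roughly 60 new substitutions per child.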

  1. Complete genome sequence of Acinetobacter baumannii XH386 (ST208), a multi-drug resistant bacteria isolated from pediatric hospital in China

    Directory of Open Access Journals (Sweden)

    Youhong Fang

    2016-03-01

    Full Text Available Acinetobacter baumannii is an important bacterium that has emerged as a significant nosocomial pathogen worldwide. The rise of A. baumannii is due to its multi-drug resistance (MDR), and multi-drug resistant A. baumannii is difficult to treat with antibiotics, especially in pediatric patients, for whom therapeutic options are quite limited. A. baumannii ST208 was identified as the predominant sequence type of carbapenem-resistant A. baumannii in the United States and China. To our knowledge, no complete genome sequence had been reported for A. baumannii ST208, although several whole genome shotgun sequences had been reported. Here, we sequenced the 4087-kilobase (kb) chromosome and 112-kb plasmid of A. baumannii XH386 (ST208), which was isolated from a pediatric hospital in China. The genome of A. baumannii XH386 contained 3968 protein-coding genes and 94 RNA-only encoding genes. Genomic analysis and minimum inhibitory concentration assays showed that A. baumannii XH386 is a multi-drug resistant strain, resistant to most antibiotics except tigecycline. The data may be accessed via GenBank accession numbers CP010779 and CP010780. Keywords: Acinetobacter baumannii, Multi-drug resistance, Paediatric

  2. Estimation of allele frequency and association mapping using next-generation sequencing data

    DEFF Research Database (Denmark)

    Kim, Su Yeon; Lohmueller, Kirk E; Albrechtsen, Anders

    2011-01-01

    Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., frequency estimation...
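
With low-coverage data, allele frequency is usually estimated from genotype likelihoods rather than from hard genotype calls. A minimal Python sketch of one common approach: an expectation-maximisation (EM) estimator assuming Hardy-Weinberg proportions and a simple binomial read model with a fixed, assumed error rate. It illustrates the general technique and is not presented as the exact method of the paper:

```python
def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> list:
    """P(reads | genotype) for 0, 1 and 2 copies of the alternative allele.

    Binomial model without the constant binomial coefficient; `error` is an
    assumed per-base error rate.
    """
    likelihoods = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods


def em_allele_frequency(likelihoods, f: float = 0.3, tol: float = 1e-8,
                        max_iter: int = 200) -> float:
    """EM estimate of the alternative allele frequency from per-individual genotype likelihoods."""
    n = len(likelihoods)
    for _ in range(max_iter):
        prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]  # Hardy-Weinberg proportions
        expected_alt = 0.0
        for lik in likelihoods:
            weights = [lk * p for lk, p in zip(lik, prior)]
            total = sum(weights)
            # E-step: posterior expected number of alternative alleles in this individual
            expected_alt += (weights[1] + 2 * weights[2]) / total
        new_f = expected_alt / (2 * n)  # M-step
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f
```

Raw likelihoods underflow at high depth, so a production version would work in log space.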

  3. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method

    Directory of Open Access Journals (Sweden)

    Bingfu Guo

    2016-07-01

    Full Text Available Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans, GE-J16 and ZH10-6, using a whole genome sequencing (WGS) method. About 21 Gb of sequence data (~21× coverage) for each line was generated on the Illumina HiSeq 2500 platform. The junction reads mapped to the boundary of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated at positions Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion sites of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrated that WGS is a cost-effective and rapid method of identifying sites of T-DNA insertions and flanking sequences in soybean.
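
Junction reads are reads that contain sequence from both the transgenic vector and the host genome. A minimal Python sketch of that classification using shared k-mers, assuming the vector and reference are small enough to hold as k-mer sets (a soybean-scale analysis would use a read aligner instead); `k` and `min_hits` are hypothetical defaults:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def both_strands(seq: str, k: int) -> set:
    """k-mers of a sequence and of its reverse complement (reads come from either strand)."""
    return kmer_set(seq, k) | kmer_set(revcomp(seq), k)


def junction_reads(reads, vector: str, reference: str, k: int = 15, min_hits: int = 2):
    """Names of reads sharing at least `min_hits` k-mers with both the vector and the reference."""
    vector_kmers = both_strands(vector, k)
    reference_kmers = both_strands(reference, k)
    hits = []
    for name, read in reads:
        read_kmers = kmer_set(read, k)
        if (len(read_kmers & vector_kmers) >= min_hits
                and len(read_kmers & reference_kmers) >= min_hits):
            hits.append(name)
    return hits
```

Vector elements that are themselves derived from plant sequence (for example promoters or terminators) share k-mers with the host genome and would need to be masked before this test is meaningful.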

  4. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
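
One way to use an error rate measured on control sequences is as the null model for variant calling: a minor variant is reported only if its read count is unlikely to arise from sequencing errors alone. A minimal Python sketch, assuming a binomial error model in which every error is counted towards the candidate allele (a conservative simplification) and a hypothetical significance threshold `alpha` with no multiple-testing correction:

```python
import math


def error_rate(mismatches: int, aligned_bases: int) -> float:
    """Per-base error rate observed on a control sequence of known identity."""
    return mismatches / aligned_bases


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow at deep coverage."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)


def is_real_variant(alt_count: int, depth: int, err: float, alpha: float = 1e-6) -> bool:
    """Call a minor variant only if errors alone are unlikely to explain its read count."""
    return binomial_sf(alt_count, depth, err) < alpha
```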

  5. A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.

    Science.gov (United States)

    Nowell, Reuben W; Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J; Wheat, Christopher W; Saastamoinen, Marjo; Saccheri, Ilik J; Van't Hof, Arjen E; Wasik, Bethany R; Connahs, Heidi; Aslam, Muhammad L; Kumar, Sujai; Challis, Richard J; Monteiro, Antónia; Brakefield, Paul M; Blaxter, Mark

    2017-07-01

    The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼260× assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome comprises 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). © The Authors 2017. Published by Oxford University Press.
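
Both assembly metrics quoted above are short calculations: fold coverage is total sequenced bases divided by genome size, and N50 is the length L such that sequences of length at least L together contain at least half of the assembly. A minimal Python sketch:

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each base of the genome is sequenced."""
    return total_bases / genome_size


def n50(lengths) -> int:
    """Length L such that sequences of length >= L hold at least half the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For example, 124 Gb over a 475 Mb assembly gives about 261-fold, consistent with the ~260× reported.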

  6. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences

    Directory of Open Access Journals (Sweden)

    Boulund Fredrik

    2012-12-01

    Full Text Available Abstract Background Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. Conclusions The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.

  7. Improved candidate generation and coverage analysis methods for design optimization of symmetric multi-satellite constellations

    Science.gov (United States)

    Matossian, Mark G.

    1997-01-01

    Much attention in recent years has focused on commercial telecommunications ventures involving constellations of spacecraft in low and medium Earth orbit. These projects often require investments on the order of billions of dollars (US$) for development and operations, but surprisingly little work has been published on constellation design optimization for coverage analysis, traffic simulation and launch sequencing for constellation build-up strategies. This paper addresses the two most critical aspects of constellation orbital design: efficient constellation candidate generation and coverage analysis. Inefficiencies and flaws in the current standard algorithm for constellation modeling are identified, and a corrected and improved algorithm is presented. In the 1970s, John Walker and G. V. Mozhaev developed innovative strategies for continuous global coverage using symmetric non-geosynchronous constellations. (These are sometimes referred to as rosette or Walker constellations. An example is pictured above.) In 1980, the late Arthur Ballard extended and generalized the work of Walker into a detailed algorithm for the NAVSTAR/GPS program, which deployed a 24-satellite symmetric constellation. Ballard's important contribution was published in his "Rosette Constellations of Earth Satellites."
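    For readers unfamiliar with the symmetric constellations mentioned above, the sketch below generates the orbital slots of a Walker delta pattern in the common i:t/p/f notation. Here t satellites are placed in p planes equally spaced in right ascension, with phasing factor f, and all planes share the inclination i. The sketch illustrates only the textbook pattern, not the corrected algorithm the paper presents. The names `Slot` and `walker_delta` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    plane: int
    raan_deg: float          # right ascension of the ascending node
    mean_anomaly_deg: float  # initial position within the plane


def walker_delta(t: int, p: int, f: int) -> list:
    """Slots of a Walker delta constellation i:t/p/f. The shared inclination i
    is not needed to place the slots, so it is omitted."""
    if p <= 0 or t <= 0 or t % p != 0:
        raise ValueError("t must be a positive multiple of p")
    if not 0 <= f < p:
        raise ValueError("phasing factor f must satisfy 0 <= f < p")
    s = t // p  # satellites per plane
    slots = []
    for j in range(p):
        raan = 360.0 * j / p  # planes equally spaced in RAAN
        for k in range(s):
            # equal in-plane spacing plus an inter-plane phase offset of 360*f/t
            m = (360.0 * k / s + 360.0 * f * j / t) % 360.0
            slots.append(Slot(j, raan, m))
    return slots


for slot in walker_delta(24, 6, 1)[:6]:
    print(slot)
```

Enumerating candidates then amounts to looping over valid (t, p, f) triples and an inclination range. Each candidate is scored with a coverage analysis.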

  8. 42 CFR 440.330 - Benchmark health benefits coverage.

    Science.gov (United States)

    2010-10-01

    ...) Federal Employees Health Benefit Plan Equivalent Coverage (FEHBP—Equivalent Health Insurance Coverage). A... coverage. Health benefits coverage that is offered and generally available to State employees in the State... 42 Public Health 4 2010-10-01 2010-10-01 false Benchmark health benefits coverage. 440.330 Section...

  9. Experimental conditions improving in-solution target enrichment for ancient DNA

    DEFF Research Database (Denmark)

    Cruz-Dávalos, Diana I.; Llamas, Bastien; Gaunitz, Charleen

    2017-01-01

    High-throughput sequencing has dramatically fostered ancient DNA research in recent years. Shotgun sequencing, however, is not necessarily the best-suited approach, because samples are often extensively contaminated with exogenous environmental microbial DNA. DNA capture-enrichment methods ...

  10. Potential and pitfalls of eukaryotic metagenome skimming: a test case for lichens.

    Science.gov (United States)

    Greshake, Bastian; Zehr, Simonida; Dal Grande, Francesco; Meiser, Anjuli; Schmitt, Imke; Ebersberger, Ingo

    2016-03-01

    Whole-genome shotgun sequencing of multispecies communities using only a single library layout is commonly used to assess taxonomic and functional diversity of microbial assemblages. Here, we investigate to what extent such metagenome skimming approaches are applicable for in-depth genomic characterizations of eukaryotic communities, for example lichens. We address how best to assemble a particular eukaryotic metagenome skimming dataset, what pitfalls can occur, and what genome quality can be expected from these data. To facilitate project-specific benchmarking, we introduce the concept of twin sets, simulated data resembling the outcome of a particular metagenome sequencing study. We show that the quality of genome reconstructions depends essentially on assembler choice. Individual tools, including the metagenome assemblers Omega and MetaVelvet, are surprisingly sensitive to low and uneven coverages. Combined with the routine practice of choosing assembly parameters to optimize the assembly N50 size, these tools can preclude an entire genome from the assembly. In contrast, MIRA, an all-purpose overlap assembler, and SPAdes, a multisized de Bruijn graph assembler, provide a comprehensive view of the individual genomes across a wide range of coverage ratios. Testing assemblers on real-world metagenome skimming data from the lichen Lasallia pustulata demonstrates the applicability of twin sets for guiding method selection. Furthermore, it reveals that the assembly outcome for the photobiont Trebouxia sp. falls behind the a priori expectation given the simulations. Although the underlying reasons remain unclear, this highlights that further studies on this organism require special attention during sequence data generation and downstream analysis. © 2015 John Wiley & Sons Ltd.
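    The abstract warns that tuning parameters for assembly N50 can drop low-coverage genomes, so it helps to recall what N50 measures. Sort contigs from longest to shortest and sum their lengths. N50 is the length of the contig at which that running sum first reaches half of all assembled bases. A minimal sketch:

```python
def n50(lengths):
    """Length of the contig at which the cumulative length, summed from the
    longest contig down, first reaches half of the total assembled bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

N50 rewards a few long contigs. Discarding many short contigs from a poorly covered genome can raise N50 while losing that genome entirely, which is the pitfall described above.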

  11. High-resolution phylogenetic microbial community profiling

    Energy Technology Data Exchange (ETDEWEB)

    Singer, Esther; Coleman-Derr, Devin; Bowman, Brett; Schwientek, Patrick; Clum, Alicia; Copeland, Alex; Ciobanu, Doina; Cheng, Jan-Fang; Gies, Esther; Hallam, Steve; Tringe, Susannah; Woyke, Tanja

    2014-03-17

    The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full-length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large number of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of the GC content of microbial genomes and were higher than for Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences; for example, full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.

  12. Comparison of Illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    Full Text Available The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% and produced 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P<0.001). Illumina sequencing detected 2.4-fold more nucleotide MVs and 2.9-fold more amino acid MVs than 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant, found by Illumina sequencing but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  13. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    Directory of Open Access Journals (Sweden)

    Arthur W Pightling

    Full Text Available The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: (i) depth of sequencing coverage, (ii) choice of reference-guided short-read sequence assembler, (iii) choice of reference genome, and (iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers

  14. 40 CFR 51.356 - Vehicle coverage.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 2 2010-07-01 2010-07-01 false Vehicle coverage. 51.356 Section 51.356....356 Vehicle coverage. The performance standard for enhanced I/M programs assumes coverage of all 1968 and later model year light duty vehicles and light duty trucks up to 8,500 pounds GVWR, and includes...

  15. Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry

    Directory of Open Access Journals (Sweden)

    William P. Inskeep

    2013-05-01

    Full Text Available Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (~40-45 Mbase of Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G+C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP.

  16. Efficiency to Discovery Transgenic Loci in GM Rice Using Next Generation Sequencing Whole Genome Re-sequencing

    Directory of Open Access Journals (Sweden)

    Doori Park

    2015-09-01

    Full Text Available Molecular characterization technology for genetically modified organisms, as well as the way transgenic biotechnologies are developed, now requires full transparency to assess the risk to living modified and non-modified organisms. Next generation sequencing (NGS) methodology is suggested as an effective means of genome characterization and of detecting transgenic insertion locations. In the present study, we applied NGS to locate inserted transgenic loci, specifically the epidermal growth factor (EGF), in genetically modified rice cells. A total of 29.3 Gb (~72× coverage) was sequenced with a 2 × 150 bp paired-end method on an Illumina HiSeq2500, and the reads were then mapped to the rice genome and the T-vector sequence. The compatible pairs of reads were successfully mapped to 10 loci on the rice chromosome, and vector sequences at the insertion locations were validated by polymerase chain reaction (PCR) amplification. The EGF transgenic site was confirmed only on chromosome 4 by PCR. The results of this study demonstrate that NGS data can be used to characterize the rice genome. Bioinformatics analyses must be developed in association with NGS data to identify highly accurate transgenic sites.
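    The abstract does not define "compatible pairs." One common way to locate insertions from paired-end data, assumed here, is to find read pairs in which one mate maps to the vector and the other to the host genome. The genomic mates are then clustered by position, and each well-supported cluster becomes a candidate insertion site for PCR confirmation. The contig names (`T-vector`, `chr04`), the window and the support threshold are hypothetical.

```python
from collections import defaultdict

VECTOR = "T-vector"  # hypothetical name of the vector contig in the reference


def _flush(chrom, cluster, min_support, sites):
    """Record a cluster as a candidate site if enough pairs support it."""
    if len(cluster) >= min_support:
        sites.append((chrom, cluster[0], cluster[-1], len(cluster)))


def candidate_insertion_sites(pairs, window=500, min_support=3):
    """pairs: iterable of (contig1, pos1, contig2, pos2) for mapped read pairs.
    Returns (chrom, start, end, support) for clusters of genomic mates whose
    partner maps to the vector."""
    anchors = defaultdict(list)
    for c1, p1, c2, p2 in pairs:
        if c1 == VECTOR and c2 != VECTOR:
            anchors[c2].append(p2)
        elif c2 == VECTOR and c1 != VECTOR:
            anchors[c1].append(p1)
    sites = []
    for chrom in sorted(anchors):
        positions = sorted(anchors[chrom])
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)
            else:
                _flush(chrom, cluster, min_support, sites)
                cluster = [pos]
        _flush(chrom, cluster, min_support, sites)
    return sites


pairs = [
    ("chr04", 1000, VECTOR, 50),
    (VECTOR, 120, "chr04", 1100),
    ("chr04", 1200, VECTOR, 300),
    ("chr01", 5000, VECTOR, 10),     # lone pair: below min_support
    ("chr01", 7000, "chr01", 7300),  # ordinary genomic pair: ignored
]
print(candidate_insertion_sites(pairs))  # [('chr04', 1000, 1200, 3)]
```

Requiring several supporting pairs per cluster filters out isolated chimeric or mis-mapped pairs. PCR can then decide between the remaining candidates, as in the study.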

  17. High-throughput automated microfluidic sample preparation for accurate microbial genomics.

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C

    2017-01-27

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps of cells-to-sequencing-library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.

  18. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Science.gov (United States)

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154

  19. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Directory of Open Access Journals (Sweden)

    Farmerie William G

    2006-08-01

    observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.

  20. Strain-Level Discrimination of Shiga Toxin-Producing Escherichia coli in Spinach Using Metagenomic Sequencing.

    Directory of Open Access Journals (Sweden)

    Susan R Leonard

    Full Text Available Consumption of fresh bagged spinach contaminated with Shiga toxin-producing Escherichia coli (STEC) has led to severe illness and death; however, current culture-based methods to detect foodborne STEC are time consuming. Since not all STEC strains are considered pathogenic to humans, it is crucial to incorporate virulence characterization of STEC in the detection method. In this study, we assess the comprehensiveness of utilizing a shotgun metagenomics approach for detection and strain-level identification by spiking spinach with a variety of genomically disparate STEC strains at a low contamination level of 0.1 CFU/g. Molecular serotyping, virulence gene characterization, microbial community analysis, and E. coli core gene single nucleotide polymorphism (SNP) analysis were performed on metagenomic sequence data from enriched samples. It was determined from bacterial community analysis that E. coli, which was classified at the phylogroup level, was a major component of the population in most samples. However, in over half the samples, molecular serotyping revealed the presence of indigenous E. coli which also contributed to the percent abundance of E. coli. Despite the presence of additional E. coli strains, the serotype and virulence genes of the spiked STEC, including correct Shiga toxin subtype, were detected in 94% of the samples with a total number of reads per sample averaging 2.4 million. Variation in STEC abundance and/or detection was observed in replicate spiked samples, indicating an effect from the indigenous microbiota during enrichment. SNP analysis of the metagenomic data correctly placed the spiked STEC in a phylogeny of related strains in cases where the indigenous E. coli did not predominate in the enriched sample. Also, for these samples, our analysis demonstrates that strain-level phylogenetic resolution is possible using shotgun metagenomic data for determining the genomic relatedness of a contaminating STEC strain to other

  1. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform-specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage
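    In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines remain C. The methylation level of a CpG is therefore commonly estimated as C / (C + T) across the reads covering it. The sketch below computes that level and the Pearson correlation used to compare sequencing results with array values. It is not TABSAT's implementation, and the counts and beta values are illustrative only.

```python
import math


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads retaining C at a CpG after bisulfite conversion."""
    depth = c_reads + t_reads
    if depth == 0:
        raise ValueError("CpG not covered")
    return c_reads / depth


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative (C, T) read counts at four CpGs and matching array beta values.
counts = [(270, 30), (140, 150), (20, 260), (200, 90)]
seq_levels = [methylation_level(c, t) for c, t in counts]
array_betas = [0.88, 0.45, 0.10, 0.71]
print(round(pearson(seq_levels, array_betas), 2))
```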

  2. Insurance Coverage Policies for Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Andrew Hresko

    2012-10-01

    Full Text Available Adoption of personalized medicine in practice has been slow, in part due to the lack of evidence of clinical benefit provided by these technologies. Coverage by insurers is a critical step in achieving widespread adoption of personalized medicine. Insurers consider a variety of factors when formulating medical coverage policies for personalized medicine, including the overall strength of evidence for a test, availability of clinical guidelines and health technology assessments by independent organizations. In this study, we reviewed coverage policies of the largest U.S. insurers for genomic (disease-related) and pharmacogenetic (PGx) tests to determine the extent that these tests were covered and the evidence basis for the coverage decisions. We identified 41 coverage policies for 49 unique tests: 22 tests for disease diagnosis, prognosis and risk and 27 PGx tests. Fifty percent (or less) of the tests reviewed were covered by insurers. Lack of evidence of clinical utility appears to be a major factor in decisions of non-coverage. The inclusion of PGx information in drug package inserts appears to be a common theme of PGx tests that are covered. This analysis highlights the variability of coverage determinations and factors considered, suggesting that the adoption of personalized medicine will be affected by numerous factors, but will continue to be slowed due to lack of demonstrated clinical benefit.

  3. A programmable method for massively parallel targeted sequencing

    Science.gov (United States)

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method, whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample, which are then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate abundance uniformity in which more than 90% of targets fall within 1 log (10-fold) of the median, and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with minor allele fractions as low as 5% were sensitively detected. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach offers high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526
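    The abstract does not define its uniformity metric. One reading, assumed here, is the share of targets whose depth lies within one order of magnitude (10-fold) of the median target depth. The function name and the example depths are hypothetical.

```python
import math
import statistics


def fraction_within_log_of_median(coverages, log10_window=1.0):
    """Share of targets whose coverage is within `log10_window` orders of
    magnitude of the median target coverage (1.0 = within 10-fold).
    Targets with zero coverage count as outside the window."""
    median = statistics.median(coverages)
    if median <= 0:
        raise ValueError("median coverage must be positive")
    within = sum(
        1 for c in coverages
        if c > 0 and abs(math.log10(c / median)) <= log10_window
    )
    return within / len(coverages)


# Hypothetical per-target read depths; the median is 120.
depths = [5, 50, 100, 120, 200, 900, 2000]
print(fraction_within_log_of_median(depths))  # 5 of 7 targets within 10-fold
```

Working on a log scale treats a target at 10× the median the same way as one at a tenth of it. This suits depth distributions that are skewed by a few very deep targets.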

  4. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Directory of Open Access Journals (Sweden)

    Rebecca L Lindsey

    2016-05-01

    Full Text Available Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy-to-use commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, downsampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x

  5. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Science.gov (United States)

    Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.

    2016-01-01

    Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy-to-use commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, downsampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different
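    Downsampling experiments like these can be sketched as randomly thinning a read set to a series of target mean coverages and re-running detection at each level. The sketch assumes single-end reads held in memory and a known genome size. The function names and data are hypothetical, and paired-end data would need to be thinned pair by pair.

```python
import random


def coverage(reads, genome_size):
    """Mean fold coverage of a read set."""
    return sum(len(r) for r in reads) / genome_size


def downsample_reads(reads, genome_size, target_coverage, seed=0):
    """Keep each read independently with probability target/current coverage."""
    current = coverage(reads, genome_size)
    if target_coverage >= current:
        return list(reads)  # subsampling cannot add coverage
    keep_fraction = target_coverage / current
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    return [r for r in reads if rng.random() < keep_fraction]


# Hypothetical data: 1,000 reads of 100 bp on a 1 kb "genome" (100x).
reads = ["A" * 100] * 1000
for target in (40, 30, 20):
    subset = downsample_reads(reads, 1000, target, seed=target)
    print(target, round(coverage(subset, 1000), 1))
    # ...run serotype / virulence-gene detection on `subset` here
```

Keeping each read independently gives subsets whose coverage is close to, but not exactly, the target. Sampling a fixed number of reads is an alternative when exact depths matter.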

  6. MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics.

    Science.gov (United States)

    The, Matthew; Käll, Lukas

    2016-03-04

    Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
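    The intuition that rare shared peaks carry more evidence can be sketched with a rarity-weighted shared-peak score. Peaks are binned by m/z, and each bin is weighted by log(N / number of spectra containing it). Two spectra then score the sum of the weights of the bins they share. This illustrates the idea only and is not MaRaCluster's published distance; the bin width and function names are assumptions. A complete-linkage step would merge two clusters only when every cross-pair of spectra is similar enough.

```python
import math
from collections import Counter


def bin_peaks(mz_values, bin_width=1.0005):
    """Map fragment m/z values to integer bins (bin width is an assumption)."""
    return frozenset(int(mz / bin_width) for mz in mz_values)


def rarity_weights(spectra_bins):
    """IDF-style weight per bin: log(N / number of spectra containing it)."""
    n = len(spectra_bins)
    counts = Counter(b for bins in spectra_bins for b in bins)
    return {b: math.log(n / c) for b, c in counts.items()}


def rarity_similarity(a, b, weights):
    """Sum of rarity weights over bins shared by two spectra."""
    return sum(weights[x] for x in a & b)


# Hypothetical binned spectra.
s1 = frozenset({100, 200, 300})
s2 = frozenset({100, 200, 350})
s3 = frozenset({100, 400})
w = rarity_weights([s1, s2, s3])
print(rarity_similarity(s1, s2, w))  # shares bin 200 (in 2 of 3 spectra): log(1.5)
print(rarity_similarity(s1, s3, w))  # shares only bin 100, present everywhere: 0.0
```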

  7. Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads.

    Directory of Open Access Journals (Sweden)

    Jovan Rebolledo-Mendez

    Full Text Available The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight's half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference have accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions, has become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects' and Twilight's genomes or due to errors in the reference. EquCab2 is regarded as "The Twilight Assembly." The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies
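
    The heterozygous-versus-homozygous reasoning can be made concrete with a per-site classifier based on the fraction of mapped Twilight reads that disagree with the reference. The minimum depth and the 0.2 and 0.8 allele-fraction thresholds are hypothetical values chosen for illustration, not those used in the study.

```python
def classify_site(ref_count: int, alt_count: int, min_depth: int = 4,
                  het_low: float = 0.2, hom_min: float = 0.8) -> str:
    """Classify one reference position from read counts (hypothetical thresholds)."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return "low_coverage"  # too few reads to say anything
    alt_frac = alt_count / depth
    if alt_frac >= hom_min:
        # Reads from the reference individual almost all disagree with the
        # reference: a candidate reference error (or a BAC-end contribution).
        return "homozygous_discrepancy"
    if alt_frac >= het_low:
        return "heterozygous"  # expected for a diploid individual
    return "consistent"
```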

  8. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    Science.gov (United States)

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of

  9. Direct sequencing from the minimal number of DNA molecules needed to fill a 454 picotiterplate.

    Directory of Open Access Journals (Sweden)

    Mária Džunková

    Full Text Available The large amount of DNA needed to prepare a library in next generation sequencing protocols hinders direct sequencing of small DNA samples. This limitation is usually overcome by enriching such samples with whole genome amplification (WGA), mostly by multiple displacement amplification (MDA) based on φ29 polymerase. However, this technique can be biased by the GC content of the sample and is prone to chimera formation as well as contamination during enrichment, which contributes undesired noise during sequence data analysis and also hampers proper functional and/or taxonomic assignment. An alternative to MDA is direct DNA sequencing (DS), which represents the theoretical gold standard in genome sequencing. In this work, we explore the possibility of sequencing the genome of Escherichia coli fs 24 from the minimum number of DNA molecules required for pyrosequencing, according to the notion of one-bead-one-molecule. Using an optimized protocol for DS, we constructed a shotgun library containing the minimum number of DNA molecules needed to fill a selected region of a picotiterplate. The reads covered most of the reference genome, with uniform coverage. We compared the DS method with MDA applied to the same amount of starting DNA. As expected, MDA yielded a sparse and biased read distribution, with a very high amount of unassigned and unspecific DNA amplifications. The optimized DS protocol allows unbiased sequencing to be performed from samples with a very small amount of DNA.
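
    The one-bead-one-molecule constraint is commonly reasoned about with a Poisson loading model: if λ template molecules are added per bead on average, the fraction of beads carrying exactly one template is λe^(-λ). The sketch below assumes that model; the molecule and bead counts are hypothetical inputs, since the abstract does not state the ratio actually used.

```python
import math


def bead_occupancy(molecules: int, beads: int) -> dict:
    """Poisson fractions of empty, clonal (one template) and mixed (two or more) beads."""
    lam = molecules / beads  # mean templates per bead
    p_empty = math.exp(-lam)
    p_clonal = lam * math.exp(-lam)
    return {"empty": p_empty, "clonal": p_clonal, "mixed": 1.0 - p_empty - p_clonal}


def molecules_needed(beads: int, ratio: float) -> int:
    """Template molecules to add for a chosen molecules-per-bead ratio (assumed design input)."""
    return math.ceil(beads * ratio)
```

    At λ = 1, about 36.8% of beads are clonal, the maximum possible under this model; lower ratios reduce mixed beads at the cost of more empty ones.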

  10. Toward Prostate Cancer Contouring Guidelines on Magnetic Resonance Imaging: Dominant Lesion Gross and Clinical Target Volume Coverage Via Accurate Histology Fusion

    International Nuclear Information System (INIS)

    Gibson, Eli; Bauman, Glenn S.; Romagnoli, Cesare; Cool, Derek W.; Bastian-Jordan, Matthew; Kassam, Zahra; Gaed, Mena; Moussa, Madeleine; Gómez, José A.; Pautler, Stephen E.; Chin, Joseph L.; Crukley, Cathie; Haider, Masoom A.

    2016-01-01

    Purpose: Defining prostate cancer (PCa) lesion clinical target volumes (CTVs) for multiparametric magnetic resonance imaging (mpMRI) could support focal boosting or treatment to improve outcomes or lower morbidity, necessitating appropriate CTV margins for mpMRI-defined gross tumor volumes (GTVs). This study aimed to identify CTV margins yielding 95% coverage of PCa tumors for prospective cases with high likelihood. Methods and Materials: Twenty-five men with biopsy-confirmed clinical stage T1 or T2 PCa underwent pre-prostatectomy mpMRI, yielding T2-weighted, dynamic contrast-enhanced, and apparent diffusion coefficient images. Digitized whole-mount histology was contoured and registered to mpMRI scans (error ≤2 mm). Four observers contoured lesion GTVs on each mpMRI scan. CTVs were defined by isotropic and anisotropic expansion from these GTVs and from multiparametric (unioned) GTVs from 2 to 3 scans. Histologic coverage (proportions of tumor area on co-registered histology inside the CTV, measured for Gleason scores [GSs] ≥6 and ≥7) and prostate sparing (proportions of prostate volume outside the CTV) were measured. Nonparametric histologic-coverage prediction intervals defined minimal margins yielding 95% coverage for prospective cases with 78% to 92% likelihood. Results: On analysis of 72 true-positive tumor detections, 95% coverage margins were 9 to 11 mm (GS ≥ 6) and 8 to 10 mm (GS ≥ 7) for single-sequence GTVs and were 8 mm (GS ≥ 6) and 6 mm (GS ≥ 7) for 3-sequence GTVs, yielding CTVs that spared 47% to 81% of prostate tissue for the majority of tumors. Inclusion of T2-weighted contours increased sparing for multiparametric CTVs with 95% coverage margins for GS ≥6, and inclusion of dynamic contrast-enhanced contours increased sparing for GS ≥7. Anisotropic 95% coverage margins increased the sparing proportions to 71% to 86%. Conclusions: Multiparametric magnetic resonance imaging–defined GTVs expanded by appropriate margins
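
    The histologic coverage and prostate sparing measures can be sketched on a toy 2D pixel grid: the CTV is an isotropic expansion of the GTV by a margin, coverage is the fraction of tumor pixels inside the CTV, and sparing is the fraction of prostate pixels outside it. The grid, pixel spacing, and function names are assumptions for illustration; the study worked with registered whole-mount histology and also evaluated anisotropic margins.

```python
from math import hypot
from typing import Set, Tuple

Pixel = Tuple[int, int]


def expand(gtv: Set[Pixel], margin_mm: float, spacing_mm: float = 1.0) -> Set[Pixel]:
    """Isotropic expansion: every pixel within margin_mm (Euclidean) of some GTV pixel."""
    r = int(margin_mm // spacing_mm)
    ctv = set()
    for (x, y) in gtv:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                if hypot(dx, dy) * spacing_mm <= margin_mm:
                    ctv.add((x + dx, y + dy))
    return ctv


def coverage_and_sparing(tumor: Set[Pixel], prostate: Set[Pixel],
                         ctv: Set[Pixel]) -> Tuple[float, float]:
    """Coverage = tumor inside CTV; sparing = prostate outside CTV (both as fractions)."""
    coverage = len(tumor & ctv) / len(tumor)
    sparing = len(prostate - ctv) / len(prostate)
    return coverage, sparing
```

    Sweeping the margin and recording coverage and sparing for each case is the kind of trade-off the abstract summarizes: larger margins raise coverage but spare less prostate tissue.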

  11. Toward Prostate Cancer Contouring Guidelines on Magnetic Resonance Imaging: Dominant Lesion Gross and Clinical Target Volume Coverage Via Accurate Histology Fusion

    Energy Technology Data Exchange (ETDEWEB)

    Gibson, Eli [Robarts Research Institute, University of Western Ontario, London, Ontario (Canada); Biomedical Engineering, University of Western Ontario, London, Ontario (Canada); Centre for Medical Image Computing, University College London, London (United Kingdom); Department of Radiology, Radboud University Medical Centre, Nijmegen (Netherlands); Bauman, Glenn S., E-mail: glenn.bauman@lhsc.on.ca [Lawson Health Research Institute, London, Ontario (Canada); Department of Oncology, University of Western Ontario, London, Ontario (Canada); Romagnoli, Cesare; Cool, Derek W. [Department of Medical Imaging, University of Western Ontario, London, Ontario (Canada); Bastian-Jordan, Matthew [Department of Medical Imaging, University of Western Ontario, London, Ontario (Canada); Queensland Health, Brisbane, Queensland (Australia); Kassam, Zahra [Department of Medical Imaging, University of Western Ontario, London, Ontario (Canada); Gaed, Mena [Robarts Research Institute, University of Western Ontario, London, Ontario (Canada); Department of Pathology, University of Western Ontario, London, Ontario (Canada); Moussa, Madeleine; Gómez, José A. [Department of Pathology, University of Western Ontario, London, Ontario (Canada); Pautler, Stephen E.; Chin, Joseph L. [Lawson Health Research Institute, London, Ontario (Canada); Department of Urology, University of Western Ontario, London, Ontario (Canada); Crukley, Cathie [Robarts Research Institute, University of Western Ontario, London, Ontario (Canada); Lawson Health Research Institute, London, Ontario (Canada); Haider, Masoom A. [Department of Medical Imaging, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); and others

    2016-09-01

  12. Percent Coverage

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Percent Coverage is a spreadsheet that keeps track of and compares the number of vessels that have departed with and without observers to the numbers of vessels...

  13. Effects of 28 days of resistance exercise and consuming a commercially available pre-workout supplement, NO-Shotgun®, on body composition, muscle strength and mass, markers of satellite cell activation, and clinical safety markers in males

    Directory of Open Access Journals (Sweden)

    Leutholtz Brian

    2009-08-01

    Full Text Available Abstract Purpose This study determined the effects of 28 days of heavy resistance exercise combined with the nutritional supplement NO-Shotgun® on body composition, muscle strength and mass, markers of satellite cell activation, and clinical safety markers. Methods Eighteen non-resistance-trained males participated in a resistance training program (3 × 10-RM, 4 times/wk) for 28 days while also ingesting 27 g/day of placebo (PL) or NO-Shotgun® (NO) 30 min prior to exercise. Data were analyzed with separate 2 × 2 ANOVA and t-tests (p < 0.05). Results Total body mass was increased in both groups (p = 0.001), but without any significant increases in total body water (p = 0.77). No significant changes occurred with fat mass (p = 0.62); however, fat-free mass did increase with training (p = 0.001), and NO was significantly greater than PL (p = 0.001). Bench press strength for NO was significantly greater than PL (p = 0.003). Myofibrillar protein increased with training (p = 0.001), with NO being significantly greater than PL (p = 0.019). Serum IGF-1 (p = 0.046) and HGF (p = 0.06) were significantly increased with training, and for NO, HGF was greater than PL (p = 0.002). Muscle phosphorylated c-met was increased with training for both groups (p = 0.019). Total DNA was increased in both groups (p = 0.006), while NO was significantly greater than PL (p = 0.038). For DNA/protein, PL was decreased and NO was not changed (p = 0.014). All of the myogenic regulatory factors were increased with training; however, NO was shown to be significantly greater than PL for Myo-D (p = 0.008) and MRF-4 (p = 0.022). No significant differences were located for any of the whole blood and serum clinical chemistry markers (p > 0.05). Conclusion When combined with heavy resistance training for 28 days, NO-Shotgun® is not associated with any negative side effects, nor does it abnormally impact any of the clinical chemistry markers. Rather, NO-Shotgun® effectively increases muscle strength and mass

  14. Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential

    DEFF Research Database (Denmark)

    Klatt, C. G.; Wood, J. M.; Rusch, D. B.

    2011-01-01

    Phototrophic microbial mat communities from 60 °C and 65 °C regions in the effluent channels of Mushroom and Octopus Springs (Yellowstone National Park, WY, USA) were investigated by shotgun metagenomic sequencing. Analyses of assembled metagenomic sequences resolved six dominant chlorophototrophic...

  15. State of the art and challenges in sequence based T-cell epitope prediction

    DEFF Research Database (Denmark)

    Lundegaard, Claus; Hoof, Ilka; Lund, Ole

    2010-01-01

    Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the...

  16. Providing Universal Health Insurance Coverage in Nigeria.

    Science.gov (United States)

    Okebukola, Peter O; Brieger, William R

    2016-07-07

    Despite a stated goal of achieving universal coverage, the National Health Insurance Scheme of Nigeria had achieved only 4% coverage 12 years after it was launched. This study assessed the plans of the National Health Insurance Scheme to achieve universal health insurance coverage in Nigeria by 2015 and discusses the challenges facing the scheme in achieving insurance coverage. In-depth interviews were conducted with respondents from various levels of the country's health-care system, including providers. The results of the analysis suggest that challenges to extending coverage include the difficulty in convincing autonomous state governments to buy into the scheme and an inadequate health workforce that might not be able to meet increased demand. Recommendations for increasing the scheme's coverage include increasing decentralization and strengthening human resources for health in the service delivery systems. Strong political will is needed as a catalyst to achieving these goals. © The Author(s) 2016.

  17. Next-Generation Mitogenomics: A Comparison of Approaches Applied to Caecilian Amphibian Phylogeny

    OpenAIRE

    Maddock, Simon T.; Briscoe, Andrew G.; Wilkinson, Mark; Waeschenbach, Andrea; San Mauro, Diego; Day, Julia J.; Littlewood, D. Tim J.; Foster, Peter G.; Nussbaum, Ronald A.; Gower, David J.

    2016-01-01

    Mitochondrial genome (mitogenome) sequences are being generated with increasing speed due to the advances of next-generation sequencing (NGS) technology and associated analytical tools. However, detailed comparisons to explore the utility of alternative NGS approaches applied to the same taxa have not been undertaken. We compared a ‘traditional’ Sanger sequencing method with two NGS approaches (shotgun sequencing and non-indexed, multiplex amplicon sequencing) on four different sequencing pla...

  18. A linear programming model for protein inference problem in shotgun proteomics.

    Science.gov (United States)

    Huang, Ting; He, Zengyou

    2012-11-15

    Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.

  19. SeqEntropy: genome-wide assessment of repeats for short read sequencing.

    Directory of Open Access Journals (Sweden)

    Hsueh-Ting Chu

    Full Text Available BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported that this technology is limited in its ability to reconstruct the entire genome, even at very high depth of coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats on whole-genome reconstruction from sequences of length k. In our experiments, we found that entropy loss correlates well with de novo assembly coverage of a genome, and a score of ΔH(k) > 1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k) < 1% are different for various organisms and are independent of the genome size. For example, in order to meet the threshold of ΔH(k) < 1%, a read length of 60 bp is needed for sequencing the human genome (3.2 × 10⁹ bp) and 320 bp for sequencing the fruit fly genome (1.8 × 10⁸ bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and the entropy of sequencing reads provides a measurement for the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation of the limitations of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful to tune the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
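
    One plausible reading of H(k) and ΔH(k) is sketched below: H(k) is the Shannon entropy of the distribution of length-k substrings (idealized error-free reads starting at every position), and ΔH(k) is its shortfall relative to the maximum, log2 N, reached when all N reads are distinct. This normalization, and the omission of reverse complements, are assumptions; the paper's exact definitions may differ.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (bits) of the k-mers starting at every position of the genome."""
    n = len(genome) - k + 1
    if n < 2:
        raise ValueError("genome must be longer than k")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Hypothetical ΔH(k): fraction of the maximum entropy lost to repeated k-mers."""
    n = len(genome) - k + 1
    h_max = math.log2(n)  # entropy if all n reads were distinct
    return (h_max - read_entropy(genome, k)) / h_max
```

    Under this reading, a genome made of a single repeated base loses all entropy (ΔH(k) = 1), while a genome with no repeated k-mers loses none; longer reads can only split repeat copies apart, which is why ΔH(k) falls as k grows.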

  20. Network television news coverage of environmental risks

    International Nuclear Information System (INIS)

    Greenberg, M.R.; Sandman, P.M.; Sachsman, D.V.; Salomone, K.L.

    1989-01-01

    Despite the criticisms that surround television coverage of environmental risk, there have been relatively few attempts to measure what and whom television shows. Most research has focused on a few weeks of coverage of major stories like the gas leak at Bhopal, the Three Mile Island nuclear accident, or the Mount St. Helens eruption. To advance research into television coverage of environmental risk, an analysis has been made of all environmental risk coverage by the network nightly news broadcasts for a period of more than two years. Researchers analyzed all environmental risk coverage (564 stories in 26 months) presented on ABC, CBS, and NBC's evening news broadcasts from January 1984 through February 1986. The quantitative information from the 564 stories was balanced by a more qualitative analysis of the television coverage of two case studies: the dioxin contamination in Times Beach, Missouri, and the suspected methyl isocyanate emissions from the Union Carbide plant in Institute, West Virginia. Both qualitative and quantitative data contributed to the analysis of the role played by experts and environmental advocacy sources in coverage of environmental risk and to the suggestions for increasing that role.

  1. Shedding genomic light on Aristotle's lantern.

    Science.gov (United States)

    Sodergren, Erica; Shen, Yufeng; Song, Xingzhi; Zhang, Lan; Gibbs, Richard A; Weinstock, George M

    2006-12-01

    Sea urchins have proved fascinating to biologists since the time of Aristotle, who compared the appearance of their bony mouth structure to a lantern in The History of Animals. Throughout modern times, the sea urchin has been a model system for research in developmental biology. Now, the genome of the sea urchin Strongylocentrotus purpuratus has become the first echinoderm genome to be sequenced. A high quality draft sequence assembly was produced using the Atlas assembler to combine whole genome shotgun sequences with sequences from a collection of BACs selected to form a minimal tiling path along the genome. A formidable challenge was presented by the high degree of heterozygosity between the two haplotypes of the selected male representative of this marine organism. This was overcome by use of the BAC tiling path backbone, in which each BAC represents a single haplotype, as well as by improvements in the Atlas software. Another innovation introduced in this project was the sequencing of pools of tiling path BACs rather than individual BAC sequencing. The Clone-Array Pooled Shotgun Strategy greatly reduced the cost and time devoted to preparing shotgun libraries from BAC clones. The genome sequence was analyzed with several gene prediction methods to produce a comprehensive gene list that was then manually refined and annotated by a volunteer team of sea urchin experts. This latter annotation community edited over 9000 gene models and uncovered many unexpected aspects of the sea urchin genetic content impacting transcriptional regulation, immunology, sensory perception, and an organism's development. Analysis of the basic deuterostome genetic complement supports the sea urchin's role as a model system for deuterostome and, by extension, chordate development.
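
    The Clone-Array Pooled Shotgun Strategy rests on a simple deconvolution idea: BACs are laid out in a grid, one shotgun library is built per row pool and per column pool, and a sequence seen in exactly one row pool and one column pool points to the BAC at their intersection. The grid layout and function names below are hypothetical illustrations of that idea, not the project's actual pooling design or software.

```python
from typing import Dict, List, Set, Tuple

Grid = List[List[str]]


def make_pools(grid: Grid) -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """One pooled library per row and one per column of the BAC array."""
    row_pools = {r: set(row) for r, row in enumerate(grid)}
    col_pools = {c: {row[c] for row in grid} for c in range(len(grid[0]))}
    return row_pools, col_pools


def pools_containing(bac: str, row_pools: Dict[int, Set[str]],
                     col_pools: Dict[int, Set[str]]) -> Tuple[Set[int], Set[int]]:
    """Simulate which pools would show reads from a given BAC."""
    rows = {r for r, members in row_pools.items() if bac in members}
    cols = {c for c, members in col_pools.items() if bac in members}
    return rows, cols


def deconvolve(hit_rows: Set[int], hit_cols: Set[int], grid: Grid) -> List[str]:
    """Candidate BACs at every intersection of a positive row pool and column pool."""
    return sorted(grid[r][c] for r in hit_rows for c in hit_cols)
```

    A 3 × 3 array needs 6 pooled libraries instead of 9 individual ones, and the saving grows with array size; the trade-off is that hits in several rows or columns leave more than one candidate BAC.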

  2. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Full Text Available Abstract Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues to human scourges (cancer, SARS, and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly, allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (a female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
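
    The two SNP filters described above can be expressed as a simple predicate on each candidate site. The cutoff values (a maximum depth of 6 reads and a minimum Phred base quality of 40) are hypothetical placeholders; the abstract does not give the thresholds actually used.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    depth: int              # reads covering the candidate site in the assembly
    alt_base_quality: int   # Phred quality of the discrepant base


def passes_filters(c: Candidate, max_depth: int = 6, min_quality: int = 40) -> bool:
    """Keep a candidate SNP only if both hypothetical cutoffs are met."""
    # Upper depth limit: in a ~3X assembly, a pile of far more reads than expected
    # may reflect collapsed repeats or paralogs rather than a true allelic difference.
    # Lower quality limit: Phred 40 corresponds to a 1-in-10,000 base-call error rate.
    return c.depth <= max_depth and c.alt_base_quality >= min_quality
```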

  3. 20 CFR 404.1065 - Self-employment coverage.

    Science.gov (United States)

    2010-04-01

    ... 20 Employees' Benefits 2 2010-04-01 2010-04-01 false Self-employment coverage. 404.1065 Section... INSURANCE (1950- ) Employment, Wages, Self-Employment, and Self-Employment Income Self-Employment § 404.1065 Self-employment coverage. For an individual to have self-employment coverage under social security, the...

  4. Assessing Measurement Error in Medicare Coverage

    Data.gov (United States)

    U.S. Department of Health & Human Services — Assessing Measurement Error in Medicare Coverage From the National Health Interview Survey Using linked administrative data, to validate Medicare coverage estimates...

  5. Dryland biological soil crust cyanobacteria show unexpected decreases in abundance under long-term elevated CO2

    Science.gov (United States)

    Steven, Blaire; Gallegos-Graves, La Verne; Yeager, Chris M.; Belnap, Jayne; Evans, R. David; Kuske, Cheryl R.

    2012-01-01

    Biological soil crusts (biocrusts) cover soil surfaces in many drylands globally. The impacts of 10 years of elevated atmospheric CO2 on the cyanobacteria in biocrusts of an arid shrubland were examined at a large manipulated experiment in Nevada, USA. Cyanobacteria-specific quantitative PCR surveys of cyanobacteria small-subunit (SSU) rRNA genes suggested a reduction in biocrust cyanobacterial biomass in the elevated CO2 treatment relative to the ambient controls. Additionally, SSU rRNA gene libraries and shotgun metagenomes showed reduced representation of cyanobacteria in the total microbial community. Taxonomic composition of the cyanobacteria was similar under ambient and elevated CO2 conditions, indicating the decline was manifest across multiple cyanobacterial lineages. Recruitment of cyanobacteria sequences from replicate shotgun metagenomes to cyanobacterial genomes representing major biocrust orders also suggested decreased abundance of cyanobacteria sequences across the majority of genomes tested. Functional assignment of cyanobacteria-related shotgun metagenome sequences indicated that four subsystem categories, three related to oxidative stress, were differentially abundant in relation to the elevated CO2 treatment. Taken together, these results suggest that elevated CO2 effected a generalized decrease in cyanobacteria in the biocrusts and may have favoured cyanobacteria with altered gene inventories for coping with oxidative stress.

  6. Next-Generation Sequencing Workflow for NSCLC Critical Samples Using a Targeted Sequencing Approach by Ion Torrent PGM™ Platform.

    Science.gov (United States)

    Vanni, Irene; Coco, Simona; Truini, Anna; Rusmini, Marta; Dal Bello, Maria Giovanna; Alama, Angela; Banelli, Barbara; Mora, Marco; Rijavec, Erika; Barletta, Giulia; Genova, Carlo; Biello, Federica; Maggioni, Claudia; Grossi, Francesco

    2015-12-03

    Next-generation sequencing (NGS) is a cost-effective technology capable of screening several genes simultaneously; however, its application in a clinical context requires an established workflow to acquire reliable sequencing results. Here, we report an optimized NGS workflow analyzing 22 lung cancer-related genes to sequence critical samples such as DNA from formalin-fixed paraffin-embedded (FFPE) blocks and circulating free DNA (cfDNA). Snap frozen and matched FFPE gDNA from 12 non-small cell lung cancer (NSCLC) patients, whose gDNA fragmentation status was previously evaluated using a multiplex PCR-based quality control, were successfully sequenced with Ion Torrent PGM™. The robust bioinformatic pipeline allowed us to correctly call both Single Nucleotide Variants (SNVs) and indels with a detection limit of 5%, achieving 100% specificity and 96% sensitivity. This workflow was also validated in 13 FFPE NSCLC biopsies. Furthermore, a specific protocol for low input gDNA capable of producing good sequencing data with high coverage, high uniformity, and a low error rate was also optimized. In conclusion, we demonstrate the feasibility of obtaining gDNA from FFPE samples suitable for NGS by performing appropriate quality controls. The optimized workflow, capable of screening low input gDNA, highlights NGS as a potential tool in the detection, disease monitoring, and treatment of NSCLC.
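
    A 5% detection limit means a variant is reported once at least 5% of reads at a position carry it; sensitivity and specificity are then computed against a reference set of known calls. The minimal sketch below assumes a plain allele-fraction cutoff plus a hypothetical minimum depth of 100 reads; it is not the study's bioinformatic pipeline.

```python
def call_variant(alt_reads: int, depth: int, min_vaf: float = 0.05,
                 min_depth: int = 100) -> bool:
    """Report a variant only if depth is adequate and the allele fraction reaches the limit."""
    return depth >= min_depth and alt_reads / depth >= min_vaf


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-variant positions correctly left uncalled."""
    return tn / (tn + fp)
```

    For instance, detecting 24 of 25 known variants gives a sensitivity of 0.96; these counts are illustrative arithmetic, not the study's data.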

  7. 7 CFR 457.172 - Coverage Enhancement Option.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 6 2010-01-01 2010-01-01 false Coverage Enhancement Option. 457.172 Section 457.172..., DEPARTMENT OF AGRICULTURE COMMON CROP INSURANCE REGULATIONS § 457.172 Coverage Enhancement Option. The Coverage Enhancement Option for the 2009 and succeeding crop years are as follows: FCIC policies: United...

  8. 29 CFR 2.12 - Audiovisual coverage permitted.

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 1 2010-07-01 2010-07-01 true Audiovisual coverage permitted. 2.12 Section 2.12 Labor Office of the Secretary of Labor GENERAL REGULATIONS Audiovisual Coverage of Administrative Hearings § 2.12 Audiovisual coverage permitted. The following are the types of hearings where the Department...

  9. Rapid isolation of microsatellite DNAs and identification of polymorphic mitochondrial DNA regions in the fish rotan (Perccottus glenii) invading European Russia

    Science.gov (United States)

    King, Timothy L.; Eackles, Michael S.; Reshetnikov, Andrey N.

    2015-01-01

    Human-mediated translocations and subsequent large-scale colonization by the invasive fish rotan (Perccottus glenii Dybowski, 1877; Perciformes, Odontobutidae), also known as Amur or Chinese sleeper, have resulted in dramatic transformations of small lentic ecosystems. However, no detailed genetic information exists on population structure, levels of effective movement, or relatedness among geographic populations of P. glenii within the European part of the range. We used massively parallel genomic DNA shotgun sequencing on the semiconductor-based Ion Torrent Personal Genome Machine (PGM) sequencing platform to identify nuclear microsatellite and mitochondrial DNA sequences in P. glenii from European Russia. Here we describe the characterization of nine nuclear microsatellite loci and ascertain levels of allelic diversity, heterozygosity, and demographic status of P. glenii collected from Ilev, Russia, one of several initial introduction points in European Russia. In addition, we mapped sequence reads to the complete P. glenii mitochondrial DNA sequence to identify polymorphic regions. Nuclear microsatellite markers developed for P. glenii yielded sufficient genetic diversity to: (1) produce unique multilocus genotypes; (2) elucidate structure among geographic populations; and (3) provide unique perspectives for analysis of population sizes and historical demographics. Among 4.9 million filtered P. glenii Ion Torrent PGM sequence reads, 11,304 mapped to the mitochondrial genome (NC_020350). This resulted in 100% coverage of this genome at a mean coverage depth of 102X. A total of 130 variable sites were observed between the publicly available genome from China and the studied composite mitochondrial genome. Among these, 82 were diagnostic and monomorphic between the mitochondrial genomes and distributed among 15 genome regions. The polymorphic sites (N = 48) were distributed among 11 mitochondrial genome regions. 
Our results also indicate that sequence reads generated
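    The mitochondrial result above is expressed as breadth (100% of the genome covered) and mean depth (102X). A minimal sketch of how both quantities can be computed from read alignments, assuming each alignment is given as a half-open, 0-based `(start, end)` interval on a linear reference; the function names are hypothetical, and alignments that wrap around the circular mitochondrial genome are ignored.

```python
def depth_profile(genome_length, alignments):
    """Per-base read depth from half-open (start, end) alignment intervals."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = min(max(start, 0), genome_length)
        end = min(max(end, start), genome_length)
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def breadth_and_mean_depth(genome_length, alignments):
    """Fraction of bases covered at least once, and mean depth over all bases."""
    depth = depth_profile(genome_length, alignments)
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_length, sum(depth) / genome_length


print(breadth_and_mean_depth(10, [(0, 5), (3, 8), (5, 10)]))  # (1.0, 1.5)
print(breadth_and_mean_depth(10, [(0, 5)]))                    # (0.5, 0.5)
```

    Mean depth equals the total number of aligned bases divided by genome length, so it can be high even when breadth is below 100%; reporting both, as the abstract does, avoids that ambiguity.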

  10. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed to take short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens 22-82 years old (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contamination was observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  11. Functional coverages

    NARCIS (Netherlands)

    Donchyts, G.; Baart, F.; Jagers, H.R.A.; Van Dam, A.

    2011-01-01

    A new Application Programming Interface (API) is presented which simplifies working with geospatial coverages as well as many other data structures of a multi-dimensional nature. The main idea extends the Common Data Model (CDM) developed at the University Corporation for Atmospheric Research

  12. 29 CFR 95.31 - Insurance coverage.

    Science.gov (United States)

    2010-07-01

    ... recipient. Federally-owned property need not be insured unless required by the terms and conditions of the... § 95.31 Insurance coverage. Recipients shall, at a minimum, provide the equivalent insurance coverage...

  13. Structural Analysis of Unsaturated Glycosphingolipids Using Shotgun Ozone-Induced Dissociation Mass Spectrometry

    Science.gov (United States)

    Barrientos, Rodell C.; Vu, Ngoc; Zhang, Qibin

    2017-08-01

    Glycosphingolipids are essential biomolecules widely distributed across biological kingdoms yet remain relatively underexplored owing to both compositional and structural complexity. While the glycan head group has been the subject of most studies, there is a paucity of reports on the lipid moiety, particularly the location of unsaturation. In this paper, ozone-induced dissociation mass spectrometry (OzID-MS) implemented in a traveling wave-based quadrupole time-of-flight (Q-ToF) mass spectrometer was applied to study unsaturated glycosphingolipids using a shotgun approach. The resulting high-resolution mass spectra facilitated the unambiguous identification of diagnostic OzID product ions. Using [M+Na]+ adducts of authentic standards, we observed that unsaturation in the long chain base and in the fatty acyl chain had distinct reactivity with ozone. The reactivity of unsaturation in the fatty acyl chain was about 8-fold higher than that in the long chain base, which enables their straightforward differentiation. The influence of the head group, fatty acyl hydroxylation, and fatty acyl chain length on the oxidative cleavage of double bonds was also observed. Application of this technique to bovine brain galactocerebrosides revealed co-isolated isobaric and regioisomeric species, which would otherwise be incompletely identified using contemporary collision-induced dissociation (CID) alone. These results highlight the potential of OzID-MS in glycosphingolipid research, which not only provides structural information complementary to the existing CID technique but also facilitates de novo structural determination of these complex biomolecules.

  14. Extending Coverage and Lifetime of K-coverage Wireless Sensor Networks Using Improved Harmony Search

    Directory of Open Access Journals (Sweden)

    Shohreh Ebrahimnezhad

    2011-07-01

    Full Text Available K-coverage wireless sensor networks try to ensure that each hotspot region is covered by at least k sensors. Because the fundamental evaluation metrics of such networks are coverage and lifetime, an approach that extends both simultaneously is of considerable interest. In this article, it is assumed that two kinds of nodes are available: static and mobile. The proposed method first balances energy among sensor nodes using the Improved Harmony Search (IHS) algorithm in a k-coverage and connected wireless sensor network in order to obtain a sensor node deployment. The method also proposes a suitable location for a gateway node (sink) that collects data from all sensors. Second, to prolong the network lifetime, some of the high energy-consuming mobile nodes are moved to the closest positions of low energy-consuming ones, and vice versa, after a period of operation. This increases the network lifetime while preserving connectivity and k-coverage. In computer simulations, the proposed IHS-based algorithm found better solutions than some related methods.
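    The k-coverage requirement above can be checked directly once a sensing model is fixed. A minimal sketch, assuming a simple disk model (a sensor covers a hotspot if their Euclidean distance is at most a sensing radius) and point-like hotspots; the abstract does not specify its sensing model, and the Improved Harmony Search optimization itself is not shown. Function names are hypothetical.

```python
import math


def coverage_counts(hotspots, sensors, radius):
    """Number of sensors within `radius` of each hotspot (disk sensing model)."""
    return [
        sum(1 for sx, sy in sensors if math.hypot(hx - sx, hy - sy) <= radius)
        for hx, hy in hotspots
    ]


def is_k_covered(hotspots, sensors, radius, k):
    """True when every hotspot is covered by at least k sensors."""
    return all(count >= k for count in coverage_counts(hotspots, sensors, radius))


hotspots = [(0, 0), (10, 0)]
sensors = [(0, 1), (1, 0), (9, 0), (10, 2)]
print(coverage_counts(hotspots, sensors, radius=2))    # [2, 2]
print(is_k_covered(hotspots, sensors, radius=2, k=2))  # True
print(is_k_covered(hotspots, sensors, radius=2, k=3))  # False
```

    In a node-relocation scheme like the one described, a check of this kind would have to hold after every move for k-coverage to be preserved.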

  15. Development and characterization of 21 polymorphic microsatellite markers for the barren-ground shrew, Sorex ugyunak (Mammalia: Soricidae), through next-generation sequencing, and cross-species amplification in the masked shrew, S. cinereus

    Science.gov (United States)

    Sonsthagen, Sarah A.; Sage, G. Kevin; Fowler, Megan C.; Hope, Andrew G.; Cook, J.A.; Talbot, Sandra L.

    2013-01-01

    We used next-generation shotgun sequencing to develop 21 novel microsatellite markers for the barren-ground shrew (Sorex ugyunak), which were polymorphic among individuals from northern Alaska. The loci displayed moderate allelic diversity (averaging 6.81 alleles per locus) and heterozygosity (averaging 70%). Two loci deviated from Hardy–Weinberg equilibrium (HWE) due to heterozygote deficiency. While the population did not deviate from HWE overall, it showed significant linkage disequilibrium, suggesting this population is not in mutation-drift equilibrium. Nineteen of 21 loci were polymorphic in masked shrews (S. cinereus) from interior Alaska and exhibited linkage equilibrium and HWE overall. All loci yielded sufficient variability for use in population studies.
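    Observed heterozygosity (Ho) and expected heterozygosity under Hardy-Weinberg proportions (He) are the quantities behind the heterozygote-deficiency result above. A minimal sketch for one diploid microsatellite locus, with genotypes given as pairs of allele sizes; it uses the simple estimator He = 1 - Σ p_i² without Nei's small-sample correction, and a formal HWE test (for example an exact test) would still be needed to call a deviation significant. The function name and genotype data are illustrative, not taken from the study.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus from diploid genotypes such as (100, 102)."""
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected under HWE: 1 minus the sum of squared allele frequencies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he


genotypes = [(100, 100), (102, 102), (100, 102), (100, 100)]
print(heterozygosity(genotypes))  # (0.25, 0.46875)
```

    Here Ho is well below He, the pattern described as heterozygote deficiency.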

  16. Sequencing and characterisation of rearrangements in three S. pastorianus strains reveals the presence of chimeric genes and gives evidence of breakpoint reuse.

    Directory of Open Access Journals (Sweden)

    Sarah K Hewitt

    Full Text Available Gross chromosomal rearrangements have the potential to be evolutionarily advantageous to an adapting organism. The generation of a hybrid species increases opportunity for recombination by bringing together two homologous genomes. We sought to define the location of genomic rearrangements in three strains of Saccharomyces pastorianus, a natural lager-brewing yeast hybrid of Saccharomyces cerevisiae and Saccharomyces eubayanus, using whole genome shotgun sequencing. Each strain of S. pastorianus has lost species-specific portions of its genome and has undergone extensive recombination, producing chimeric chromosomes. We predicted 30 breakpoints, which we confirmed at the single-nucleotide level by designing species-specific primers that flank each breakpoint and then sequencing the PCR product. These rearrangements are the result of recombination between homologous regions of the two subgenomes, rather than between repetitive elements such as transposons or tRNAs. Interestingly, 28/30 S. cerevisiae-S. eubayanus recombination breakpoints are located within genic regions, generating chimeric genes. Furthermore, we show evidence for the reuse of two breakpoints, located in HSP82 and KEM1, in strains of proposed independent origin.

  17. A High-Coverage Yersinia pestis Genome from a Sixth-Century Justinianic Plague Victim.

    Science.gov (United States)

    Feldman, Michal; Harbeck, Michaela; Keller, Marcel; Spyrou, Maria A; Rott, Andreas; Trautmann, Bernd; Scholz, Holger C; Päffgen, Bernd; Peters, Joris; McCormick, Michael; Bos, Kirsten; Herbig, Alexander; Krause, Johannes

    2016-11-01

    The Justinianic Plague, which started in the sixth century and lasted to the mid eighth century, is thought to be the first of three historically documented plague pandemics causing massive casualties. Historical accounts and molecular data suggest the bacterium Yersinia pestis as its etiological agent. Here we present a new high-coverage (17.9-fold) Y. pestis genome obtained from a sixth-century skeleton recovered from a southern German burial site close to Munich. The reconstructed genome enabled the detection of 30 unique substitutions as well as structural differences that have not been previously described. We report indels affecting a LacI family transcription regulator gene as well as nonsynonymous substitutions in the nrdE, fadJ, and pcp genes, which have been suggested as plague virulence determinants or have been shown to be upregulated in different models of plague infection. In addition, we identify 19 false positive substitutions in a previously published lower-coverage Y. pestis genome from another archaeological site of the same time period and geographical region that is otherwise genetically identical to the high-coverage genome sequence reported here, suggesting low genetic diversity of the plague during the sixth century in rural southern Germany. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Dental Care Coverage and Use: Modeling Limitations and Opportunities

    Science.gov (United States)

    Moeller, John F.; Chen, Haiyan

    2014-01-01

    Objectives. We examined why older US adults without dental care coverage and use would have lower use rates if offered coverage than do those who currently have coverage. Methods. We used data from the 2008 Health and Retirement Study to estimate a multinomial logistic model to analyze the influence of personal characteristics in the grouping of older US adults into those with and those without dental care coverage and dental care use. Results. Compared with persons with no coverage and no dental care use, users of dental care with coverage were more likely to be younger, female, wealthier, college graduates, married, in excellent or very good health, and not missing all their permanent teeth. Conclusions. Providing dental care coverage to uninsured older US adults without use will not necessarily result in use rates similar to those with prior coverage and use. We have offered a model using modifiable factors that may help policy planners facilitate programs to increase dental care coverage uptake and use. PMID:24328635

  19. Sequential strand displacement beacon for detection of DNA coverage on functionalized gold nanoparticles.

    Science.gov (United States)

    Paliwoda, Rebecca E; Li, Feng; Reid, Michael S; Lin, Yanwen; Le, X Chris

    2014-06-17

    Functionalizing nanomaterials for diverse analytical, biomedical, and therapeutic applications requires determination of surface coverage (or density) of DNA on nanomaterials. We describe a sequential strand displacement beacon assay that is able to quantify specific DNA sequences conjugated or coconjugated onto gold nanoparticles (AuNPs). Unlike the conventional fluorescence assay that requires the target DNA to be fluorescently labeled, the sequential strand displacement beacon method is able to quantify multiple unlabeled DNA oligonucleotides using a single (universal) strand displacement beacon. This unique feature is achieved by introducing two short unlabeled DNA probes for each specific DNA sequence and by performing sequential DNA strand displacement reactions. Varying the relative amounts of the specific DNA sequences and spacing DNA sequences during their coconjugation onto AuNPs results in different densities of the specific DNA on AuNP, ranging from 90 to 230 DNA molecules per AuNP. Results obtained from our sequential strand displacement beacon assay are consistent with those obtained from the conventional fluorescence assays. However, labeling of DNA with some fluorescent dyes, e.g., tetramethylrhodamine, alters DNA density on AuNP. The strand displacement strategy overcomes this problem by obviating direct labeling of the target DNA. This method has broad potential to facilitate more efficient design and characterization of novel multifunctional materials for diverse applications.

  20. GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.

    Directory of Open Access Journals (Sweden)

    Sebastian Jünemann

    Full Text Available De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), has made easy, fast, and cheap sequencing of bacterial organisms available to a broad range of academic and clinical institutions. With a strong pragmatic focus, we add a novel perspective to the line of assembly evaluation surveys by benchmarking popular de novo genome assemblers on bacterial data generated by benchtop sequencers. To this end, single-library assemblies were generated and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges, as well as the k-mer optimization problem of de Bruijn graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently with respect to depth of coverage, where oversampling can become problematic. Assemblies vary greatly in contiguity and accuracy, and also in their computing power requirements. Consequently, no assembler can be rated best for all preconditions. Instead, the kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler is best suited. 
The data sets, scripts and all additional information needed to replicate our results are freely
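    The k-mer optimization problem mentioned above comes from how de Bruijn graph assemblers break reads into k-mers. A minimal sketch, assuming error-free reads and ignoring reverse complements (real assemblers handle both), that builds a graph whose edges link the (k-1)-mer prefix and suffix of every k-mer. With a small k, a repeated (k-1)-mer collapses distant positions into one node and creates a cycle; a larger k can resolve it but yields fewer k-mers per read, which is one reason coverage depth and k interact. The function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer of the reads."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Repeated k-mers add repeated edges, so this is a multigraph.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


reads = ["ACGTAC"]
print(de_bruijn_graph(reads, 3))
# {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TA'], 'TA': ['AC']}  (cycle through 'AC')
print(de_bruijn_graph(reads, 4))
# {'ACG': ['CGT'], 'CGT': ['GTA'], 'GTA': ['TAC']}  (a simple path)
```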

  1. Targeted gene panel sequencing in children with very early onset inflammatory bowel disease--evaluation and prospective analysis.

    Science.gov (United States)

    Kammermeier, Jochen; Drury, Suzanne; James, Chela T; Dziubak, Robert; Ocaka, Louise; Elawad, Mamoun; Beales, Philip; Lench, Nicholas; Uhlig, Holm H; Bacchelli, Chiara; Shah, Neil

    2014-11-01

    Multiple monogenetic conditions with partially overlapping phenotypes can present with inflammatory bowel disease (IBD)-like intestinal inflammation. With novel genotype-specific therapies emerging, establishing a molecular diagnosis is becoming increasingly important. We have introduced targeted next-generation sequencing (NGS) technology as a prospective screening tool in children with very early onset IBD (VEOIBD). We evaluated the coverage of 40 VEOIBD genes in two separate cohorts undergoing targeted gene panel sequencing (TGPS) (n=25) and whole exome sequencing (WES) (n=20). TGPS revealed causative mutations in four genes (IL10RA, EPCAM, TTC37 and SKIV2L), discovered unexpected phenotypes, and directly influenced clinical decision making by supporting haematopoietic stem cell transplantation in some cases and avoiding it in others. TGPS resulted in significantly higher median coverage than WES, fewer coverage deficiencies, and improved variant detection across established VEOIBD genes. Excluding or confirming known VEOIBD genotypes should be considered early in the disease course in all cases of therapy-refractory VEOIBD, as it can have a direct impact on patient management. Combining both NGS technologies would compensate for the limitations of WES in disease-specific application while offering the opportunity for novel gene discovery in the research setting. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  2. Comparative evaluation of seven commercial products for human serum enrichment/depletion by shotgun proteomics.

    Science.gov (United States)

    Pisanu, Salvatore; Biosa, Grazia; Carcangiu, Laura; Uzzau, Sergio; Pagnozzi, Daniela

    2018-08-01

    Seven commercial products for human serum depletion/enrichment were tested and compared by shotgun proteomics. Methods were based on four different capturing agents: antibodies (Qproteome Albumin/IgG Depletion kit, ProteoPrep Immunoaffinity Albumin and IgG Depletion Kit, Top 2 Abundant Protein Depletion Spin Columns, and Top 12 Abundant Protein Depletion Spin Columns), specific ligands (Albumin/IgG Removal), a mixture of antibodies and ligands (Albumin and IgG Depletion SpinTrap), and combinatorial peptide ligand libraries (ProteoMiner beads), respectively. All procedures, to a greater or lesser extent, increased the number of identified proteins. ProteoMiner beads provided the highest number of proteins; Albumin and IgG Depletion SpinTrap and ProteoPrep Immunoaffinity Albumin and IgG Depletion Kit were the most efficient at albumin removal; Top 2 and Top 12 Abundant Protein Depletion Spin Columns decreased overall immunoglobulin levels more than the other procedures, whereas gamma immunoglobulins specifically were mostly removed by Albumin and IgG Depletion SpinTrap, ProteoPrep Immunoaffinity Albumin and IgG Depletion Kit, and Top 2 Abundant Protein Depletion Spin Columns. Albumin/IgG Removal, a resin bound to a mixture of protein A and Cibacron Blue, performed less efficiently than the other products. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.

    Science.gov (United States)

    Trudgian, David C; Mirzaei, Hamid

    2012-12-07

    We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.

  4. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors

    Directory of Open Access Journals (Sweden)

    Antoine ePersoons

    2014-09-01

    Full Text Available Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP) positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e., PN/PS), we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.
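    The PN/PS statistic above compares non-synonymous and synonymous polymorphism. A minimal sketch of one common per-site form, (Pn/Ln)/(Ps/Ls), where Pn and Ps are counts of non-synonymous and synonymous SNPs and Ln and Ls are the numbers of non-synonymous and synonymous sites in a gene; the abstract does not state which normalization the authors used, and the gene names and counts below are invented for illustration. Genes with high ratios are candidates for relaxed constraint or diversifying selection.

```python
def pn_ps(n_nonsyn_snps, n_syn_snps, nonsyn_sites, syn_sites):
    """Per-site pN/pS = (Pn/Ln) / (Ps/Ls); None when the ratio is undefined."""
    if n_syn_snps == 0 or nonsyn_sites == 0 or syn_sites == 0:
        return None
    # One division of integer products avoids intermediate floating-point rounding.
    return (n_nonsyn_snps * syn_sites) / (n_syn_snps * nonsyn_sites)


def rank_genes(gene_counts):
    """Sort genes by decreasing pN/pS, skipping genes where it is undefined."""
    scored = [(name, pn_ps(*counts)) for name, counts in gene_counts.items()]
    scored = [(name, ratio) for name, ratio in scored if ratio is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical genes: (non-synonymous SNPs, synonymous SNPs, non-syn sites, syn sites)
genes = {
    "geneA": (6, 2, 300, 100),
    "geneB": (9, 1, 300, 100),
    "geneC": (3, 0, 300, 100),
}
print(rank_genes(genes))  # [('geneB', 3.0), ('geneA', 1.0)]
```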

  5. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty of locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
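    Geoseq is described as tiling a query sequence and measuring each tile's coverage in a read library with suffix arrays. The sketch below is not Geoseq's code; it is a minimal illustration of that idea under stated assumptions: reads are concatenated with a `$` separator so no match can span two reads, tiles are taken at every offset (the tile step Geoseq uses is not given in the abstract), and the suffix array is built naively, which is fine for toy inputs but far too slow for real libraries. Function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of `pattern` using two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def tile_coverage(reference, reads, tile_len, step=1):
    """Return (offset, count) for each tile of `reference` found in the read library."""
    text = "$".join(reads) + "$"
    sa = build_suffix_array(text)
    return [
        (offset, count_occurrences(text, sa, reference[offset:offset + tile_len]))
        for offset in range(0, len(reference) - tile_len + 1, step)
    ]


print(tile_coverage("ACGTAC", ["ACGTACGT", "CGTA"], tile_len=4))
# [(0, 2), (1, 2), (2, 1)]
```

    Because suffixes are sorted, all suffixes starting with a given tile form one contiguous block, so each tile costs two binary searches rather than a scan of the whole library.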

  6. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae)

    Directory of Open Access Journals (Sweden)

    Samuel E. Fox

    2013-03-01

    Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome) accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis), 104,950 (Spain), and 177,682 (Greece) transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs) and >20,000 simple sequence repeat (SSR) DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of an invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.

  7. Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat.

    Science.gov (United States)

    Harris, J Kirk; Caporaso, J Gregory; Walker, Jeffrey J; Spear, John R; Gold, Nicholas J; Robertson, Charles E; Hugenholtz, Philip; Goodrich, Julia; McDonald, Daniel; Knights, Dan; Marshall, Paul; Tufo, Henry; Knight, Rob; Pace, Norman R

    2013-01-01

    The microbial mats of Guerrero Negro (GN), Baja California Sur, Mexico historically were considered a simple environment, dominated by cyanobacteria and sulfate-reducing bacteria. Culture-independent rRNA community profiling instead revealed these microbial mats as among the most phylogenetically diverse environments known. A preliminary molecular survey of the GN mat based on only ∼1500 small subunit rRNA gene sequences discovered several new phylum-level groups in the bacterial phylogenetic domain and many previously undetected lower-level taxa. We determined an additional ∼119,000 nearly full-length sequences and 28,000 >200 nucleotide 454 reads from a 10-layer depth profile of the GN mat. With this unprecedented coverage of long sequences from one environment, we confirm the mat is phylogenetically stratified, presumably corresponding to light and geochemical gradients throughout the depth of the mat. Previous shotgun metagenomic data from the same depth profile show the same stratified pattern and suggest that metagenome properties may be predictable from rRNA gene sequences. We verify previously identified novel lineages and identify new phylogenetic diversity at lower taxonomic levels, for example, thousands of operational taxonomic units at the family-genus levels differ considerably from known sequences. The new sequences populate parts of the bacterial phylogenetic tree that previously were poorly described, but indicate that any comprehensive survey of GN diversity has only begun. Finally, we show that taxonomic conclusions are generally congruent between Sanger and 454 sequencing technologies, with the taxonomic resolution achieved dependent on the abundance of reference sequences in the relevant region of the rRNA tree of life.

  8. Enhanced Methods for Local Ancestry Assignment in Sequenced Admixed Individuals

    Science.gov (United States)

    Brown, Robert; Pasaniuc, Bogdan

    2014-01-01

    Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring long run times for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers local ancestries comparable to the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs. PMID:24743331

  9. Coverage matters: insurance and health care

    National Research Council Canada - National Science Library

    Board on Health Care Services Staff; Institute of Medicine Staff; Institute of Medicine; National Academy of Sciences

    2001-01-01

    ...? How does the system of insurance coverage in the U.S. operate, and where does it fail? The first of six Institute of Medicine reports that will examine in detail the consequences of having a large uninsured population, Coverage Matters...

  10. Metagenomic species profiling using universal phylogenetic marker genes

    DEFF Research Database (Denmark)

    Sunagawa, Shinichi; Mende, Daniel R; Zeller, Georg

    2013-01-01

    To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome-based methods. An implementation of the method is available at http://www.bork.embl.de/software/mOTU/.
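
    Marker-gene profiling turns read counts on a few single-copy genes into one abundance per taxon. The sketch below makes two illustrative assumptions that are not taken from the abstract. First, an mOTU's abundance is the median of its length-normalized marker-gene read counts. Second, at least `min_markers` markers must be detected before the mOTU is reported. The function name and its parameters are hypothetical and do not describe the API of the linked software.

```python
from statistics import median


def motu_relative_abundance(marker_counts, marker_lengths, min_markers=3):
    """Relative abundance per mOTU from single-copy marker-gene read counts.

    marker_counts:  {motu_id: {marker_gene: mapped_read_count}}
    marker_lengths: {marker_gene: gene_length_bp}
    Using the median over markers damps outliers caused by a single
    misbehaving marker gene.
    """
    raw = {}
    for motu, counts in marker_counts.items():
        normalized = [c / marker_lengths[g] for g, c in counts.items() if c > 0]
        if len(normalized) >= min_markers:  # too few markers: not reported
            raw[motu] = median(normalized)
    total = sum(raw.values())
    return {m: v / total for m, v in raw.items()} if total else {}
```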

  11. Proteomic analysis of a pleistocene mammoth femur reveals more than one hundred ancient bone proteins

    DEFF Research Database (Denmark)

    Cappellini, Enrico; Jensen, Lars Juhl; Szklarczyk, Damian Milosz

    2012-01-01

    We used high-sensitivity, high-resolution tandem mass spectrometry to shotgun sequence ancient protein remains extracted from a 43,000-year-old woolly mammoth (Mammuthus primigenius) bone preserved in the Siberian permafrost. For the first time, 126 unique protein accessions, mostly low-abundance extracellular matrix and plasma proteins, were confidently identified by solid molecular evidence. Among the best characterized was the carrier protein serum albumin, presenting two single amino acid substitutions compared to extant African (Loxodonta africana) and Indian (Elephas maximus) elephants. Strong...

  12. Short read sequence typing (SRST): multi-locus sequence types from short reads

    Directory of Open Access Journals (Sweden)

    Inouye Michael

    2012-07-01

    Full Text Available Abstract Background Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However, given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context. Results We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles. Conclusions SRST is a novel...
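
    The abstract says SRST scores each candidate allele using sequence coverage and variability, but it does not give the formula. The ranking below is a hypothetical stand-in that only shows how those two signals can be combined. It ranks by breadth of well-covered positions first, then by fewest mismatches, then by mean depth. This is not SRST's published score. The input format, `min_depth`, and the function name are all assumptions.

```python
def best_allele(candidates, min_depth=5):
    """Pick the most likely allele at one MLST locus (illustrative ranking).

    candidates: {allele_name: (per_base_depths, n_mismatches)}, where
    per_base_depths lists read depth along the allele and n_mismatches counts
    positions whose read consensus differs from the allele (hypothetical input).
    Ranking: breadth of coverage first, then fewest mismatches, then mean depth.
    """
    def rank_key(item):
        depths, mismatches = item[1]
        breadth = sum(d >= min_depth for d in depths) / len(depths)
        mean_depth = sum(depths) / len(depths)
        return (breadth, -mismatches, mean_depth)

    name, _ = max(candidates.items(), key=rank_key)
    return name
```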

  13. Genome sequence of Mycobacterium yongonense RT 955-2015 isolate from a patient misdiagnosed with multi-drug resistant tuberculosis: first clinical isolate in Tanzania.

    Science.gov (United States)

    Mnyambwa, Nicholaus Peter; Kim, Dong-Jin; Ngadaya, Esther; Chun, Jongsik; Ha, Sung-Min; Petrucka, Pammla; Addo, Kennedy Kwasi; Kazwala, Rudovick R; Mfinanga, Sayoki G

    2018-04-24

    Mycobacterium yongonense is a recently described novel species belonging to Mycobacterium avium complex which is the most prevalent etiology of non-tuberculous mycobacteria associated with pulmonary infections, and posing tuberculosis diagnostic challenges in high-burden, resource-constrained settings. We used whole genome shotgun sequencing and comparative microbial genomic analyses to characterize the isolate from a patient diagnosed with multi-drug resistant tuberculosis (MDR-TB) after relapse. We present a genome sequence of the first case of M. yongonense (M. yongonense RT 955-2015) in Tanzania. Sequence analysis revealed that the RT 955-2015 strain had a high similarity to M. yongonense 05-1390(T) (98.74%) and M. chimaera DSM 44623(T) (98%). Its 16S rRNA showed similarity to M. paraintracellulare KCTC 290849(T) (100%); M. intracellulare ATCC 13950(T) (100%); M. chimaera DSM 44623(T) (99.9%); and M. yongonense 05-1390(T) (98%). The strain had a substantially different rpoB sequence from that of M. yongonense 05-1390 (95.16%) but exhibited a sequence closely related to M. chimaera DSM 44623(T) (99.86%), M. intracellulare ATCC 13950(T) (99.53%), and M. paraintracellulare KCTC 290849(T) (99.53%). In light of the OrthoANI algorithm, and phylogenetic analysis, we conclude that the isolate was M. yongonense Type II genotype, which is an indication that the patient was misdiagnosed with TB/MDR-TB and received inappropriate treatment. Copyright © 2018. Published by Elsevier Ltd.

  14. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways

    OpenAIRE

    Mi, Huaiyu; Guo, Nan; Kejariwal, Anish; Thomas, Paul D.

    2006-01-01

    PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles. Since 2005, there have been three main improvements to PANTHER. First, the sequences used to create evolutionary trees are carefully selected to provide coverage of phylogenetic as well as functional information. Second, PANTHER is now a member of the InterPro Consortium, and the PANTHER hidden Markov models (HMMs) are distri...

  15. The complete chloroplast genome sequence of Helwingia himalaica (Helwingiaceae, Aquifoliales) and a chloroplast phylogenomic analysis of the Campanulidae

    OpenAIRE

    Yao, Xin; Liu, Ying-Ying; Tan, Yun-Hong; Song, Yu; Corlett, Richard T.

    2016-01-01

    Complete chloroplast genome sequences have been very useful for understanding phylogenetic relationships in angiosperms at the family level and above, but there are currently large gaps in coverage. We report the chloroplast genome for Helwingia himalaica, the first in the distinctive family Helwingiaceae and only the second genus to be sequenced in the order Aquifoliales. We then combine this with 36 published sequences in the large (c. 35,000 species) subclass Campanulidae in order to inves...

  16. Sequence of a complete chicken BG haplotype shows dynamic expansion and contraction of two gene lineages with particular expression patterns

    DEFF Research Database (Denmark)

    Salomonsen, Jan; Chattaway, John A.; Chan, Andrew C. Y.

    2014-01-01

    ... complex (MHC), and show striking association with particular autoimmune diseases. In chickens, BG genes encode homologues with somewhat different domain organisation. Only a few BG genes have been characterised, one involved in actin-myosin interaction in the intestinal brush border, and another ... implicated in resistance to viral diseases. We characterise all BG genes in B12 chickens, finding a multigene family organised as tandem repeats in the BG region outside the MHC, a single gene in the MHC (the BF-BL region), and another single gene on a different chromosome. There is a precise cell and tissue ... many hybrid genes, suggesting recombination and/or deletion as major evolutionary forces. We identify BG genes in the chicken whole genome shotgun sequence, as well as by comparison to other haplotypes by fibre fluorescence in situ hybridisation, confirming dynamic expansion and contraction within...

  17. Immunization Coverage

    Science.gov (United States)

    ... Plan. Global Health Observatory (GHO) data - Immunization. More information on vaccines and immunization. News: 1 in 10 ...

  18. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    Science.gov (United States)

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla. Copyright © 2016 Elsevier Inc. All rights reserved.
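
    Predicting mature peptides from a deduced precursor usually relies on conventional processing rules. Prohormone convertases cleave at dibasic sites, and a C-terminal glycine is converted to an amide. The sketch below applies only those two rules of thumb to a precursor whose signal peptide is assumed to be already removed. It does not reproduce the author's workflow, which the abstract does not detail. The test sequence is a toy construct that embeds the corazonin sequence.

```python
import re

# Dibasic cleavage motifs (rule-of-thumb assumption; real processing varies).
DIBASIC = re.compile(r"KR|RR|KK|RK")


def predict_mature_peptides(prohormone):
    """Split a signal-peptide-free precursor at dibasic sites and mark a
    C-terminal Gly as an amidation signal (Gly removed, '-NH2' appended)."""
    fragments = [f for f in DIBASIC.split(prohormone) if f]
    peptides = []
    for frag in fragments:
        if frag.endswith("G") and len(frag) > 1:
            peptides.append(frag[:-1] + "-NH2")  # Gly donates the amide group
        else:
            peptides.append(frag)
    return peptides
```

    For example, `predict_mature_peptides("AAAKRQTFQYSRGWTNGKRPYRF")` yields `["AAA", "QTFQYSRGWTN-NH2", "PYRF"]`. A real pipeline would also predict the signal peptide, monobasic cleavage sites and other modifications, such as N-terminal pyroglutamate.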

  19. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Science.gov (United States)

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample, and whether their assembled sequences are of comparable quality, remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.
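
    The cross-platform comparison rests on two computations: estimating each gene's abundance from its sequencing coverage, and correlating those estimates between platforms. Here is a minimal sketch of both. It takes per-gene coverage as mapped bases divided by gene length and uses the squared Pearson correlation. The exact normalization used in the study is not stated in the abstract, so treat these definitions as assumptions.

```python
from statistics import mean


def gene_coverage(mapped_reads, read_length, gene_length):
    """Average sequencing depth over a gene: mapped bases / gene length."""
    return mapped_reads * read_length / gene_length


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists of values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

    Normalizing by read length matters because the two platforms produce reads of very different lengths. Comparing raw read counts would therefore be misleading, whereas coverage puts both platforms on the same depth scale.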

  20. High-Throughput Next-Generation Sequencing of Polioviruses

    Science.gov (United States)

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially vaccine-derived polioviruses (VDPV), circulating VDPV, and immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  1. 22 CFR 226.31 - Insurance coverage.

    Science.gov (United States)

    2010-04-01

    ... 22 Foreign Relations, AGENCY FOR INTERNATIONAL DEVELOPMENT, ADMINISTRATION OF ASSISTANCE AWARDS TO U.S. NON-GOVERNMENTAL ORGANIZATIONS, Post-award Requirements, Property Standards. § 226.31 Insurance coverage. Recipients...

  2. 14 CFR 1260.131 - Insurance coverage.

    Science.gov (United States)

    2010-01-01

    ... coverage. Recipients shall, at a minimum, provide the equivalent insurance coverage for real property and equipment acquired with Federal funds as provided for property owned by the recipient. Federally-owned property need not be insured unless required by the terms and conditions of the award. ...

  3. The genome sequence of taurine cattle: A window to ruminant biology and evolution

    Science.gov (United States)

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (ma...
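
    "Sevenfold coverage" means that the total sequenced bases are about seven times the genome size. Under the idealized Lander-Waterman (Poisson) model, a base is missed entirely with probability e^(-c). The sketch below shows that arithmetic. Real assemblies fall short of this ideal because of repeats and sequencing bias.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average depth c = N * L / G (total sequenced bases over genome size)."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(c):
    """Poisson (Lander-Waterman) expectation that a base is hit by >= 1 read."""
    return 1.0 - math.exp(-c)
```

    At c = 7 the expected fraction of bases covered is about 0.9991. At c = 1 it is only about 0.63.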

  4. Effects of coverage gap reform on adherence to diabetes medications.

    Science.gov (United States)

    Zeng, Feng; Patel, Bimal V; Brunetti, Louis

    2013-04-01

    To investigate the impact of Part D coverage gap reform on diabetes medication adherence. Retrospective data analysis based on pharmacy claims data from a national pharmacy benefit manager. We used a difference-in-difference-in-difference method to evaluate the impact of coverage gap reform on adherence to diabetes medications. Two cohorts (2010 and 2011) were constructed to represent the last year before Affordable Care Act (ACA) reform and the first year after reform, respectively. Each patient had 2 observations: 1 before and 1 after entering the coverage gap. Patients in each cohort were divided into groups based on type of gap coverage: no coverage, partial coverage (generics only), and full coverage. Following ACA reform, patients with no gap coverage and patients with partial gap coverage experienced substantial drops in copayments in the coverage gap in 2011. Their adherence to diabetes medications in the gap, measured by percentage of days covered, improved correspondingly (2.99 percentage points, 95% confidence interval [CI] 0.49-5.48, P = .019 for patients with no coverage; 6.46 percentage points, 95% CI 3.34-9.58, P ... gap in 2011. However, their adherence did not increase (-0.13 percentage point, P = .8011). In the first year of ACA coverage gap reform, copayments in the gap decreased substantially for all patients. Patients with no coverage and patients with partial coverage in the gap had better adherence in the gap in 2011.
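
    A triple difference can be computed from cell means once the three contrasts are fixed. This sketch assumes one plausible layout, which the abstract does not confirm: the contrasts are cohort (2010 vs 2011), period (before vs in the gap), and coverage group, with one gap-coverage group as the reference. The study's actual estimator is not described in detail. The group labels and dictionary keys are hypothetical.

```python
def did(means, cohort):
    """Difference-in-differences within one cohort: the treated group's change
    in adherence on entering the gap minus the reference group's change."""
    def change(group):
        return means[(cohort, group, "in_gap")] - means[(cohort, group, "before_gap")]
    return change("treated") - change("reference")


def ddd(means, post_cohort=2011, pre_cohort=2010):
    """Triple difference: post-reform DiD minus pre-reform DiD."""
    return did(means, post_cohort) - did(means, pre_cohort)
```

    The last difference removes any gap-entry pattern that already separated the groups before the reform.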

  5. Genome Improvement at JGI-HAGSC

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

    2012-03-03

    Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.

  6. Exome Sequencing in Suspected Monogenic Dyslipidemias

    Science.gov (United States)

    Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar

    2015-01-01

    Background Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026
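
    The abstract mentions a polygenic score but not how it was built. The usual construction is a weighted sum of effect-allele counts. The sketch below follows that convention and flags a "high burden" when the score exceeds a chosen percentile of a reference panel. The variant IDs, weights, and the 90th-percentile cutoff are hypothetical assumptions.

```python
from statistics import quantiles


def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele counts.

    genotypes: {variant_id: 0, 1 or 2 copies of the effect allele}
    weights:   {variant_id: per-allele effect on the lipid trait}
    Variants missing from `genotypes` contribute 0 (a simplifying assumption).
    """
    return sum(w * genotypes.get(v, 0) for v, w in weights.items())


def high_burden(score, reference_scores, percentile=90):
    """True if `score` exceeds the given percentile of a reference panel."""
    cutoff = quantiles(reference_scores, n=100)[percentile - 1]
    return score > cutoff
```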

  7. 7 CFR 1737.31 - Area Coverage Survey (ACS).

    Science.gov (United States)

    2010-01-01

    ... an ACS are provided in RUS Telecommunications Engineering and Construction Manual section 205. (e... Studies-Area Coverage Survey and Loan Design § 1737.31 Area Coverage Survey (ACS). (a) The Area Coverage... the borrower's records contain sufficient information as to subscriber development to enable cost...

  8. 2 CFR 215.31 - Insurance coverage.

    Science.gov (United States)

    2010-01-01

    ... Insurance coverage. Recipients shall, at a minimum, provide the equivalent insurance coverage for real property and equipment acquired with Federal funds as provided to property owned by the recipient. Federally-owned property need not be insured unless required by the terms and conditions of the award. ...

  9. 36 CFR 1210.31 - Insurance coverage.

    Science.gov (United States)

    2010-07-01

    ....31 Insurance coverage. Recipients shall, at a minimum, provide the equivalent insurance coverage for real property and equipment acquired with NHPRC funds as provided to property owned by the recipient. Federally-owned property need not be insured unless required by the terms and conditions of the award. ...

  10. 28 CFR 55.6 - Coverage under section 203(c).

    Science.gov (United States)

    2010-07-01

    ... THE VOTING RIGHTS ACT REGARDING LANGUAGE MINORITY GROUPS Nature of Coverage § 55.6 Coverage under section 203(c). (a) Coverage formula. There are four ways in which a political subdivision can become subject to section 203(c). (Footnote 2: The criteria for coverage are contained in section 203(b).) (1) Political...

  11. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which ... performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome resembles a funnel ... on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways...

  12. Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny.

    Directory of Open Access Journals (Sweden)

    LaDeana W Hillier

    2007-07-01

    Full Text Available To determine whether the distinctive features of Caenorhabditis elegans chromosomal organization are shared with the C. briggsae genome, we constructed a single nucleotide polymorphism-based genetic map to order and orient the whole genome shotgun assembly along the six C. briggsae chromosomes. Although these species are of the same genus, their most recent common ancestor existed 80-110 million years ago, and thus they are more evolutionarily distant than, for example, human and mouse. We found that, like C. elegans chromosomes, C. briggsae chromosomes exhibit high levels of recombination on the arms along with higher repeat density, a higher fraction of intronic sequence, and a lower fraction of exonic sequence compared with chromosome centers. Despite extensive intrachromosomal rearrangements, 1:1 orthologs tend to remain in the same region of the chromosome, and colinear blocks of orthologs tend to be longer in chromosome centers compared with arms. More strikingly, the two species show an almost complete conservation of synteny, with 1:1 orthologs present on a single chromosome in one species also found on a single chromosome in the other. The conservation of both chromosomal organization and synteny between these two distantly related species suggests roles for chromosome organization in the fitness of an organism that are only poorly understood presently.
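
    "Almost complete conservation of synteny" can be quantified from 1:1 ortholog pairs. For each chromosome in one species, find the chromosome in the other species that holds most of its orthologs. Then report the fraction of all pairs that fall on those best-matching partners. This is a hypothetical summary statistic chosen for illustration, not necessarily the measure used in the paper. The chromosome labels in the test are toy values.

```python
from collections import Counter, defaultdict


def synteny_conservation(ortholog_pairs):
    """Fraction of 1:1 orthologs lying on the best-matching chromosome pair.

    ortholog_pairs: list of (chrom_in_species_a, chrom_in_species_b),
    one entry per 1:1 ortholog pair.
    """
    by_a = defaultdict(Counter)
    for chrom_a, chrom_b in ortholog_pairs:
        by_a[chrom_a][chrom_b] += 1
    # For each species-A chromosome, count orthologs on its dominant partner.
    in_best = sum(c.most_common(1)[0][1] for c in by_a.values())
    return in_best / len(ortholog_pairs)
```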

  13. Insurance premiums and insurance coverage of near-poor children.

    Science.gov (United States)

    Hadley, Jack; Reschovsky, James D; Cunningham, Peter; Kenney, Genevieve; Dubay, Lisa

    States increasingly are using premiums for near-poor children in their public insurance programs (Medicaid/SCHIP) to limit private insurance crowd-out and constrain program costs. Using national data from four rounds of the Community Tracking Study Household Surveys spanning the seven years from 1996 to 2003, this study estimates a multinomial logistic regression model examining how public and private insurance premiums affect insurance coverage outcomes (Medicaid/SCHIP coverage, private coverage, and no coverage). Higher public premiums are significantly associated with a lower probability of public coverage and higher probabilities of private coverage and uninsurance; higher private premiums are significantly related to a lower probability of private coverage and higher probabilities of public coverage and uninsurance. The results imply that uninsurance rates will rise if both public and private premiums increase, and suggest that states that impose or increase public insurance premiums for near-poor children will succeed in discouraging crowd-out of private insurance, but at the expense of higher rates of uninsurance. Sustained increases in private insurance premiums will continue to create enrollment pressures on state insurance programs for children.

  14. 42 CFR 457.410 - Health benefits coverage options.

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health, CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN... State Plan Requirements: Coverage and Benefits. § 457.410 Health benefits coverage options. (a) Types of...

  15. Next-generation phylogeography: a targeted approach for multilocus sequencing of non-model organisms.

    Directory of Open Access Journals (Sweden)

    Jonathan B Puritz

    Full Text Available The field of phylogeography has long recognized the need for, and utility of, incorporating nuclear DNA (nDNA) sequences into analyses. However, the use of nDNA sequence data at the population level has been hindered by technical laboratory difficulty, sequencing costs, and problematic analytical methods dealing with genotypic sequence data, especially in non-model organisms. Here, we present a method utilizing the 454 GS-FLX Titanium pyrosequencing platform with the capacity to simultaneously sequence two species of sea star (Meridiastra calcar and Parvulastra exigua) at five different nDNA loci across 16 different populations of 20 individuals each per species. We compare results from 3 populations with traditional Sanger sequencing based methods, and demonstrate that this next-generation sequencing platform is more time and cost effective and more sensitive to rare variants than Sanger based sequencing. A crucial advantage is that the high coverage of clonally amplified sequences simplifies haplotype determination, even in highly polymorphic species. This targeted next-generation approach can greatly increase the use of nDNA sequence loci in phylogeographic and population genetic studies by mitigating many of the time, cost, and analytical issues associated with highly polymorphic, diploid sequence markers.
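
    High coverage of clonally amplified reads simplifies haplotype calls because each read is one haplotype. A diploid individual can then be called from the two most frequent read sequences. The sketch below does that. The thresholds `min_fraction` and `min_reads` are hypothetical. Treating rare sequences as sequencing errors is a simplifying assumption, not the authors' filtering procedure.

```python
from collections import Counter


def call_diploid_haplotypes(reads, min_fraction=0.2, min_reads=10):
    """Call a diploid genotype from clonal amplicon reads at one locus.

    reads: list of read sequences from one individual.
    Returns (hap1, hap2), a homozygote as (hap, hap), or None when coverage
    is below min_reads. Sequences rarer than min_fraction are treated as errors.
    """
    if len(reads) < min_reads:
        return None  # too little coverage to call
    ranked = Counter(reads).most_common()
    top_seq = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / len(reads) >= min_fraction:
        return (top_seq, ranked[1][0])  # heterozygote
    return (top_seq, top_seq)  # homozygote
```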

  16. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    Science.gov (United States)

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  17. 5 CFR 531.402 - Employee coverage.

    Science.gov (United States)

    2010-01-01

    ... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Employee coverage. 531.402 Section 531... GENERAL SCHEDULE Within-Grade Increases § 531.402 Employee coverage. (a) Except as provided in paragraph (b) of this section, this subpart applies to employees who— (1) Are classified and paid under the...

  18. TU-H-CAMPUS-JeP3-05: Adaptive Determination of Needle Sequence HDR Prostate Brachytherapy with Divergent Needle-By-Needle Delivery

    International Nuclear Information System (INIS)

    Borot de Battisti, M; Maenhout, M; Lagendijk, J J W; Van Vulpen, M; Moerland, M A; Denis de Senneville, B; Hautvast, G; Binnekamp, D

    2016-01-01

    Purpose: To develop a new method which adaptively determines the optimal needle insertion sequence for HDR prostate brachytherapy involving divergent needle-by-needle dose delivery by e.g. a robotic device. A needle insertion sequence is calculated at the beginning of the intervention and updated after each needle insertion with feedback on needle positioning errors. Methods: Needle positioning errors and anatomy changes may occur during HDR brachytherapy which can lead to errors in the delivered dose. A novel strategy was developed to calculate and update the needle sequence and the dose plan after each needle insertion with feedback on needle positioning errors. The dose plan optimization was performed by numerical simulations. The proposed needle sequence determination optimizes the final dose distribution based on the dose coverage impact of each needle. This impact is predicted stochastically by needle insertion simulations. HDR procedures were simulated with a varying number of needle insertions (4 to 12) using 11 patient MR data-sets with PTV, prostate, urethra, bladder and rectum delineated. Needle positioning errors were modeled by random normally distributed angulation errors (standard deviation of 3 mm at the needle’s tip). The final dose parameters were compared in the situations where the needle with the largest vs. the smallest dose coverage impact was selected at each insertion. Results: Over all scenarios, the percentage of clinically acceptable final dose distributions improved when the needle selected had the largest dose coverage impact (91%) compared to the smallest (88%). The differences were larger for few (4 to 6) needle insertions (maximum difference scenario: 79% vs. 60%). The computation time of the needle sequence optimization was below 60s. Conclusion: A new adaptive needle sequence determination for HDR prostate brachytherapy was developed. Coupled to adaptive planning, the selection of the needle with the largest dose coverage impact ...
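    The selection rule in the Methods (at each insertion, pick the needle with the largest predicted dose coverage impact, then update) can be sketched as a greedy loop. This is a minimal illustration, not the authors' implementation: the `impact` callback is a hypothetical stand-in for their stochastic needle insertion simulations, and the needle names and scores are invented.

```python
from typing import Callable, Dict, List, Sequence

def greedy_needle_sequence(
    needles: Sequence[str],
    impact: Callable[[str, List[str]], float],
) -> List[str]:
    """Return an insertion order that always picks the remaining needle with the
    largest predicted dose coverage impact, given the needles already placed."""
    remaining = list(needles)
    order: List[str] = []
    while remaining:
        # Re-score every remaining needle against the current partial plan; in an
        # adaptive workflow this is where measured positioning errors would feed back.
        best = max(remaining, key=lambda n: impact(n, order))
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical, context-free impact scores standing in for the simulations.
scores: Dict[str, float] = {"N1": 0.2, "N2": 0.5, "N3": 0.1}
print(greedy_needle_sequence(list(scores), lambda n, placed: scores[n]))
# prints: ['N2', 'N1', 'N3']
```

    Passing the already-placed needles to `impact` is what makes the ordering adaptive; with static scores, as in the usage line, it reduces to sorting by impact.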

  19. Combined Targeted DNA Sequencing in Non-Small Cell Lung Cancer (NSCLC) Using UNCseq and NGScopy, and RNA Sequencing Using UNCqeR for the Detection of Genetic Aberrations in NSCLC.

    Directory of Open Access Journals (Sweden)

    Xiaobei Zhao

    Full Text Available The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves an NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNV) as well as small insertions and deletions (indel). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage to detect large-scale copy number variations (CNV), similar to human SNP Array 6.0, as well as small-scale intragenic CNV. Overall, we applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07-0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, where RNA-seq of same-patient tumor specimens confirmed SNV detected by DNA-seq, if RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay to an independent lung cancer tumor tissue collection with available same-patient germline DNA (11-1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNV, indel, and CNV in tumor specimens lacking germline DNA in a cost-efficient fashion.
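    The abstract does not describe how NGScopy works internally. As a generic illustration of read-depth CNV detection at sparse coverage, the sketch below computes a library-size-normalized log2 depth ratio per genomic bin against a reference profile. Because these specimens lack same-patient germline DNA, the reference here is assumed to be something like a pooled normal; the bins, pseudocount, and function name are hypothetical.

```python
import math
from typing import List, Sequence

def log2_depth_ratios(sample: Sequence[int], reference: Sequence[int],
                      pseudocount: float = 0.5) -> List[float]:
    """Library-size-normalized log2(sample/reference) read depth per genomic bin."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same bins")
    s_total, r_total = sum(sample), sum(reference)
    if s_total == 0 or r_total == 0:
        raise ValueError("each profile needs at least one read")
    ratios = []
    for s, r in zip(sample, reference):
        # The pseudocount keeps sparsely covered (zero-count) bins finite.
        s_frac = (s + pseudocount) / s_total
        r_frac = (r + pseudocount) / r_total
        ratios.append(math.log2(s_frac / r_frac))
    return ratios
```

    In a pure diploid sample, a single-copy gain would give a ratio near log2(3/2) ≈ 0.58. Segmentation of adjacent bins and tumor-purity correction, which a real caller needs, are omitted.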

  20. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source of ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at ...

  1. 42 CFR 435.350 - Coverage for certain aliens.

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health 4 2010-10-01 2010-10-01 false Coverage for certain aliens. 435.350 Section 435... ISLANDS, AND AMERICAN SAMOA Optional Coverage of the Medically Needy § 435.350 Coverage for certain aliens... treatment of an emergency medical condition, as defined in § 440.255(c) of this chapter, to those aliens...

  2. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    Science.gov (United States)

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
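    Average read coverage depth, as in the 100X figure above, is commonly reasoned about with the idealized Lander-Waterman relation C = N·L/G (reads × read length / genome size) and the Poisson estimate e^(-C) for the fraction of bases receiving no reads. The sketch below shows that arithmetic only. It assumes uniform random read placement, which real libraries (with GC and mappability bias) do not satisfy, and it is not how the authors computed their coverage.

```python
import math

def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected per-base depth C = N * L / G under uniform random sampling."""
    return num_reads * read_length / genome_size

def fraction_uncovered(depth: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads: e^(-C)."""
    return math.exp(-depth)

for c in (1.0, 5.0, 30.0, 100.0):
    print(f"{c:>5.0f}X  uncovered fraction ~ {fraction_uncovered(c):.2e}")
```

    Under this idealization even ~30X leaves a vanishing uncovered fraction, so the practical benefit of higher depth lies mostly in more confident genotype and haplotype calls in unevenly covered regions, not in raw breadth.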

  3. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion.

    Science.gov (United States)

    Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David

    2018-03-19

    To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs be specifically depleted to improve the sequencing depth of the remaining RNAs.
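    The "220% and 50% more reads" figures can be read as ratios of exonic read fractions: if exonic coverage is proportional to the number of exonic reads (an assumption), matching the reference protocol requires f_ref / f_alt times as many total reads. The sketch below makes that arithmetic explicit; the exonic fractions in the usage line are hypothetical and not taken from the study.

```python
def extra_reads_needed(exonic_frac_reference: float,
                       exonic_frac_alternative: float) -> float:
    """Fractional increase in total reads that the alternative protocol needs to
    yield as many exonic reads as the reference protocol."""
    for f in (exonic_frac_reference, exonic_frac_alternative):
        if not 0.0 < f <= 1.0:
            raise ValueError("exonic fractions must lie in (0, 1]")
    return exonic_frac_reference / exonic_frac_alternative - 1.0

# Hypothetical fractions: 64% exonic after polyA+ selection vs 20% after rRNA depletion.
print(f"{extra_reads_needed(0.64, 0.20):.0%} more reads")  # prints: 220% more reads
```

    Read this way, "220% more" corresponds to a 3.2-fold lower exonic fraction and "50% more" to a 1.5-fold lower one.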

  4. Terrorism and nuclear damage coverage

    International Nuclear Information System (INIS)

    Horbach, N. L. J. T.; Brown, O. F.; Vanden Borre, T.

    2004-01-01

    This paper deals with nuclear terrorism and the manner in which nuclear operators can insure themselves against it, based on the international nuclear liability conventions. It concludes that terrorism is currently not covered under the treaty exoneration provisions on 'war-like events', based on an analysis of the concept of 'terrorism' and the travaux preparatoires. Consequently, operators remain liable for nuclear damage resulting from terrorist acts, for which mandatory insurance is applicable. Since the nuclear insurance industry is looking at excluding such coverage from its policies in the near future, this article aims to suggest alternative means of insurance, in order to ensure adequate compensation for innocent victims. The September 11, 2001 attacks at the World Trade Center in New York City and the Pentagon in Washington, DC resulted in the largest loss in the history of insurance, inevitably leading to concerns about nuclear damage coverage, should future such assaults target a nuclear power plant or other nuclear installation. Since the attacks, some insurers have signalled their intentions to exclude coverage for terrorism from their nuclear liability and property insurance policies. Other insurers are maintaining coverage for terrorism, but are establishing aggregate limits or sublimits and are increasing premiums. Additional changes by insurers are likely to occur. Highlighted by the September 11th events, and most recently by those in Madrid on 11 March 2004, are questions about how to define acts of terrorism and the extent to which such are covered under the international nuclear liability conventions and various domestic nuclear liability laws. Of particular concern to insurers is the possibility of coordinated simultaneous attacks on multiple nuclear facilities. This paper provides a survey of the issues, and recommendations for future clarifications and coverage options.(author)

  5. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    Science.gov (United States)

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
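    The assembly statistics above include an N50 of 2.29 Mbp. N50 is the length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal sketch of that standard definition follows; the example lengths are toy values, not the P. vitticeps scaffolds.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences >= L contain at least half the total assembly length."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequences given")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2*running with total to avoid rounding issues from total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running sum reaches total")

print(n50([2, 3, 4, 5, 6]))  # prints: 5
```

    Because N50 is length-weighted, the many short scaffolds (545,310 in total, only 69,852 longer than 300 bp) have little effect on it compared with the longest scaffolds.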

  6. ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

    Science.gov (United States)

    Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

    2014-02-01

    Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
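    Two of the statistics ISRNA reports, read length distribution and nucleotide composition bias, are easy to illustrate. The sketch below is not ISRNA code (which is implemented in Java/C++/Perl/MySQL); it assumes composition bias is summarized by the first (5') nucleotide of each read, which is one common choice for small RNA data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def read_stats(reads: Iterable[str]) -> Tuple[Dict[int, int], Dict[str, float]]:
    """Return (read-length distribution, fraction of reads starting with each base)."""
    lengths: Counter = Counter()
    first_base: Counter = Counter()
    for read in reads:
        seq = read.strip().upper()
        if not seq:
            continue  # skip empty lines
        lengths[len(seq)] += 1
        first_base[seq[0]] += 1
    total = sum(first_base.values())
    composition = {b: n / total for b, n in first_base.items()} if total else {}
    return dict(lengths), composition

lengths, composition = read_stats(["TGAGGTAG", "TAGCTTAT", "ACGT"])
print(lengths, composition)
```

    In practice the input would be the sequence lines of a FASTQ or FASTA file of adapter-trimmed reads.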

  7. 42 CFR 436.330 - Coverage for certain aliens.

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health 4 2010-10-01 2010-10-01 false Coverage for certain aliens. 436.330 Section 436... Coverage of the Medically Needy § 436.330 Coverage for certain aliens. If an agency provides Medicaid to... condition, as defined in § 440.255(c) of this chapter to those aliens described in § 436.406(c) of this...

  8. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Directory of Open Access Journals (Sweden)

    Chengwei Luo

    Full Text Available Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether or not different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA II), on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences, and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.
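    The cross-platform agreement is reported as R² > 0.9 between coverage-based abundance estimates. A minimal sketch of that comparison follows: R² is computed here as the squared Pearson correlation of two abundance vectors, for example mean per-gene coverage (mapped bases divided by gene length) from each platform. The input values are toy numbers, and the authors' exact estimator is not stated in the abstract.

```python
from statistics import mean
from typing import Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Square of the Pearson correlation between two equal-length abundance vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)

# Toy per-gene coverage from two platforms.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```

    Abundances spanning orders of magnitude are often log-transformed before correlating, so that a few highly covered genes do not dominate the statistic.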

  9. Lula VS. Larry Rohter: Misconceptions in international coverage

    Directory of Open Access Journals (Sweden)

    Heloiza Golbspan Herckovitz

    2007-06-01

    Full Text Available This article discusses the conflict between the New York Times foreign correspondent Larry Rohter and Brazil’s President Luis Inácio Lula da Silva over a story published by the American newspaper on May 9, 2004 accusing the President of being a drunkard. Larry Rohter’s piece was criticized for its lack of facts and of reliable sources, and for its ironic overtone. President Lula was criticized for cancelling the journalist’s visa, a measure later revoked because of public pressure. The case exemplifies a well-known sequence of misconceptions and stereotypes from both sides (the world’s most prestigious newspaper and the president of the largest country in Latin America), which brings to light a much needed discussion on the quality of international news coverage, press freedom and social responsibility. This article also attempts to advance the discussion on how framing (second-level agenda-setting) may influence how we think about foreign political leaders.

  10. [Coverage by health insurance or discount cards: a household survey in the coverage area of the Family Health Strategy].

    Science.gov (United States)

    Fontenelle, Leonardo Ferreira; Camargo, Maria Beatriz Junqueira de; Bertoldi, Andréa Dâmaso; Gonçalves, Helen; Maciel, Ethel Leonor Noia; Barros, Aluísio J D

    2017-10-26

    This study was designed to assess the reasons for health insurance coverage in a population covered by the Family Health Strategy in Brazil. We describe overall health insurance coverage and coverage by type, and analyze its association with health-related and socio-demographic characteristics. Among the 31.3% of persons (95%CI: 23.8-39.9) who reported "health insurance" coverage, 57.0% (95%CI: 45.2-68.0) were covered only by discount cards, which do not offer any kind of coverage for medical care, but only discounts in pharmacies, clinics, and hospitals. Both for health insurance and discount cards, the most frequently cited reasons for such coverage were "to be on the safe side" and "to receive better care". Both types of coverage were associated statistically with age (+65 vs. 15-24 years: adjusted odds ratios, aOR = 2.98, 95%CI: 1.28-6.90; and aOR = 3.67; 95%CI: 2.22-6.07, respectively) and socioeconomic status (additional standard deviation: aOR = 2.25, 95%CI: 1.62-3.14; and aOR = 1.96, 95%CI: 1.34-2.97). In addition, health insurance coverage was associated with schooling: aOR = 7.59 (95%CI: 4.44-13.00) for complete University Education and aOR = 3.74 (95%CI: 1.61-8.68) for complete Secondary Education, compared to less than complete Primary Education. Meanwhile, neither health insurance nor discount card was associated with health status or number of diagnosed diseases. In conclusion, studies that aim to assess private health insurance should be planned to distinguish between discount cards and formal health insurance.

  11. Generation of artificial FASTQ files to evaluate the performance of next-generation sequencing pipelines.

    Directory of Open Access Journals (Sweden)

    Matthew Frampton

    Full Text Available Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software tools, configured together in order to map short reads to a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
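    The core idea, sampling templates from a reference and writing paired-end FASTQ records whose true origin is known, can be sketched in a few lines. This is not ArtificialFastqGenerator's implementation or interface. It assumes uniform coverage, a single constant Phred score encoded as Phred+33, and uniform substitution errors; the GC-dependent coverage modelling and real quality scores the tool supports are left out. All function names are hypothetical.

```python
import random
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def phred_char(q: int) -> str:
    """Phred+33 encoding used by Sanger / Illumina 1.8+ FASTQ."""
    return chr(q + 33)

def _add_errors(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_pairs(reference: str, n_pairs: int, template_len: int, read_len: int,
                   error_rate: float = 0.0, quality: int = 30,
                   seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (mate1, mate2) FASTQ records from templates sampled uniformly from the reference."""
    if not read_len <= template_len <= len(reference):
        raise ValueError("need read_len <= template_len <= len(reference)")
    rng = random.Random(seed)
    qual = phred_char(quality) * read_len
    for i in range(n_pairs):
        start = rng.randrange(len(reference) - template_len + 1)
        template = reference[start:start + template_len]
        mate1 = _add_errors(template[:read_len], error_rate, rng)
        # Mate 2 reads the opposite strand from the other end of the template.
        mate2 = _add_errors(reverse_complement(template)[:read_len], error_rate, rng)
        yield (f"@sim{i}/1\n{mate1}\n+\n{qual}\n",
               f"@sim{i}/2\n{mate2}\n+\n{qual}\n")
```

    With `error_rate=0` every mate maps exactly to the reference, which gives the gold-standard truth set. Raising `error_rate` then shows how the alignment and variant-calling steps of a pipeline degrade.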

  12. An optimized field coverage planning approach for navigation of agricultural robots in fields involving obstacle areas

    DEFF Research Database (Denmark)

    Hameed, Ibahim; Bochtis, D.; Sørensen, C.A.

    2013-01-01

    Technological advances combined with the demand for cost efficiency and environmental considerations lead farmers to review their practices towards the adoption of new managerial approaches including enhanced automation. The application of field robots is one of the most promising advances among ... -field obstacle areas, the headland paths generation for the field and each obstacle area, the implementation of a genetic algorithm to optimize the sequence that the field robot vehicle will follow to visit the blocks, and an algorithmic generation of the task sequences derived from the farmer practices ... This approach has proven that it is possible to capture the practices of farmers and embed these practices in an algorithmic description providing a complete field area coverage plan in a form prepared for execution by the navigation system of a field robot ...
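    The abstract mentions a genetic algorithm that optimizes the order in which the robot visits the field blocks. It does not give the encoding or fitness function, so the sketch below makes loud assumptions: each block is reduced to a single (x, y) point, and cost is straight-line travel distance from a depot through all blocks and back. It uses permutation encoding with order crossover (OX1), swap mutation, and truncation selection with elitism. All names and parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def route_length(order: Sequence[int], blocks: Sequence[Point], depot: Point) -> float:
    """Travel distance from the depot through the blocks in `order` and back."""
    path = [depot] + [blocks[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """OX1 crossover: keep a slice of p1, fill the rest in p2's relative order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child: List[Optional[int]] = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]

def ga_block_order(blocks: Sequence[Point], depot: Point = (0.0, 0.0),
                   pop_size: int = 40, generations: int = 200,
                   mutation_rate: float = 0.2, seed: int = 0) -> List[int]:
    """Search for a short visiting order of field blocks with a simple GA."""
    n = len(blocks)
    if n < 2:
        return list(range(n))
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)

    def cost(order: List[int]) -> float:
        return route_length(order, blocks, depot)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[:pop_size // 2]  # truncation selection; best always survives
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:  # swap mutation
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]
            children.append(child)
        population = elite + children
    return min(population, key=cost)
```

    A field-specific version would replace `route_length` with the headland-path distance between block exit and entry points, which is where the farmer-practice constraints described above would enter.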

  13. Mobile-robot navigation with complete coverage of unstructured environments

    OpenAIRE

    García Armada, Elena; González de Santos, Pablo

    2004-01-01

    There are some mobile-robot applications that require the complete coverage of an unstructured environment. Examples are humanitarian de-mining and floor-cleaning tasks. Such tasks use a complete-coverage algorithm, a path-planning technique that allows the robot to pass over all points in the environment while avoiding unknown obstacles. Different coverage algorithms exist, but they fail to work in unstructured environments. This paper details a complete-coverage algorithm for unstructured environm...
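    For contrast with the unstructured case the paper addresses, the classic baseline for complete coverage of a known, structured area is a boustrophedon (back-and-forth) sweep. The sketch below applies it to an occupancy grid whose obstacles are known in advance. This is explicitly not the authors' algorithm: it handles neither unknown obstacles nor reconnecting paths around them. It only produces a sweep order over free cells.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]

def boustrophedon_order(grid: Sequence[Sequence[int]]) -> List[Cell]:
    """Visit every free cell (0) row by row, alternating sweep direction.
    Cells marked 1 are known obstacles and are skipped."""
    order: List[Cell] = []
    for r, row in enumerate(grid):
        # Even rows sweep left-to-right, odd rows right-to-left.
        cols = range(len(row)) if r % 2 == 0 else range(len(row) - 1, -1, -1)
        for c in cols:
            if row[c] == 0:
                order.append((r, c))
    return order

print(boustrophedon_order([[0, 1, 0],
                           [0, 0, 0]]))
# prints: [(0, 0), (0, 2), (1, 2), (1, 1), (1, 0)]
```

    The jump from (0, 0) to (0, 2) across the obstacle shows the gap a real planner must fill. Doing that with sensor input about obstacles not known in advance is the problem the paper targets.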

  14. Complete sequencing of the bla(NDM-1)-positive IncA/C plasmid from Escherichia coli ST38 isolate suggests a possible origin from plant pathogens.

    Science.gov (United States)

    Sekizuka, Tsuyoshi; Matsui, Mari; Yamane, Kunikazu; Takeuchi, Fumihiko; Ohnishi, Makoto; Hishinuma, Akira; Arakawa, Yoshichika; Kuroda, Makoto

    2011-01-01

    The complete sequence of the plasmid pNDM-1_Dok01 carrying New Delhi metallo-β-lactamase (NDM-1) was determined by whole genome shotgun sequencing using Escherichia coli strain NDM-1_Dok01 (multilocus sequence typing type: ST38) and the transconjugant E. coli DH10B. The plasmid is an IncA/C incompatibility type composed of 225 predicted coding sequences in 195.5 kb and partially shares a sequence with bla(CMY-2)-positive IncA/C plasmids such as E. coli AR060302 pAR060302 (166.5 kb) and Salmonella enterica serovar Newport pSN254 (176.4 kb). The bla(NDM-1) gene in pNDM-1_Dok01 is terminally flanked by two IS903 elements that are distinct from those of the other characterized NDM-1 plasmids, suggesting that the bla(NDM-1) gene has been broadly transposed, together with various mobile elements, as a cassette gene. The chaperonin groES and groEL genes were identified in the bla(NDM-1)-related composite transposon, and phylogenetic analysis and guanine-cytosine content (GC) percentage showed similarities to the homologs of plant pathogens such as Pseudoxanthomonas and Xanthomonas spp., implying that plant pathogens are the potential source of the bla(NDM-1) gene. The complete sequence of pNDM-1_Dok01 suggests that the bla(NDM-1) gene was acquired by a novel composite transposon on an extensively disseminated IncA/C plasmid and transferred to the E. coli ST38 isolate.
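    The origin argument above partly rests on guanine-cytosine (GC) percentage. A minimal sketch of the calculation follows, plus a fixed-window variant for contrasting an inserted element with its flanking plasmid backbone. Treating ambiguous symbols such as N as excluded from the denominator is an assumption, and the toy sequence is not from pNDM-1_Dok01.

```python
from typing import List

def gc_percent(seq: str) -> float:
    """G+C as a percentage of unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt

def gc_windows(seq: str, window: int, step: int) -> List[float]:
    """GC% in fixed windows, e.g. to contrast an inserted element with its flanks."""
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]

print(gc_windows("GGGGAAAA", 4, 4))  # prints: [100.0, 0.0]
```

    A GC% that differs markedly from the backbone's is a hint of horizontal acquisition, not proof of it. That is why the abstract pairs it with phylogenetic analysis of groES/groEL.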

  15. 76 FR 7767 - Student Health Insurance Coverage

    Science.gov (United States)

    2011-02-11

    ... Student Health Insurance Coverage AGENCY: Centers for Medicare & Medicaid Services (CMS), HHS. ACTION... health insurance coverage under the Public Health Service Act and the Affordable Care Act. The proposed rule would define ``student health insurance ...

  16. Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing

    Directory of Open Access Journals (Sweden)

    Manabu Watanabe

    2014-09-01

    Full Text Available Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (an adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 1031–35. For cancer gene profiling, mutation profiling was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentage of sequence covered at more than 100× was 99.04% for the HID panel and 98.83% for the Cancer panel, whereas for the single cell it was 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures, or random allele dropout, probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
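    The key comparison above is the percentage of sequence covered at more than 100×. Given per-position read depths over the panel's target positions, that metric is a simple fraction. The sketch below assumes such a depth list is already available (for example, exported from a per-base depth table). It uses a strict "more than" threshold to match the wording.

```python
from typing import Sequence

def percent_covered_above(depths: Sequence[int], threshold: int = 100) -> float:
    """Percentage of target positions whose read depth is strictly greater than threshold."""
    if not depths:
        raise ValueError("no positions given")
    above = sum(1 for d in depths if d > threshold)
    return 100.0 * above / len(depths)

print(percent_covered_above([50, 101, 150, 100]))  # prints: 50.0
```

    Under this metric, dropout from whole genome amplification in a single cell shows up directly as target positions falling below the threshold.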

  17. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.
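    One way to read "novel SNVs discovered per genome" stabilizing after 15 individuals is as a discovery curve. Each genome, taken in sequencing order, contributes the variants absent both from a reference database and from all earlier genomes in the study. This interpretation is an assumption, not the authors' stated procedure. The sketch represents variants as hypothetical (chromosome, position, ref, alt) tuples.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)

def novel_counts(genomes: Iterable[Set[Variant]], known: Set[Variant]) -> List[int]:
    """For each genome in order, count variants not in `known` or in any earlier genome."""
    seen: Set[Variant] = set(known)
    counts: List[int] = []
    for variants in genomes:
        counts.append(len(variants - seen))
        seen |= variants  # later genomes no longer get credit for these
    return counts
```

    Plotting these counts against genome index gives the curve whose plateau the abstract describes.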

  18. Media Coverage of Nuclear Energy after Fukushima

    International Nuclear Information System (INIS)

    Oltra, C.; Roman, P.; Prades, A.

    2013-01-01

    This report presents the main findings of a content analysis of printed media coverage of nuclear energy in Spain before and after the Fukushima accident. Our main objective is to understand the changes in the presentation of nuclear fission and nuclear fusion as a result of the accident in Japan. We specifically analyze the volume of coverage and thematic content in the media coverage for nuclear fusion from a sample of Spanish print articles in more than 20 newspapers from 2008 to 2012. We also analyze the media coverage of nuclear energy (fission) in three main Spanish newspapers one year before and one year after the accident. The results illustrate how the media contributed to the presentation of nuclear power in the months before and after the accident. This could have implications for the public understanding of nuclear power. (Author)

  19. Media Coverage of Nuclear Energy after Fukushima

    Energy Technology Data Exchange (ETDEWEB)

    Oltra, C.; Roman, P.; Prades, A.

    2013-07-01

    This report presents the main findings of a content analysis of printed media coverage of nuclear energy in Spain before and after the Fukushima accident. Our main objective is to understand the changes in the presentation of nuclear fission and nuclear fusion as a result of the accident in Japan. We specifically analyze the volume of coverage and thematic content in the media coverage for nuclear fusion from a sample of Spanish print articles in more than 20 newspapers from 2008 to 2012. We also analyze the media coverage of nuclear energy (fission) in three main Spanish newspapers one year before and one year after the accident. The results illustrate how the media contributed to the presentation of nuclear power in the months before and after the accident. This could have implications for the public understanding of nuclear power. (Author)

  20. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed by JGI at approximately 10X coverage. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. We have now generated over 200,000 cDNA sequence reads as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al., 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequencing group (Asamizu et al., 1999; Asamizu et al., 2000) or by individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (assemblages of reads with overlapping nucleotide sequences), and contigs that group together as part of the same gene have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome, and the cDNAs and their corresponding genomic sequences have been reassembled; the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high-quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  1. State contraceptive coverage laws: creative responses to questions of "conscience".

    Science.gov (United States)

    Dailard, C

    1999-08-01

    The Federal Employees Health Benefits Program (FEHBP) guaranteed contraceptive coverage for employees of the federal government. However, opponents of the FEHBP contraceptive coverage questioned the viability of its conscience clause. Supporters of contraceptive coverage pressed for the narrowest exemption, one that would permit only religious plans that clearly state a religious objection to contraception to opt out. Six of the nine states that have enacted contraceptive coverage laws aimed at the private sector included a conscience clause in their statutes. The private sector poses a different problem from the FEHBP, because almost all private-sector employees work for employers who offer only one plan. The scope of the exemption for employers was an issue in five of the states that have enacted contraceptive coverage laws. In Hawaii and California, if employers are exempted from contraceptive coverage on religious grounds, employees are entitled to purchase that coverage directly from the plan. Questions remain about how an insurer that objects on religious grounds to contraceptive coverage can function in a marketplace where most private-sector employers provide such coverage.

  2. A Bioinformatician's Guide to Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis and gene-centric analysis. Finally, data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions throughout a metagenomic project.

  3. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    Science.gov (United States)

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell-specific sequence reads from single cells (21,000 single cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals that of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  4. CDMA coverage under mobile heterogeneous network load

    NARCIS (Netherlands)

    Saban, D.; van den Berg, Hans Leo; Boucherie, Richardus J.; Endrayanto, A.I.

    2002-01-01

    We analytically investigate coverage (determined by the uplink) under non-homogeneous and moving traffic load of third-generation UMTS mobile networks. In particular, for different call assignment policies, we investigate cell breathing and the movement of the coverage gap occurring between cells.

  5. 20 CFR 404.1913 - Precluding dual coverage.

    Science.gov (United States)

    2010-04-01

    ... precluding dual coverage to avoid inequitable or anomalous coverage situations for certain workers. However... 404.1913 Employees' Benefits SOCIAL SECURITY ADMINISTRATION FEDERAL OLD-AGE, SURVIVORS AND DISABILITY...) General. Employment or self-employment or services recognized as equivalent under the Act or the social...

  6. Validation of rice genome sequence by optical mapping

    Directory of Open Access Journals (Sweden)

    Pape Louise

    2007-08-01

    Full Text Available Abstract Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap-closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequence finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety, including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, as evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies, all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold, providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion, we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive but short sequence reads generated by emerging sequencing platforms. Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of
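    An in silico restriction map is the ordered list of fragment lengths predicted by "digesting" a sequence computationally. The following is a minimal Python sketch, assuming a linear sequence, exact matches to the SwaI site (ATTT^AAAT, a blunt cutter), and complete digestion. The function names are hypothetical, and the sketch ignores the sizing error and missing cuts that real optical-map alignment must model.

    ```python
    import re
    from typing import List

    SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence; it is its own reverse complement
    SWAI_CUT_OFFSET = 4     # SwaI cuts ATTT^AAAT, leaving blunt ends


    def cut_positions(seq: str, site: str = SWAI_SITE,
                      offset: int = SWAI_CUT_OFFSET) -> List[int]:
        """Return 0-based cut positions for every site match, including overlapping ones."""
        seq = seq.upper()
        # A zero-width lookahead lets finditer report overlapping matches
        # (e.g. ATTTAAATTTAAAT contains two sites sharing "AT").
        pattern = re.compile(f"(?={re.escape(site)})")
        return [m.start() + offset for m in pattern.finditer(seq)]


    def fragment_lengths(seq: str) -> List[int]:
        """Ordered fragment lengths of a linear sequence after complete digestion."""
        bounds = [0] + cut_positions(seq) + [len(seq)]
        return [end - start for start, end in zip(bounds, bounds[1:])]
    ```

    For example, `fragment_lengths("GGGATTTAAATCCC")` returns `[7, 7]`. Because the SwaI site is palindromic, scanning one strand finds every site; a non-palindromic enzyme would also need a reverse-complement scan.
    
    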

  7. Socio-economic inequality in oral healthcare coverage

    DEFF Research Database (Denmark)

    Hosseinpoor, A R; Itani, L; Petersen, P E

    2012-01-01

    The objective of this study was to assess socio-economic inequality in oral healthcare coverage among adults with expressed need living in 52 countries. Data on 60,332 adults aged 18 years or older were analyzed from 52 countries participating in the 2002-2004 World Health Survey. Oral healthcare coverage was defined as the proportion of individuals who received any medical care from a dentist or other oral health specialist during a period of 12 months prior to the survey, among those who expressed any mouth and/or teeth problems during that period. In addition to assessment of the coverage across wealth quintiles in each country, a wealth-based relative index of inequality was used to measure socio-economic inequality. The index was adjusted for sex, age, marital status, education, employment, overall health status, and urban/rural residence. Pro-rich inequality in oral healthcare coverage...

  8. Scalable Coverage Maintenance for Dense Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Jun Lu

    2007-06-01

    Full Text Available Owing to numerous potential applications, wireless sensor networks have been attracting significant research effort recently. The critical challenge that wireless sensor networks often face is to sustain long-term operation on limited battery energy. Coverage maintenance schemes can effectively prolong network lifetime by selecting and employing a subset of sensors in the network to provide sufficient sensing coverage over a target region. We envision future wireless sensor networks composed of a vast number of miniaturized sensors in exceedingly high density. Therefore, the key issue of coverage maintenance for future sensor networks is the scalability to sensor deployment density. In this paper, we propose a novel coverage maintenance scheme, scalable coverage maintenance (SCOM), which is scalable to sensor deployment density in terms of communication overhead (i.e., number of transmitted and received beacons) and computational complexity (i.e., time and space complexity). In addition, SCOM achieves high energy efficiency and load balancing over different sensors. We have validated our claims through both analysis and simulations.

  9. Inequity between male and female coverage in state infertility laws.

    Science.gov (United States)

    Dupree, James M; Dickey, Ryan M; Lipshultz, Larry I

    2016-06-01

    To analyze state insurance laws mandating coverage for male factor infertility and identify possible inequities between male and female coverage in state insurance laws. We identified states with laws or codes related to infertility insurance coverage using the National Conference of State Legislatures' and the National Infertility Association's websites. We performed a primary, systematic analysis of the laws or codes to specifically identify coverage for male factor infertility services. The presence or absence of language in state insurance laws mandating coverage for male factor infertility care. There are 15 states with laws mandating insurance coverage for female factor infertility. Only eight of those states (California, Connecticut, Massachusetts, Montana, New Jersey, New York, Ohio, and West Virginia) have mandates for male factor infertility evaluation or treatment. Insurance coverage for male factor infertility is most specific in Massachusetts, New Jersey, and New York, yet significant differences exist in the male factor policies in all eight states. Three states (Massachusetts, New Jersey, and New York) exempt coverage for vasectomy reversal. Despite national recommendations that male and female partners begin infertility evaluations together, only 8 of 15 states with laws mandating infertility coverage include coverage for the male partner. Excluding men from infertility coverage places an undue burden on female partners and risks missing opportunities to diagnose serious male health conditions, correct reversible causes of infertility, and provide cost-effective treatments that can downgrade the intensity of intervention required to achieve a pregnancy. Copyright © 2016 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  10. Genome sequence of the oleaginous yeast Rhodotorula toruloides strain CGMCC 2.1609

    Directory of Open Access Journals (Sweden)

    Christine Sambles

    2017-09-01

    Full Text Available Most eukaryotic oleaginous species are yeasts, and among them the basidiomycete red yeast Rhodotorula (Rhodosporidium) toruloides (Pucciniomycotina) is known to produce high quantities of lipids when grown in nitrogen-limiting media, and has potential for biodiesel production. The genome of the CGMCC 2.1609 strain of this oleaginous red yeast was sequenced using a hybrid of Roche 454 and Illumina technology generating 13× coverage. The de novo assembly was carried out using MIRA and scaffolded using MAQ and BAMBUS. The sequencing and assembly resulted in 365 scaffolds with total genome size of 33.4 Mb. The complete genome sequence of this strain was deposited in GenBank and the accession number is LKER00000000. The annotation is available on Figshare (doi:10.6084/m9.figshare.4754251).
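    Sequencing depth (coverage) is conventionally the total number of sequenced bases divided by genome size, c = NL/G, with N reads of mean length L over a genome of size G. As a rough consistency check, assuming the 13× figure was computed against the 33.4 Mb assembly size (the abstract does not say which genome-size estimate was used), the implied total sequence yield is:

    $$
    c = \frac{N L}{G} \quad\Longrightarrow\quad N L = c\,G \approx 13 \times 33.4\ \text{Mb} \approx 434\ \text{Mb}
    $$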

  11. GenBank

    OpenAIRE

    Benson, Dennis A.; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Sayers, Eric W.

    2012-01-01

    GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assig...

  12. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 (CHD8) and LDL receptor-related protein 1 (LRP1), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.
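    The abstract reports variant-call concordance above 96% but does not define how it was computed. One common convention, assumed here, is the fraction of sites called in both call sets whose genotypes agree. A minimal Python sketch under that assumption, with genotypes as VCF-style strings such as "0/1"; the data layout and function names are hypothetical, not the study's pipeline.

    ```python
    from typing import Dict, Tuple

    Site = Tuple[str, int]   # (chromosome, 1-based position)
    Calls = Dict[Site, str]  # site -> genotype string, e.g. "0/1" or "1|0"


    def normalize_gt(gt: str) -> str:
        """Ignore phasing and allele order so that '1|0' compares equal to '0/1'."""
        alleles = gt.replace("|", "/").split("/")
        return "/".join(sorted(alleles))


    def concordance(a: Calls, b: Calls) -> float:
        """Fraction of sites called in both sets whose normalized genotypes agree."""
        shared = a.keys() & b.keys()
        if not shared:
            raise ValueError("no sites called in both call sets")
        agree = sum(normalize_gt(a[s]) == normalize_gt(b[s]) for s in shared)
        return agree / len(shared)
    ```

    Other definitions, for example counting sites called in only one set as discordant, would give lower values on the same data, so figures from different studies are comparable only when the definition matches.
    
    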

  13. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait locus (QTL) analysis) with traits of agronomical and nutritional importance. A 2.1-million-feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry, and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty-eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome, generating a novel data resource for research on and breeding of this crop species. PMID:24312619
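    The transition/transversion split reported here follows the standard rule: a transition swaps a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T), and any other substitution is a transversion. A minimal Python sketch, assuming biallelic single-base SNPs supplied as (reference, alternate) pairs; the function names are hypothetical and not taken from the study's pipeline.

    ```python
    from collections import Counter
    from typing import Iterable, Tuple

    PURINES = {"A", "G"}
    PYRIMIDINES = {"C", "T"}


    def snp_class(ref: str, alt: str) -> str:
        """Classify one single-base substitution as a transition or a transversion."""
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "transition" if same_class else "transversion"


    def ts_tv_fractions(snps: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
        """Return (transition fraction, transversion fraction) over all SNPs."""
        counts = Counter(snp_class(ref, alt) for ref, alt in snps)
        total = sum(counts.values())
        if total == 0:
            raise ValueError("no SNPs supplied")
        return counts["transition"] / total, counts["transversion"] / total
    ```

    For example, `ts_tv_fractions([("A", "G"), ("T", "C"), ("A", "T"), ("G", "C")])` returns `(0.5, 0.5)`. Under this rule, the reported 60%/40% split corresponds to a Ts/Tv ratio of 1.5.
    
    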

  14. Mediating Trust in Terrorism Coverage

    DEFF Research Database (Denmark)

    Mogensen, Kirsten

    Mass-mediated risk communication can contribute to perceptions of threats and fear of “others” and/or to perceptions of trust in fellow citizens and society to overcome problems. This paper outlines a cross-disciplinary holistic framework for research in mediated trust building during an acute crisis. While the framework is presented in the context of television coverage of a terror-related crisis situation, it can equally be used in connection with all other forms of mediated trust. Key words: National crisis, risk communication, crisis management, television coverage, mediated trust.

  15. 28 CFR 55.5 - Coverage under section 4(f)(4).

    Science.gov (United States)

    2010-07-01

    ... THE VOTING RIGHTS ACT REGARDING LANGUAGE MINORITY GROUPS Nature of Coverage § 55.5 Coverage unde