WorldWideScience

Sample records for high-throughput amplicon sequencing

  1. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...

  2. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  3. Comprehensive evaluation and optimization of amplicon library preparation methods for high-throughput antibody sequencing.

    Directory of Open Access Journals (Sweden)

    Ulrike Menzel

    High-throughput sequencing (HTS) of antibody repertoire libraries has become a powerful tool in the field of systems immunology. However, numerous sources of bias in HTS workflows may affect the obtained antibody repertoire data. A crucial step in antibody library preparation is the addition of short platform-specific nucleotide adapter sequences. As of yet, the impact of the method of adapter addition on experimental library preparation and the resulting antibody repertoire HTS datasets has not been thoroughly investigated. Therefore, we compared three standard library preparation methods by performing Illumina HTS on antibody variable heavy genes from murine antibody-secreting cells. Clonal overlap and rank statistics demonstrated that the investigated methods produced equivalent HTS datasets. PCR-based methods were experimentally superior to ligation with respect to speed, efficiency, and practicality. Finally, using a two-step PCR based method we established a protocol for antibody repertoire library generation, beginning from inputs as low as 1 ng of total RNA. In summary, this study represents a major advance towards a standardized experimental framework for antibody HTS, thus opening up the potential for systems-based, cross-experiment meta-analyses of antibody repertoires.

  4. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the

  6. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Directory of Open Access Journals (Sweden)

    Zhewei Song

    2017-07-01

    Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide

  7. Fungi Sailing the Arctic Ocean: Speciose Communities in North Atlantic Driftwood as Revealed by High-Throughput Amplicon Sequencing.

    Science.gov (United States)

    Rämä, Teppo; Davey, Marie L; Nordén, Jenni; Halvorsen, Rune; Blaalid, Rakel; Mathiassen, Geir H; Alsos, Inger G; Kauserud, Håvard

    2016-08-01

    High amounts of driftwood sail across the oceans and provide habitat for organisms tolerating the rough and saline environment. Fungi have adapted to the extremely cold and saline conditions which driftwood faces in the high north. For the first time, we applied high-throughput sequencing to fungi residing in driftwood to reveal their taxonomic richness, community composition, and ecology in the North Atlantic. Using pyrosequencing of ITS2 amplicons obtained from 49 marine logs, we found 807 fungal operational taxonomic units (OTUs) based on clustering at 97 % sequence similarity cut-off level. The phylum Ascomycota comprised 74 % of the OTUs and 20 % belonged to Basidiomycota. The richness of basidiomycetes decreased with prolonged submersion in the sea, supporting the general view of ascomycetes being more extremotolerant. However, more than one fourth of the fungal OTUs remained unassigned to any fungal class, emphasising the need for better DNA reference data from the marine habitat. Different fungal communities were detected in coniferous and deciduous logs. Our results highlight that driftwood hosts a considerably higher fungal diversity than currently known. The driftwood fungal community is not a terrestrial relic but a speciose assemblage of fungi adapted to the stressful marine environment and different kinds of wooden substrates found in it.

  8. High-throughput amplicon sequencing and stream benthic bacteria: identifying the best taxonomic level for multiple-stressor research

    Science.gov (United States)

    Salis, R. K.; Bruder, A.; Piggott, J. J.; Summerfield, T. C.; Matthaei, C. D.

    2017-03-01

    Disentangling the individual and interactive effects of multiple stressors on microbial communities is a key challenge to our understanding and management of ecosystems. Advances in molecular techniques allow studying microbial communities in situ and with high taxonomic resolution. However, the taxonomic level which provides the best trade-off between our ability to detect multiple-stressor effects versus the goal of studying entire communities remains unknown. We used outdoor mesocosms simulating small streams to investigate the effects of four agricultural stressors (nutrient enrichment, the nitrification inhibitor dicyandiamide (DCD), fine sediment and flow velocity reduction) on stream bacteria (phyla, orders, genera, and species represented by Operational Taxonomic Units with 97% sequence similarity). Community composition was assessed using amplicon sequencing (16S rRNA gene, V3-V4 region). DCD was the most pervasive stressor, affecting evenness and most abundant taxa, followed by sediment and flow velocity. Stressor pervasiveness was similar across taxonomic levels and lower levels did not perform better in detecting stressor effects. Community coverage decreased from 96% of all sequences for abundant phyla to 28% for species. Order-level responses were generally representative of responses of corresponding genera and species, suggesting that this level may represent the best compromise between stressor sensitivity and coverage of bacterial communities.
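
    The evenness measure and the comparison across taxonomic levels mentioned in this abstract can be illustrated with a small sketch. It is not taken from the study: the read counts are made up, the lineage tuples and the `collapse` helper are hypothetical, and Pielou's J is assumed here as the evenness index because the abstract does not say which index was used.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # undefined for a single taxon; 0.0 is a convention chosen here
    return shannon_index(counts) / math.log(observed)


def collapse(counts_by_lineage, level):
    """Sum read counts after truncating each lineage tuple to `level` ranks."""
    collapsed = Counter()
    for lineage, n in counts_by_lineage.items():
        collapsed[lineage[:level]] += n
    return collapsed


# Hypothetical read counts keyed by (phylum, order, genus).
sample = {
    ("Proteobacteria", "Burkholderiales", "Limnohabitans"): 40,
    ("Proteobacteria", "Burkholderiales", "Polaromonas"): 40,
    ("Proteobacteria", "Rhodobacterales", "Rhodobacter"): 10,
    ("Bacteroidetes", "Flavobacteriales", "Flavobacterium"): 10,
}

for level, name in enumerate(["phylum", "order", "genus"], start=1):
    counts = list(collapse(sample, level).values())
    print(name, round(pielou_evenness(counts), 3))
```

    Collapsing the same counts to a coarser rank changes both the number of taxa and their relative proportions, which is why an index such as evenness has to be recomputed at each level being compared.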

  9. Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
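
    The two-stage idea described in this abstract can be sketched roughly as follows. This is not the TSC software and omits its preclustering steps. The abundance threshold, the 3 % cutoff, the unit-cost global-alignment distance, the complete-linkage rule for the abundant group and the greedy seed rule for the rare group are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import Counter


def nw_distance(a, b):
    """Global-alignment distance with unit mismatch/gap costs, scaled to [0, 1]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def complete_linkage(tags, cutoff):
    """Naive agglomerative clustering: merge while the farthest pair stays within cutoff."""
    clusters = [[t] for t in tags]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(nw_distance(x, y) for x in clusters[i] for y in clusters[j])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))


def two_stage_cluster(reads, min_abundance=3, cutoff=0.03):
    counts = Counter(reads)  # dereplicate reads into unique tags
    abundant = [t for t, n in counts.most_common() if n >= min_abundance]
    rare = [t for t, n in counts.most_common() if n < min_abundance]

    # Stage 1: accurate but costly clustering of the few abundant tags.
    otus = complete_linkage(abundant, cutoff)

    # Stage 2: greedy assignment of each rare tag to the first OTU whose seed
    # (its first member) is close enough; otherwise the tag starts a new OTU.
    for tag in rare:
        for otu in otus:
            if nw_distance(tag, otu[0]) <= cutoff:
                otu.append(tag)
                break
        else:
            otus.append([tag])
    return otus


# Toy data: two 40-nt templates plus one single-substitution variant of each.
base = "ACGT" * 10
other = "TTGCA" * 8
reads = ([base] * 5 + [base[:5] + "T" + base[6:]]
         + [other] * 4 + [other[:10] + "A" + other[11:]])
print([len(otu) for otu in two_stage_cluster(reads)])
```

    The costly all-against-all comparison is confined to the abundant tags, while each rare tag is compared only against one seed per OTU, which is the source of the efficiency gain the abstract refers to.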

  10. Biphasic Study to Characterize Agricultural Biogas Plants by High-Throughput 16S rRNA Gene Amplicon Sequencing and Microscopic Analysis.

    Science.gov (United States)

    Maus, Irena; Kim, Yong Sung; Wibberg, Daniel; Stolze, Yvonne; Off, Sandra; Antonczyk, Sebastian; Pühler, Alfred; Scherer, Paul; Schlüter, Andreas

    2017-02-28

    Process surveillance within agricultural biogas plants (BGPs) was concurrently studied by high-throughput 16S rRNA gene amplicon sequencing and an optimized quantitative microscopic fingerprinting (QMF) technique. In contrast to 16S rRNA gene amplicons, digitalized microscopy is a rapid and cost-effective method that facilitates enumeration and morphological differentiation of the most significant groups of methanogens regarding their shape and characteristic autofluorescent factor 420. Moreover, the fluorescence signal mirrors cell vitality. In this study, four different BGPs were investigated. The results indicated stable process performance in the mesophilic BGPs and in the thermophilic reactor. Bacterial subcommunity characterization revealed significant differences between the four BGPs. Most remarkably, the genera Defluviitoga and Halocella dominated the thermophilic bacterial subcommunity, whereas members of another taxon, Syntrophaceticus, were found to be abundant in the mesophilic BGP. The domain Archaea was dominated by the genus Methanoculleus in all four BGPs, followed by Methanosaeta in BGP1 and BGP3. In contrast, Methanothermobacter members were highly abundant in the thermophilic BGP4. Furthermore, a high consistency between the sequencing approach and the QMF method was shown, especially for the thermophilic BGP. The differences elucidated that using this biphasic approach for mesophilic BGPs provided novel insights regarding disaggregated single cells of Methanosarcina and Methanosaeta species. Both dominated the archaeal subcommunity and replaced coccoid Methanoculleus members belonging to the same group of Methanomicrobiales that have been frequently observed in similar BGPs. This work demonstrates that combining QMF and 16S rRNA gene amplicon sequencing is a complementary strategy to describe archaeal community structures within biogas processes.

  11. Network analysis of the microorganism in 25 Danish wastewater treatment plants over 7 years using high-throughput amplicon sequencing

    DEFF Research Database (Denmark)

    Albertsen, Mads; Larsen, Poul; Saunders, Aaron Marc

    Wastewater treatment is the world’s largest biotechnological process and a perfect model system for microbial ecology as the habitat is well defined and replicated all over the world. Extensive investigations on Danish wastewater treatment plants using fluorescent in situ hybridization have... identified 38 probe-defined core genera, which are shared among all investigated Danish plants. A large body of knowledge exists on many of the core genera; however, few attempts have been made to integrate the knowledge on a system-level understanding of the process. In this work we aimed to integrate... to link sludge and floc properties to the microbial communities. All data was subjected to extensive network analysis and multivariate statistics through R. The 16S amplicon results confirmed the findings of relatively few core groups of organisms shared by all the wastewater treatment plants...

  12. Unearthing microbial diversity of Taxus rhizosphere via MiSeq high-throughput amplicon sequencing and isolate characterization

    Science.gov (United States)

    Hao, Da Cheng; Song, Si Meng; Mu, Jun; Hu, Wen Li; Xiao, Pei Gen

    2016-04-01

    The species variability and potential environmental functions of Taxus rhizosphere microbial community were studied by comparative analyses of 15 16S rRNA and 15 ITS MiSeq sequencing libraries from Taxus rhizospheres in subtropical and temperate regions of China, as well as by isolating laccase-producing strains and polycyclic aromatic hydrocarbon (PAH)-degrading strains. Total reads could be assigned to 2,141 Operational Taxonomic Units (OTUs) belonging to 31 bacteria phyla and 2,904 OTUs of at least seven fungi phyla. The abundance of Planctomycetes, Actinobacteria, and Chloroflexi was higher in T. cuspidata var. nana and T. × media rhizospheres than in T. mairei rhizosphere (NF), while Acidobacteria, Proteobacteria, Nitrospirae, and unclassified bacteria were more abundant in the latter. Ascomycota and Zygomycota were predominant in NF, while two temperate Taxus rhizospheres had more unclassified fungi, Basidiomycota, and Chytridiomycota. The bacterial/fungal community richness and diversity were lower in NF than in other two. Three dye decolorizing fungal isolates were shown to be highly efficient in removing three classes of reactive dye, while two PAH-degrading fungi were able to degrade recalcitrant benzo[a]pyrene. The present studies extend the knowledge pedigree of the microbial diversity populating rhizospheres, and exemplify the method shift in research and development of resource plant rhizosphere.

  13. High-throughput sequencing of 16S rDNA amplicons characterizes bacterial composition in bronchoalveolar lavage fluid in patients with ventilator-associated pneumonia

    Directory of Open Access Journals (Sweden)

    Yang XJ

    2015-08-01

    Xiao-Jun Yang,1,* Yan-Bo Wang,2,3,* Zhi-Wei Zhou,4,* Guo-Wei Wang,2 Xiao-Hong Wang,1 Qing-Fu Liu,1 Shu-Feng Zhou,4 Zhen-Hai Wang2,3 1Department of Intensive Care Unit, 2Neurology Center, General Hospital of Ningxia Medical University, Yinchuan, Ningxia, People’s Republic of China; 3Key Laboratory of Brain Diseases of Ningxia, Yinchuan, Ningxia, People’s Republic of China; 4Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, USA *These authors contributed equally to this work Abstract: Ventilator-associated pneumonia (VAP) is a life-threatening disease that is associated with high rates of morbidity and likely mortality, placing a heavy burden on an individual and society. Currently available diagnostic and therapeutic approaches for VAP treatment are limited, and the prognosis of VAP is poor. The present study aimed to reveal and discriminate the identification of the full spectrum of the pathogens in patients with VAP using a high-throughput sequencing approach and to analyze the species richness and complexity via alpha and beta diversity analysis. The bronchoalveolar lavage fluid samples were collected from 27 patients with VAP in the intensive care unit. The polymerase chain reaction products of the hypervariable regions of 16S rDNA gene in these 27 samples of VAP were sequenced using the 454 GS FLX system. A total of 103,856 pyrosequencing reads and 638 operational taxonomic units were obtained from these 27 samples. There were four dominant phyla, including Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. There were 90 different genera, of which 12 genera occurred in over ten different samples. The top five dominant genera were Streptococcus, Acinetobacter, Limnohabitans, Neisseria, and Corynebacterium, and the most widely distributed genera were Streptococcus, Limnohabitans, and Acinetobacter in these 27 samples. Of note, the mixed profile of causative pathogens was observed. Taken

  14. High-throughput sequencing of 16S rDNA amplicons characterizes bacterial composition in cerebrospinal fluid samples from patients with purulent meningitis

    Directory of Open Access Journals (Sweden)

    Liu A

    2015-08-01

    Aicui Liu,1,2,* Chao Wang,1,2,* Zhijuan Liang,3 Zhi-Wei Zhou,4 Lin Wang,1,2 Qiaoli Ma,1,2 Guowei Wang,1,2 Shu-Feng Zhou,4 Zhenhai Wang1,2 1Neurology Center, General Hospital of Ningxia Medical University, Yinchuan, Ningxia; 2Key Laboratory of Brain Diseases of Ningxia, Yinchuan, Ningxia; 3Department of Neurology, The First People’s Hospital of Lanzhou, Lanzhou, Gansu, People’s Republic of China; 4Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, USA *These authors contributed equally to this work Abstract: Purulent meningitis (PM) is a severe infectious disease that is associated with high rates of morbidity and mortality. It has been recognized that bacterial infection is a major contributing factor to the pathogenesis of PM. However, there is a lack of information on the bacterial composition in PM, due to the low positive rate of cerebrospinal fluid bacterial culture. Herein, we aimed to discriminate and identify the main pathogens and bacterial composition in cerebrospinal fluid samples from PM patients using a high-throughput sequencing approach. The cerebrospinal fluid samples were collected from 26 PM patients, and were determined as culture-negative samples. The polymerase chain reaction products of the hypervariable regions of 16S rDNA gene in these 26 samples of PM were sequenced using the 454 GS FLX system. The results showed that there were 71,440 pyrosequencing reads, of which, the predominant phyla were Proteobacteria and Firmicutes; and the predominant genera were Streptococcus, Acinetobacter, Pseudomonas, and Neisseria. The bacterial species in the cerebrospinal fluid were complex, with 61.5% of the samples presenting with mixed pathogens. A significant number of bacteria belonging to a known pathogenic potential was observed. The number of operational taxonomic units for individual samples ranged from six to 75 and there was a comparable difference in the species diversity that

  15. Bacterial diversity of the Colombian fermented milk "Suero Costeño" assessed by culturing and high-throughput sequencing and DGGE analysis of 16S rRNA gene amplicons.

    Science.gov (United States)

    Motato, Karina Edith; Milani, Christian; Ventura, Marco; Valencia, Francia Elena; Ruas-Madiedo, Patricia; Delgado, Susana

    2017-12-01

    "Suero Costeño" (SC) is a traditional soured cream elaborated from raw milk in the Northern-Caribbean coast of Colombia. The natural microbiota that characterizes this popular Colombian fermented milk is unknown, although several culturing studies have previously been attempted. In this work, the microbiota associated with SC from three manufacturers in two regions, "Planeta Rica" (Córdoba) and "Caucasia" (Antioquia), was analysed by means of culturing methods in combination with high-throughput sequencing and DGGE analysis of 16S rRNA gene amplicons. The bacterial ecosystem of SC samples was revealed to be composed of lactic acid bacteria belonging to the Streptococcaceae and Lactobacillaceae families; the proportions and genera varying among manufacturers and region of elaboration. Members of the Lactobacillus acidophilus group, Lactococcus lactis, Streptococcus infantarius and Streptococcus salivarius characterized this artisanal product. In comparison with culturing, the use of in-depth molecular culture-independent techniques provides a more realistic picture of the overall bacterial communities residing in SC. Besides the descriptive purpose, these approaches will facilitate a rational strategy to follow (culture media and growing conditions) for the isolation of indigenous strains that allow standardization in the manufacture of SC. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  17. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  18. Targeted Amplicon Sequencing for Single-Nucleotide-Polymorphism Genotyping of Attaching and Effacing Escherichia coli O26:H11 Cattle Strains via a High-Throughput Library Preparation Technique.

    Science.gov (United States)

    Ison, Sarah A; Delannoy, Sabine; Bugarel, Marie; Nagaraja, Tiruvoor G; Renter, David G; den Bakker, Henk C; Nightingale, Kendra K; Fach, Patrick; Loneragan, Guy H

    2015-11-13

    Enterohemorrhagic Escherichia coli (EHEC) O26:H11, a serotype within Shiga toxin-producing E. coli (STEC) that causes severe human disease, has been considered to have evolved from attaching and effacing E. coli (AEEC) O26:H11 through the acquisition of a Shiga toxin-encoding gene. Targeted amplicon sequencing using next-generation sequencing technology of 48 phylogenetically informative single-nucleotide polymorphisms (SNPs) and three SNPs differentiating Shiga toxin-positive (stx-positive) strains from Shiga toxin-negative (stx-negative) strains were used to infer the phylogenetic relationships of 178 E. coli O26:H11 strains (6 stx-positive strains and 172 stx-negative AEEC strains) from cattle feces to 7 publicly available genomes of human clinical strains. The AEEC cattle strains displayed synonymous SNP genotypes with stx2-positive sequence type 29 (ST29) human O26:H11 strains, while stx1 ST21 human and cattle strains clustered separately, demonstrating the close phylogenetic relatedness of these Shiga toxin-negative AEEC cattle strains and human clinical strains. With the exception of seven stx-negative strains, five of which contained espK, three stx-related SNPs differentiated the STEC strains from non-STEC strains, supporting the hypothesis that these AEEC cattle strains could serve as a potential reservoir for new or existing pathogenic human strains. Our results support the idea that targeted amplicon sequencing for SNP genotyping expedites strain identification and genetic characterization of E. coli O26:H11, which is important for food safety and public health. Copyright © 2016 Ison et al.

  19. Applications of High Throughput Sequencing for Immunology and Clinical Diagnostics

    OpenAIRE

    Kim, Hyunsung John

    2014-01-01

    High throughput sequencing methods have fundamentally shifted the manner in which biological experiments are performed. In this dissertation, conventional and novel high throughput sequencing and bioinformatics methods are applied to immunology and diagnostics. In order to study rare subsets of cells, an RNA sequencing method was first optimized for use with minimal levels of RNA and cellular input. The optimized RNA sequencing method was then applied to study the transcriptional differences ...

  20. High-throughput sequencing in mitochondrial DNA research.

    Science.gov (United States)

    Ye, Fei; Samuels, David C; Clark, Travis; Guo, Yan

    2014-07-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers' ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control. Copyright © 2014 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  1. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
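
    The stochasticity effect described in this abstract can be illustrated with a toy branching-process simulation. This is a minimal sketch and not the authors' quantitative model: the function names, the per-cycle copy probability (`efficiency`) and the cycle counts are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_pcr(n_templates, cycles, efficiency, seed=0):
    """Toy branching process for PCR (illustrative assumption, not the paper's model).

    Every template starts as a single molecule. In each cycle, every
    existing molecule is copied independently with probability `efficiency`.
    Returns the final copy number of each template.
    """
    rng = random.Random(seed)
    counts = [1] * n_templates
    for _ in range(cycles):
        # Each of the c molecules is duplicated with probability `efficiency`.
        counts = [c + sum(rng.random() < efficiency for _ in range(c))
                  for c in counts]
    return counts


def coefficient_of_variation(counts):
    """Spread of copy numbers relative to their mean."""
    return statistics.pstdev(counts) / statistics.mean(counts)
```

    With `efficiency=1.0` every template doubles in every cycle and the spread is zero; with a lower efficiency, templates that happen to miss early cycles stay under-represented, which is the kind of skew the abstract attributes to low-input libraries.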

  2. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  3. High-throughput sequence alignment using Graphics Processing Units.

    Science.gov (United States)

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
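
    The idea of indexing one reference and then matching many independent queries against it can be sketched on the CPU as follows. This is only an illustration under stated assumptions: it uses a naively built suffix array instead of the suffix tree named in the abstract, it reports only full-length exact occurrences of each query, and it is not MUMmerGPU's code or API. The function names are hypothetical.

```python
def build_suffix_array(ref):
    """Start positions of all suffixes of `ref`, in lexicographic order.

    Naive construction (sorts full suffixes); fine for short examples only.
    """
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_exact(ref, suffix_array, query):
    """Return all positions where `query` occurs exactly in `ref`."""
    m = len(query)
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix whose m-character prefix is >= query.
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if ref[start:start + m] < query:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Matching suffixes are contiguous in the sorted order.
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if ref[start:start + m] != query:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


def align_queries(ref, queries):
    """Match each query independently against one shared index."""
    sa = build_suffix_array(ref)
    return {q: find_exact(ref, sa, q) for q in queries}
```

    Each query is looked up independently of the others once the index is built, which is the property that makes this kind of workload suitable for the parallel execution the abstract describes.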

  4. Validation of high throughput sequencing and microbial forensics applications

    OpenAIRE

    Budowle, Bruce; Connell, Nancy D.; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R.; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A.; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D

    2014-01-01

    Abstract High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results a...

  5. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus
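
    As a small illustration of how IUPAC ambiguity codes represent degenerate primers, the sketch below computes a primer's degeneracy (the number of concrete sequences it stands for) and finds where it could bind on a template. The code table is the standard IUPAC nucleotide alphabet; the function names and the example sequences are hypothetical and are not taken from the JCVI pipeline.

```python
import re
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct unambiguous sequences encoded by the primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def find_sites(primer, template):
    """Start positions (0-based) where the degenerate primer matches.

    A zero-width lookahead is used so that overlapping sites are reported.
    """
    body = "".join(f"[{IUPAC[base]}]" for base in primer.upper())
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(template.upper())]
```

    For example, the hypothetical primer `AYG` stands for two sequences (`ACG` and `ATG`), so `degeneracy("AYG")` is 2. A real design system would also screen melting temperature, amplicon length and other PCR criteria, which this sketch does not attempt.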

  6. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  7. High throughput sequencing of microRNAs in chicken somites.

    Science.gov (United States)

    Rathjen, Tina; Pais, Helio; Sweetman, Dylan; Moulton, Vincent; Munsterberg, Andrea; Dalmay, Tamas

    2009-05-06

    High throughput Solexa sequencing technology was applied to identify microRNAs in somites of developing chicken embryos. We obtained 651,273 reads, from which 340,415 were mapped to the chicken genome representing 1701 distinct sequences. Eighty-five of these were known microRNAs and 42 novel miRNA candidates were identified. Accumulation of 18 of 42 sequences was confirmed by Northern blot analysis. Ten of the 18 sequences are new variants of known miRNAs and eight short RNAs are novel miRNAs. Six of these eight have not been reported by other deep sequencing projects. One of the six new miRNAs is highly enriched in somite tissue suggesting that deep sequencing of other specific tissues has the potential to identify novel tissue specific miRNAs.

  8. 76 FR 28990 - Ultra High Throughput Sequencing for Clinical Diagnostic Applications-Approaches To Assess...

    Science.gov (United States)

    2011-05-19

    ... Clinical Diagnostic Applications--Approaches To Assess Analytical Validity.'' The purpose of the public... approaches to assess analytical validity of ultra high throughput sequencing for clinical diagnostic... HUMAN SERVICES Food and Drug Administration Ultra High Throughput Sequencing for Clinical Diagnostic...

  9. High-throughput sequencing: a roadmap toward community ecology.

    Science.gov (United States)

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-04-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines.

  10. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology, however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete...... agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from 26 soil samples showed that 95...... the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete communities using ITS1 as the barcode sequence with well-known primers for oomycete DNA amplification....

  11. Fusion genes and their discovery using high throughput sequencing.

    Science.gov (United States)

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  12. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  13. Savant: genome browser for high-throughput sequencing data.

    Science.gov (United States)

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  14. Validation of high throughput sequencing and microbial forensics applications.

    Science.gov (United States)

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  15. Large scale library generation for high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Erik Borgström

    Full Text Available BACKGROUND: Large efforts have recently been made to automate the sample preparation protocols for massively parallel sequencing in order to match the increasing instrument throughput. Still, the size selection through agarose gel electrophoresis separation is a labor-intensive bottleneck of these protocols. METHODOLOGY/PRINCIPAL FINDINGS: In this study a method for automatic library preparation and size selection on a liquid handling robot is presented. The method utilizes selective precipitation of certain sizes of DNA molecules onto paramagnetic beads for cleanup and selection after standard enzymatic reactions. CONCLUSIONS/SIGNIFICANCE: The method is used to generate libraries for de novo and re-sequencing on the Illumina HiSeq 2000 instrument with a throughput of 12 samples per instrument in approximately 4 hours. The resulting output data show quality scores and pass filter rates comparable to manually prepared samples. The sample size distribution can be adjusted for each application, and the method is suitable for all high throughput DNA processing protocols seeking to control size intervals.

  16. Highly Accurate Sequencing of Full-Length Immune Repertoire Amplicons Using Tn5-Enabled and Molecular Identifier-Guided Amplicon Assembly.

    Science.gov (United States)

    Cole, Charles; Volden, Roger; Dharmadhikari, Sumedha; Scelfo-Dalbey, Camille; Vollmers, Christopher

    2016-03-15

    Ab repertoire sequencing is a powerful tool to analyze the adaptive immune system. To sequence entire Ab repertoires, amplicons are created from Ab H chain (IgH) transcripts and sequenced on a high-throughput sequencer. The field of immune repertoire sequencing is growing rapidly and the protocols used are steadily improving; however, thus far, immune repertoire sequencing protocols have not been able to sequence full-length immune repertoires including the entire IgH V region and enough of the IgH C region to identify isotype subtypes. In this study, we present a method that combines Tn5 transposase and molecular identifiers for the highly accurate sequencing of amplicons >500 bp using Illumina short read paired-end sequencing. We then apply this method to Ab H chain amplicons to sequence the first, to our knowledge, highly accurate full-length immune repertoire. Copyright © 2016 by The American Association of Immunologists, Inc.

  17. Using high throughput sequencing to explore the biodiversity in oral bacterial communities

    Science.gov (United States)

    Diaz, P.I.; Dupuy, A.K.; Abusleme, L.; Reese, B.; Obergfell, C.; Choquette, L.; Dongari-Bagtzoglou, A.; Peterson, D.E.; Terzi, E.; Strausbaugh, L.D.

    2013-01-01

    Summary High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. PMID:22520388
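
    The singleton-filtering step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming that a singleton OTU is one supported by a single read across the whole data set and that the OTU table is held as a nested dictionary; the names and data layout are hypothetical and do not come from the study.

```python
def drop_singletons(otu_table):
    """Remove OTUs whose total read count over all samples is 1.

    `otu_table` maps OTU id -> {sample name: read count}.
    """
    return {otu: per_sample
            for otu, per_sample in otu_table.items()
            if sum(per_sample.values()) > 1}


def richness(otu_table, sample):
    """Number of OTUs observed (count > 0) in one sample."""
    return sum(1 for per_sample in otu_table.values()
               if per_sample.get(sample, 0) > 0)


def shared_otus(otu_table, samples):
    """OTU ids present in every one of the given samples."""
    return {otu for otu, per_sample in otu_table.items()
            if all(per_sample.get(s, 0) > 0 for s in samples)}
```

    Comparing `richness` before and after `drop_singletons` shows how much of the apparent diversity rests on single reads, and `shared_otus` corresponds to the membership-style comparison (OTUs present in all subjects) that the abstract contrasts with structure-based comparisons.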

  18. Survey of Microbial Diversity in Flood Areas during Thailand 2011 Flood Crisis Using High-Throughput Tagged Amplicon Pyrosequencing.

    Science.gov (United States)

    Mhuantong, Wuttichai; Wongwilaiwalin, Sarunyou; Laothanachareon, Thanaporn; Eurwilaichitr, Lily; Tangphatsornruang, Sithichoke; Boonchayaanant, Benjaporn; Limpiyakorn, Tawan; Pattaragulwanit, Kobchai; Punmatharith, Thantip; McEvoy, John; Khan, Eakalak; Rachakornkij, Manaskorn; Champreda, Verawat

    2015-01-01

    The Thailand flood crisis in 2011 was one of the largest recorded floods in modern history, causing enormous damage to the economy and ecological habitats of the country. In this study, bacterial and fungal diversity in sediments and waters collected from ten flood areas in Bangkok and its suburbs, covering residential and agricultural areas, were analyzed using high-throughput 454 pyrosequencing of 16S rRNA gene and internal transcribed spacer sequences. Analysis of microbial community showed differences in taxa distribution in water and sediment with variations in the diversity of saprophytic microbes and sulfate/nitrate reducers among sampling locations, suggesting differences in microbial activity in the habitats. Overall, Proteobacteria represented a major bacterial group in waters, while this group co-existed with Firmicutes, Bacteroidetes, and Actinobacteria in sediments. Anaeromyxobacter, Steroidobacter, and Geobacter were the dominant bacterial genera in sediments, while Sulfuricurvum, Thiovirga, and Hydrogenophaga predominated in waters. For fungi in sediments, Ascomycota, Glomeromycota, and Basidiomycota, particularly in genera Philipsia, Rozella, and Acaulospora, were most frequently detected. Chytridiomycota and Ascomycota were the major fungal phyla, and Rhizophlyctis and Mortierella were the most frequently detected fungal genera in water. Diversity of sulfate-reducing bacteria, related to odor problems, was further investigated using analysis of the dsrB gene which indicated the presence of sulfate-reducing bacteria of families Desulfobacteraceae, Desulfobulbaceae, Syntrobacteraceae, and Desulfoarculaceae in the flood sediments. The work provides an insight into the diversity and function of microbes related to biological processes in flood areas.

  19. Survey of Microbial Diversity in Flood Areas during Thailand 2011 Flood Crisis Using High-Throughput Tagged Amplicon Pyrosequencing.

    Directory of Open Access Journals (Sweden)

    Wuttichai Mhuantong

    Full Text Available The Thailand flood crisis in 2011 was one of the largest recorded floods in modern history, causing enormous damage to the economy and ecological habitats of the country. In this study, bacterial and fungal diversity in sediments and waters collected from ten flood areas in Bangkok and its suburbs, covering residential and agricultural areas, were analyzed using high-throughput 454 pyrosequencing of 16S rRNA gene and internal transcribed spacer sequences. Analysis of microbial community showed differences in taxa distribution in water and sediment with variations in the diversity of saprophytic microbes and sulfate/nitrate reducers among sampling locations, suggesting differences in microbial activity in the habitats. Overall, Proteobacteria represented a major bacterial group in waters, while this group co-existed with Firmicutes, Bacteroidetes, and Actinobacteria in sediments. Anaeromyxobacter, Steroidobacter, and Geobacter were the dominant bacterial genera in sediments, while Sulfuricurvum, Thiovirga, and Hydrogenophaga predominated in waters. For fungi in sediments, Ascomycota, Glomeromycota, and Basidiomycota, particularly in genera Philipsia, Rozella, and Acaulospora, were most frequently detected. Chytridiomycota and Ascomycota were the major fungal phyla, and Rhizophlyctis and Mortierella were the most frequently detected fungal genera in water. Diversity of sulfate-reducing bacteria, related to odor problems, was further investigated using analysis of the dsrB gene which indicated the presence of sulfate-reducing bacteria of families Desulfobacteraceae, Desulfobulbaceae, Syntrobacteraceae, and Desulfoarculaceae in the flood sediments. The work provides an insight into the diversity and function of microbes related to biological processes in flood areas.

  20. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  1. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing

    Science.gov (United States)

    Lou, Dianne I.; Hussmann, Jeffrey A.; McBee, Ross M.; Acevedo, Ashley; Andino, Raul; Press, William H.; Sawyer, Sara L.

    2013-01-01

    A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ∼0.1–1 × 10^−2 per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, “circle sequencing,” which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10^−6 per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors. PMID:24243955
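
    The consensus step described in this abstract can be sketched with a simple per-position majority vote over the tandem copies in one read. This is an illustration only and not the authors' software: it assumes the repeat length is already known, that the first copy starts at the beginning of the read, and that there are no insertions or deletions. The function name is hypothetical.

```python
from collections import Counter


def consensus_from_tandem(read, unit_len):
    """Majority-vote consensus over tandem copies of one template.

    The read is cut into consecutive chunks of `unit_len` bases; an
    incomplete trailing chunk is ignored. A position with no strict
    majority is reported as 'N'.
    """
    copies = [read[i:i + unit_len]
              for i in range(0, len(read) - unit_len + 1, unit_len)]
    if not copies:
        raise ValueError("read is shorter than one repeat unit")
    consensus = []
    for column in zip(*copies):
        base, votes = Counter(column).most_common(1)[0]
        consensus.append(base if votes > len(copies) / 2 else "N")
    return "".join(consensus)
```

    With three copies, an error that appears in only one copy is outvoted by the other two, which is the intuition behind the reduction in error rate reported above; in practice the repeat period must be inferred from the read itself.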

  2. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Full Text Available Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  3. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA

    National Research Council Canada - National Science Library

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-01-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA...

  4. A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes

    Directory of Open Access Journals (Sweden)

    Craig W. Herbold

    2015-07-01

    Full Text Available High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it requires fewer primers overall and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse and high quality sets of amplicon sequence data for modern studies in microbial ecology.
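    Pooling many barcoded libraries into one run implies a demultiplexing step afterwards, in which each read is assigned back to its library by its barcode. The following is a minimal sketch of that idea, not the authors' pipeline: the barcode sequences, library names, barcode length, and the `demultiplex` helper are all hypothetical, and only exact matches at the start of the read are considered.

```python
from collections import defaultdict

# Hypothetical barcode-to-library map; sequences and names are made up.
BARCODES = {"ACGTAC": "lib_16S_soil", "TGCATG": "lib_amoA_gut"}


def demultiplex(reads, barcodes, barcode_len=6):
    """Assign each read to a library by exact match of its leading barcode.

    The barcode is trimmed from the read; reads whose prefix matches no
    known barcode are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        library = barcodes.get(tag, "unassigned")
        bins[library].append(insert)
    return dict(bins)


reads = ["ACGTACGGATTC", "TGCATGCCTAAG", "NNNNNNGGGGGG", "ACGTACTTTT"]
bins = demultiplex(reads, BARCODES)
```

    Real demultiplexers usually tolerate a small number of barcode mismatches and handle paired-end reads; this sketch leaves both out to keep the core lookup visible.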

  5. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  6. Recent progress using high-throughput sequencing technologies in plant molecular breeding.

    Science.gov (United States)

    Gao, Qiang; Yue, Guidong; Li, Wenqi; Wang, Junyi; Xu, Jiaohui; Yin, Ye

    2012-04-01

    High-throughput sequencing is a revolutionary technological innovation in DNA sequencing. This technology has an ultra-low cost per base of sequencing and an overwhelmingly high data output. High-throughput sequencing has brought novel research methods and solutions to the research fields of genomics and post-genomics. Furthermore, this technology is leading to a new molecular breeding revolution that has landmark significance for scientific research and enables us to launch multi-level, multi-faceted, and multi-extent studies in the fields of crop genetics, genomics, and crop breeding. In this paper, we review progress in the application of high-throughput sequencing technologies to plant molecular breeding studies. © 2012 Institute of Botany, Chinese Academy of Sciences.

  7. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants...

  8. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  9. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  10. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Science.gov (United States)

    Zimmermann, Bob; Gesell, Tanja; Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-02-11

    SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  11. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and comp...... with known priming sites....

  12. High‑throughput sequencing of amplicons for monitoring yeast biodiversity in must and during alcoholic fermentation.

    Science.gov (United States)

    David, Vanessa; Terrat, Sébastien; Herzine, Khaled; Claisse, Olivier; Rousseaux, Sandrine; Tourdot-Maréchal, Raphaëlle; Masneuf-Pomarede, Isabelle; Ranjard, Lionel; Alexandre, Hervé

    2014-05-01

    We compared pyrosequencing technology with the PCR-ITS-RFLP analysis of yeast isolates and denaturing gradient gel electrophoresis (DGGE). These methods gave divergent findings for the yeast population. DGGE was unsuitable for the quantification of biodiversity and its use for species detection was limited by the initial abundance of each species. The isolates identified by PCR-ITS-RFLP were not fully representative of the true population. For population dynamics, high-throughput sequencing technology yielded results differing in some respects from those obtained with other approaches. This study demonstrates that 454 pyrosequencing of amplicons is more relevant than other methods for studying the yeast community on grapes and during alcoholic fermentation. Indeed, this high-throughput sequencing method detected larger numbers of species on grapes and identified species present during alcoholic fermentation that were undetectable with the other techniques.

  13. Exploring the sources of bacterial spoilers in beefsteaks by culture-independent high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Francesca De Filippis

    Full Text Available Microbial growth on meat to unacceptable levels contributes significantly to changes in meat structure, color and flavor and causes meat spoilage. The types of microorganisms initially present in meat depend on several factors, and multiple sources of contamination can be identified. The aims of this study were to evaluate the microbial diversity in beefsteaks before and after aerobic storage at 4°C and to investigate the sources of microbial contamination by examining the microbiota of the carcasses from which the steaks originated and of the processing environment where the beef was handled. Carcass, environmental (processing plant) and meat samples were analyzed by culture-independent high-throughput sequencing of 16S rRNA gene amplicons. The microbiota of carcass swabs was very complex, including more than 600 operational taxonomic units (OTUs) belonging to 15 different phyla. A significant association was found between beef microbiota and specific beef cuts (P<0.01), indicating that different cuts of the same carcass can influence the microbial contamination of beef. Despite the initially high complexity of the carcass microbiota, the steaks after aerobic storage at 4°C showed a dramatic decrease in microbial complexity. Pseudomonas sp. and Brochothrix thermosphacta were the main contaminants, and Acinetobacter, Psychrobacter and Enterobacteriaceae were also found. Comparing the relative abundance of OTUs in the different samples showed that abundant OTUs in beefsteaks after storage occurred in the corresponding carcass. However, the abundance of these same OTUs clearly increased in environmental samples taken in the processing plant, suggesting that spoilage-associated microbial species originate from carcasses, are carried to the processing environment where the meat is handled, and become a resident microbiota there. Such microbiota is then further spread on meat when it is handled and it represents the starting microbial association...

  14. Determining the diet of larvae of western rock lobster (Panulirus cygnus) using high-throughput DNA sequencing techniques.

    Directory of Open Access Journals (Sweden)

    Richard O'Rorke

    Full Text Available The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  15. Determining the diet of larvae of western rock lobster (Panulirus cygnus) using high-throughput DNA sequencing techniques.

    Science.gov (United States)

    O'Rorke, Richard; Lavery, Shane; Chow, Seinen; Takeyama, Haruko; Tsai, Peter; Beckley, Lynnath E; Thompson, Peter A; Waite, Anya M; Jeffs, Andrew G

    2012-01-01

    The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  16. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    OpenAIRE

    Iqbal, Zafar; Rydning, Siri L.; Wedding, Iselin M.; Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%)...

  17. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia.

    Science.gov (United States)

    Iqbal, Zafar; Rydning, Siri L; Wedding, Iselin M; Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H; Henriksen, Sandra P; Tallaksen, Chantal M E; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process.

  18. High Throughput Sequencing of Extracellular RNA from Human Plasma.

    Directory of Open Access Journals (Sweden)

    Kirsty M Danielson

    Full Text Available The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as 'liquid biopsies' for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms.

  19. The gut microbiotassay – a high-throughput real-time PCR chip combined with next generation sequencing

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Mølbak, Lars

    informative. Many methods can be used to try to define and characterize the gut microbiota. Here we designed an assay consisting of twenty-four different primer systems targeting the most common bacterial groups of the intestine on different hierarchical levels. The aim of this study was to implement and test...... this assay with the high-throughput real-time PCR chip “Access Array 48.48” from Fluidigm. The chip executes 2304 individual reactions in parallel and afterwards it is possible to harvest the amplicons for next-generation sequencing. This approach gives a taxonomical overview of the gut microbiota, hence...... the name: ‘the gut microbiotassay’. The assay was tested on fifteen different bacterial type strains each functioning as target for one or more of the primer systems. In this way the sensitivity and the specificity of the primers were assessed. Next the assay was tested on complex ecosystems by extracting...

  20. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Full Text Available Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  1. High-throughput sequencing of black pepper root transcriptome.

    Science.gov (United States)

    Gordo, Sheila M C; Pinheiro, Daniel G; Moreira, Edith C O; Rodrigues, Simone M; Poltronieri, Marli C; de Lemos, Oriel F; da Silva, Israel Tojal; Ramos, Rommel T J; Silva, Artur; Schneider, Horacio; Silva, Wilson A; Sampaio, Iracilda; Darnet, Sylvain

    2012-09-17

    Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host's root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant's root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  2. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  3. The Microsoft Biology Foundation Applications for High-Throughput Sequencing

    Science.gov (United States)

    Mercer, S.

    2010-01-01

    The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage beta from mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.

  4. Barcoded sequencing workflow for high throughput digitization of hybridoma antibody variable domain sequences.

    Science.gov (United States)

    Chen, Yongmei; Kim, Si Hyun; Shang, Yonglei; Guillory, Joseph; Stinson, Jeremy; Zhang, Qing; Hötzel, Isidro; Hoi, Kam Hon

    2018-01-20

    Since the invention of Hybridoma technology by Milstein and Köhler in 1975, its application has greatly advanced the antibody discovery process. The technology enables both functional screening and long-term archival of the immortalized monoclonal antibody producing B cells. Despite the dependable cryopreservation technology for hybridoma cells, practicality of long-term storage has been outpaced by recent progress in robotics and automations, which enables routine identification of thousands of antigen specific hybridoma clones. Such throughput increase imposes two nascent challenges in the antibody discovery process, namely limited cryopreservation storage space and limited throughput in conventional antibody sequencing. We herein provide a barcoded sequencing workflow that utilizes next generation sequencing to expand the conventional sequencing capacity. Accompanied with the bioinformatics tools we describe, the barcoded sequencing workflow robustly reports unambiguous antibody sequences as confirmed with Sanger sequencing controls. In complement with the commonly accessible recombinant DNA technology, the barcoded sequencing workflow allows for high throughput digitization of the antibody sequences and provides an effective solution to the limitations imposed by physical storage and sequencing capacity. Copyright © 2018 Genentech, Inc. Published by Elsevier B.V. All rights reserved.

  5. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data.

    NARCIS (Netherlands)

    Zomer, A.L.; Burghout, P.J.; Bootsma, H.J.; Hermans, P.W.M.; Hijum, S.A.F.T. van

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon

  6. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

    Science.gov (United States)

    D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

    2008-01-01

    High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes characterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...

  7. Deep Mutational Scanning: Library Construction, Functional Selection, and High-Throughput Sequencing.

    Science.gov (United States)

    Starita, Lea M; Fields, Stanley

    2015-08-03

    Deep mutational scanning is a highly parallel method that uses high-throughput sequencing to track changes in >10^5 protein variants before and after selection to measure the effects of mutations on protein function. Here we outline the stages of a deep mutational scanning experiment, focusing on the construction of libraries of protein sequence variants and the preparation of Illumina sequencing libraries. © 2015 Cold Spring Harbor Laboratory Press.

  8. The promise and challenge of high-throughput sequencing of the antibody repertoire

    Science.gov (United States)

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  9. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data

    OpenAIRE

    Althammer, Sonja Daniela; González-Vallinas Rostes, Juan, 1983-; Ballaré, Cecilia Julia; Beato, Miguel; Eyras Jiménez, Eduardo

    2011-01-01

    Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or b...

  10. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Directory of Open Access Journals (Sweden)

    Kathy N Lam

    Full Text Available High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  11. Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates.

    Science.gov (United States)

    Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A

    2016-09-01

    Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome c oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS was undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated that primer biases occurred during amplicon metabarcoding, with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.

  12. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

    Science.gov (United States)

    Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin

    2017-12-01

    Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost have not only led to unbalanced barcoding efforts around the globe, but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated from individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

  13. The application of the high throughput sequencing technology in the transposable elements.

    Science.gov (United States)

    Liu, Zhen; Xu, Jian-hong

    2015-09-01

    High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.

  14. Comprehensive molecular diagnosis of Bardet-Biedl syndrome by high-throughput targeted exome sequencing.

    Directory of Open Access Journals (Sweden)

    Dong-Jun Xing

    Full Text Available Bardet-Biedl syndrome (BBS) is an autosomal recessive disorder with significant genetic heterogeneity. BBS is linked to mutations in 17 genes, which contain more than 200 coding exons. Currently, BBS is diagnosed by direct DNA sequencing for mutations in these genes, which, because of the large genomic screening region, is both time-consuming and expensive. In order to develop a practical method for the clinical diagnosis of BBS, we have developed a high-throughput targeted exome sequencing (TES) approach for genetic diagnosis. Five typical BBS patients were recruited and screened for mutations in a total of 144 known genes responsible for inherited retinal diseases, a hallmark symptom of BBS. The genomic DNA of these patients and their families was subjected to high-throughput DNA re-sequencing. Deep bioinformatics analysis was carried out to filter the massive sequencing data, which were further confirmed through co-segregation analysis. TES successfully revealed mutations in BBS genes in each patient and family member. Six pathological mutations, including five novel mutations, were revealed in the genes BBS2, MKKS, ARL6, MKS1. This study represents the first report of targeted exome sequencing in BBS patients and demonstrates that high-throughput TES is an accurate and rapid method for the genetic diagnosis of BBS.

  15. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire studies of infectious diseases achieved with traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  16. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards...... with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates...... automation of DNA sequencing....

  17. Finding sRNA generative locales from high-throughput sequencing data with NiBLS

    Directory of Open Access Journals (Sweden)

    Moulton Vincent

    2010-02-01

    Full Text Available Background: Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales. Results: We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph the clustering coefficient, a measure of the interconnectedness of the subnetwork, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets and that the locales it generates are robust to differences in the choice of parameters. Conclusions: NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA.
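
    The graph-based idea in the abstract above (link small RNAs that map close together, then score each sub-network by its clustering coefficient) can be sketched as follows. This is an illustration under stated assumptions, not the NiBLS implementation: the proximity rule (`max_gap` between read start positions), the use of the average local clustering coefficient, and all example values are hypothetical.

```python
from itertools import combinations


def build_graph(positions, max_gap):
    """Link two reads when their start positions lie within max_gap bases."""
    adj = {i: set() for i in range(len(positions))}
    for i, j in combinations(range(len(positions)), 2):
        if abs(positions[i] - positions[j]) <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Return the connected sub-networks of the graph as sets of node ids."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def average_clustering(adj, comp):
    """Average local clustering coefficient over the nodes of one sub-network.

    Nodes with fewer than two neighbours contribute 0 (an assumed convention).
    """
    total = 0.0
    for node in comp:
        nbrs = adj[node]
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(comp)


# Hypothetical read start positions on one chromosome.
positions = [100, 105, 110, 500, 900, 905]
adj = build_graph(positions, max_gap=10)
comps = components(adj)
scores = [average_clustering(adj, c) for c in comps]
```

    In this toy input the three reads near position 100 form a fully connected sub-network with coefficient 1.0, while the isolated read and the two-read pair score 0, so only the first group would pass a coefficient threshold.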

  18. High-Throughput Sequencing and Mutagenesis to Accelerate the Domestication of Microlaena stipoides as a New Food Crop

    Science.gov (United States)

    Shapter, Frances M.; Cross, Michael; Ablett, Gary; Malory, Sylvia; Chivers, Ian H.; King, Graham J.; Henry, Robert J.

    2013-01-01

    Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD97) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops. PMID:24367532

  19. High-throughput sequencing and mutagenesis to accelerate the domestication of Microlaena stipoides as a new food crop.

    Directory of Open Access Journals (Sweden)

    Frances M Shapter

    Full Text Available Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD₉₇) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops.

  20. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    the development and testing of innovative molecular approaches aiming at improving the amount of informative HTS data one can recover from ancient DNA extracts. We have characterized important ligation and amplification biases in the sequencing library building and enrichment steps, which can impede further...... been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing...

  1. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    Full Text Available The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
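
    The gentle trimming thresholds mentioned above (Phred score < 2 or < 5) can be illustrated with a minimal sketch. It assumes Phred+33 ASCII quality encoding and a simple rule that removes low-quality bases from the 3' end only; the function name `trim_3prime` and the example read are hypothetical, and published trimming tools typically use more elaborate windowed rules.

```python
def trim_3prime(seq, qual, threshold=5, offset=33):
    """Remove trailing bases whose Phred score is below threshold.

    seq: nucleotide string; qual: quality string of equal length,
    assumed to be Phred+33 encoded (score = ord(char) - 33).
    Trimming stops at the first base, scanning from the 3' end,
    whose score is at or above the threshold.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' encodes Phred 40; '#', '$', '%' encode 2, 3, 4.
read, quality = "ACGTACGT", "IIIII#$%"
gentle = trim_3prime(read, quality, threshold=5)
very_gentle = trim_3prime(read, quality, threshold=2)
```

    With a threshold of 5 the last three bases are removed, whereas a threshold of 2 leaves this read untouched because its final base has a score of 4.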

  2. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  3. Semi-automated library preparation for high-throughput DNA sequencing platforms.

    Science.gov (United States)

    Farias-Hesson, Eveline; Erikson, Jonathan; Atkins, Alexander; Shen, Peidong; Davis, Ronald W; Scharfe, Curt; Pourmand, Nader

    2010-01-01

    Next-generation sequencing platforms are powerful technologies, providing gigabases of genetic information in a single run. An important prerequisite for high-throughput DNA sequencing is the development of robust and cost-effective preprocessing protocols for DNA sample library construction. Here we report the development of a semi-automated sample preparation protocol to produce adaptor-ligated fragment libraries. Using a liquid-handling robot in conjunction with Carboxy Terminated Magnetic Beads, we labeled each library sample using a unique 6 bp DNA barcode, which allowed multiplex sample processing and sequencing of 32 libraries in a single run using Applied Biosystems' SOLiD sequencer. We applied our semi-automated pipeline to targeted medical resequencing of nuclear candidate genes in individuals affected by mitochondrial disorders. This novel method is capable of preparing as many as 32 DNA libraries in 2.01 days (8-hour workday) for emulsion PCR/high throughput DNA sequencing, increasing sample preparation production by 8-fold.

  4. A reporter system coupled with high-throughput sequencing unveils key bacterial transcription and translation determinants.

    Science.gov (United States)

    Yus, Eva; Yang, Jae-Seong; Sogues, Adrià; Serrano, Luis

    2017-08-28

    Quantitative analysis of the sequence determinants of transcription and translation regulation is relevant for systems and synthetic biology. To identify these determinants, researchers have developed different methods of screening random libraries using fluorescent reporters or antibiotic resistance genes. Here, we have implemented a generic approach called ELM-seq (expression level monitoring by DNA methylation) that overcomes the technical limitations of such classic reporters. ELM-seq uses DamID (Escherichia coli DNA adenine methylase as a reporter coupled with methylation-sensitive restriction enzyme digestion and high-throughput sequencing) to enable in vivo quantitative analyses of upstream regulatory sequences. Using the genome-reduced bacterium Mycoplasma pneumoniae, we show that ELM-seq has a large dynamic range and causes minimal toxicity. We use ELM-seq to determine key sequences (known and putatively novel) of promoter and untranslated regions that influence transcription and translation efficiency. Applying ELM-seq to other organisms will help us to further understand gene expression and guide synthetic biology. Quantitative analysis of how DNA sequence determines transcription and translation regulation is of interest to systems and synthetic biologists. Here the authors present ELM-seq, which uses Dam activity as reporter for high-throughput analysis of promoter and 5'-UTR regions.

  5. Grinder: a versatile amplicon and shotgun sequence simulator.

    Science.gov (United States)

    Angly, Florent E; Willner, Dana; Rohwer, Forest; Hugenholtz, Philip; Tyson, Gene W

    2012-07-01

    We introduce Grinder (http://sourceforge.net/projects/biogrinder/), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, α and β diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.

  6. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    Science.gov (United States)

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  7. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    Science.gov (United States)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  8. Unraveling long non-coding RNAs through analysis of high-throughput RNA-sequencing data

    Directory of Open Access Journals (Sweden)

    Rashmi Tripathi

    2017-06-01

    Full Text Available Genome-wide transcriptome studies based on high-throughput sequencing have revolutionized the study of genetics and epigenetics at unprecedented resolution. This research has revealed that, besides protein-coding RNAs, a large proportion of the mammalian transcriptome consists of regulatory non-protein-coding RNAs, although the number encoded within the human genome remains unknown. In the past, these non-coding RNAs were dismissed as “dark matter” and “junk”. Countering that view, RNA-seq, a recently developed experimental technique, is now widely used to study non-coding RNAs, which have come into the limelight because of their physiological and pathological significance. Long non-coding RNAs, the longest members of the ncRNA family, act as stable and functional parts of a genome and provide important clues about varied biological events, such as the cellular and structural processes that govern the complexity of an organism. Here, we review the most recent and influential computational approaches developed to identify and quantify long non-coding RNAs, to help users choose appropriate tools for their specific research. Keywords: Transcriptome, High throughput sequencing, Genetic and epigenetic, Long non-coding RNA, RNA-sequencing, RNA-seq

  9. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    Full Text Available ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  10. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs

    OpenAIRE

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H.; Gregory, Brian D.; Wang, Li-San

    2013-01-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (

  11. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing

    DEFF Research Database (Denmark)

    Gamba, Cristina; Hanghøj, Kristian Ebbesen; Gaunitz, Charleen

    2016-01-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules...

  12. Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software

    Directory of Open Access Journals (Sweden)

    Axel Poulet

    2016-01-01

    Full Text Available Screening for BRCA mutations in women with familial risk of breast or ovarian cancer is an ideal situation for high-throughput sequencing, providing large amounts of low cost data. However, 454 (Roche) and Ion Torrent (Thermo Fisher) technologies produce homopolymer-associated indel errors, complicating their use in routine diagnostics. We developed software, named AGSA, which helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations of the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed with the Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, an additional 250 patients were analysed using microfluidic amplification of regions of interest (Access Array, Fluidigm) of the BRCA genes, followed by 454 sequencing and AGSA analysis. AGSA complements a complete line of high-throughput diagnostic sequence analysis, reducing time and costs while increasing reliability, notably for homopolymer tracts.
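
    The homopolymer problem described above can be illustrated with a minimal Python sketch. It is not AGSA's algorithm, which the abstract does not describe; it only locates homopolymer tracts in a reference amplicon so that indel calls falling inside them can be flagged for manual review. The function names and the `min_len` threshold are assumptions chosen for illustration.

```python
import re


def homopolymer_tracts(seq, min_len=5):
    """Return (start, end, base, length) for each run of a single base
    that is at least min_len long. Coordinates are 0-based, end-exclusive.
    min_len=5 is an illustrative threshold, not a value from the abstract."""
    tracts = []
    for m in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = m.group()
        if len(run) >= min_len:
            tracts.append((m.start(), m.end(), run[0], len(run)))
    return tracts


def indel_in_homopolymer(pos, tracts):
    """True if a 0-based indel position falls inside any listed tract."""
    return any(start <= pos < end for start, end, _, _ in tracts)


if __name__ == "__main__":
    amplicon = "ACGTAAAAAAGCTTTTTC"  # made-up sequence
    tracts = homopolymer_tracts(amplicon)
    print(tracts)
    print(indel_in_homopolymer(5, tracts))
```

    An indel call flagged this way is only a candidate false positive; confirming it still needs read-level evidence such as the histograms the abstract mentions.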

  13. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  14. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed the software SUGAR (subtile-based GUI-assisted refiner), which can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
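
    Cleaning based on Phred scores can be sketched in a few lines of Python. This is a simplified per-read filter, not SUGAR's subtile-based method; it assumes Sanger-style FASTQ quality strings (ASCII offset 33), and the function names and the mean-quality cutoff of 20 are illustrative assumptions. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.
    offset=33 assumes Sanger / Illumina 1.8+ encoding."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(reads, min_mean_q=20):
    """Keep (name, sequence, quality) tuples whose mean Phred score
    is at least min_mean_q (an illustrative cutoff)."""
    return [r for r in reads if mean_quality(r[2]) >= min_mean_q]


if __name__ == "__main__":
    reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "!!!!")]  # made-up reads
    print([name for name, _, _ in filter_reads(reads)])
```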

  15. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection

    Directory of Open Access Journals (Sweden)

    Michael Cullen

    2015-12-01

    Full Text Available For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current “gold standard” of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3–222, P=0.002), and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis. Keywords: HPV16, HPV epidemiology, HPV genomics

  16. TeloPCR-seq: a high-throughput sequencing approach for telomeres

    Science.gov (United States)

    Bennett, Henrietta W.; Liu, Na; Hu, Yan; King, Megan C.

    2017-01-01

    We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that “like” repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. PMID:27714790

  17. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    Science.gov (United States)

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  18. Use of high throughput sequencing to study oomycete communities in soil and roots

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    limited understanding of the diversity of oomycetes in symptomatic plant tissue as well as in root zones. The aim of this study was to improve and validate techniques for using high throughput sequencing as a tool for studying oomycete communities. Primer sets ITS4, ITS6 and ITS7 that have been used...... taxonomic units from symptomatic lesions in carrot resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot. Moreover, soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium......, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all symptomatic lesions and soil samples showing the versatility of the strategy and thus demonstrating the usefulness of the method in plant and soil DNA background....

  19. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Full Text Available Abstract Background A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  20. Construction and analysis of high-density linkage map using high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Dongyuan Liu

    Full Text Available Linkage maps enable the study of important biological questions. The construction of high-density linkage maps appears more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. Simulation study shows HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembling, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.

  1. glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Andrew Paul Hutchins

    2014-01-01

    Full Text Available Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and which has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.

  2. A Bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data.

    Directory of Open Access Journals (Sweden)

    Qing Xie

    2014-09-01

    Full Text Available High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1%) than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.

  3. Yeast diversity during the fermentation of Andean chicha: A comparison of high-throughput sequencing and culture-dependent approaches.

    Science.gov (United States)

    Mendoza, Lucía M; Neef, Alexander; Vignolo, Graciela; Belloch, Carmela

    2017-10-01

    Diversity and dynamics of yeasts associated with the fermentation of Argentinian maize-based beverage chicha was investigated. Samples taken at different stages from two chicha productions were analyzed by culture-dependent and culture-independent methods. Five hundred and ninety six yeasts were isolated by classical microbiological methods and 16 species identified by RFLPs and sequencing of D1/D2 26S rRNA gene. Genetic typing of isolates from the dominant species, Saccharomyces cerevisiae, by PCR of delta elements revealed up to 42 different patterns. High-throughput sequencing (HTS) of D1/D2 26S rRNA gene amplicons from chicha samples detected more than one hundred yeast species and almost fifty filamentous fungi taxa. Analysis of the data revealed that yeasts dominated the fermentation, although, a significant percentage of filamentous fungi appeared in the first step of the process. Statistical analysis of results showed that very few taxa were represented by more than 1% of the reads per sample at any step of the process. S. cerevisiae represented more than 90% of the reads in the fermentative samples. Other yeast species dominated the pre-fermentative steps and abounded in fermented samples when S. cerevisiae was in percentages below 90%. Most yeasts species detected by pyrosequencing were not recovered by cultivation. In contrast, the cultivation-based methodology detected very few yeast taxa, and most of them corresponded with very few reads in the pyrosequencing analysis. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data

    Science.gov (United States)

    Zomer, Aldert; Burghout, Peter; Bootsma, Hester J.; Hermans, Peter W. M.; van Hijum, Sacha A. F. T.

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of DNA flanking the transposon insertions site, assigning a measure of essentiality based on the number of reads per insertion site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy to use and automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open source, web-based software tool for researchers in the genomics field utilizing transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets and we show that the various pre- and post-processing steps on the sequence reads and count data with ESSENTIALS considerably improve the sensitivity and specificity of predicted gene essentiality. PMID:22900082

  5. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  6. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis.

    Science.gov (United States)

    Goldfeder, Rachel L; Wall, Dennis P; Khoury, Muin J; Ioannidis, John P A; Ashley, Euan A

    2017-10-15

    Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains

    DEFF Research Database (Denmark)

    Nistelberger, H. M.; Smith, O.; Wales, Nathan

    2016-01-01

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one...

  8. ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data.

    Science.gov (United States)

    Huang, Xiaosan; Zhang, Shaoling; Li, Kongqing; Thimmapuram, Jyothi; Xie, Shaojun

    2017-10-26

    High throughput bisulfite sequencing (BS-seq) is an important technology to generate single-base DNA methylomes in both plants and animals. In order to accelerate the data analysis of BS-seq data, toolkits for visualization are required. ViewBS, an open-source toolkit, can extract and visualize the DNA methylome data easily and with flexibility. By using Tabix, ViewBS can visualize BS-seq for large datasets quickly. ViewBS can generate publication-quality figures, such as meta-plots, heat maps and violin-boxplots, which can help users to answer biological questions. We illustrate its application using BS-seq data from Arabidopsis thaliana. ViewBS is freely available at: https://github.com/xie186/ViewBS. Contact: xie186@purdue.edu. Supplementary data are available at Bioinformatics online.

  9. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data.

    Science.gov (United States)

    Gloor, Gregory B; Reid, Gregor

    2016-08-01

    A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
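
    As a concrete illustration of compositional data analysis, the sketch below computes the centred log-ratio (CLR) transform of one sample's read counts. The abstract does not name a specific transform, so choosing CLR here is an assumption, as are the function name and the pseudocount of 0.5 used to avoid taking the logarithm of zero. Each transformed value is the log of a count minus the mean log count of the sample, so the result does not depend on sequencing depth.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's counts.
    pseudocount is added to every count so that zeros can be logged;
    0.5 is an illustrative choice, not a recommendation from the abstract."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    sample = [10, 20, 30, 0]  # made-up read counts for four taxa
    print([round(v, 3) for v in clr(sample)])
```

    With the pseudocount set to zero, multiplying every count by the same factor leaves the CLR values unchanged, which is why the transform suits data where only relative abundances are informative.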

  10. High-throughput scanning of the rat genome using interspersed repetitive sequence-PCR markers.

    Science.gov (United States)

    Gösele, C; Hong, L; Kreitler, T; Rossmann, M; Hieke, B; Gross, U; Kramer, M; Himmelbauer, H; Bihoreau, M T; Kwitek-Black, A E; Twigger, S; Tonellato, P J; Jacob, H J; Schalkwyk, L C; Lindpaintner, K; Ganten, D; Lehrach, H; Knoblauch, M

    2000-11-01

    We report the establishment of a hybridization-based marker system for the rat genome based on the PCR amplification of interspersed repetitive sequences (IRS). Overall, 351 IRS markers were mapped within the rat genome. The IRS marker panel consists of 210 nonpolymorphic and 141 polymorphic markers that were screened for presence/absence polymorphism patterns in 38 different rat strains and substrains that are commonly used in biomedical research. The IRS marker panel was demonstrated to be useful for rapid genome screening in experimental rat crosses and high-throughput characterization of large-insert genomic library clones. Information on corresponding YAC clones is made available for this IRS marker set distributed over the whole rat genome. The two existing rat radiation hybrid maps were integrated by placing the IRS markers in both maps. The genetic and physical mapping data presented provide substantial information for ongoing positional cloning projects in the rat. Copyright 2000 Academic Press.

  11. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.

  12. Barcoding the food chain: from Sanger to high-throughput sequencing.

    Science.gov (United States)

    Littlefair, Joanne E; Clare, Elizabeth L

    2016-11-01

    Society faces the complex challenge of supporting biodiversity and ecosystem functioning, while ensuring food security by providing safe traceable food through an ever-more-complex global food chain. The increase in human mobility brings the added threat of pests, parasites, and invaders that further complicate our agro-industrial efforts. DNA barcoding technologies allow researchers to identify both individual species, and, when combined with universal primers and high-throughput sequencing techniques, the diversity within mixed samples (metabarcoding). These tools are already being employed to detect market substitutions, trace pests through the forensic evaluation of trace "environmental DNA", and to track parasitic infections in livestock. The potential of DNA barcoding to contribute to increased security of the food chain is clear, but challenges remain in regulation and the need for validation of experimental analysis. Here, we present an overview of the current uses and challenges of applied DNA barcoding in agriculture, from agro-ecosystems within farmland to the kitchen table.

  13. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  14. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data.

    Science.gov (United States)

    Thiel, William H

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs. Copyright © 2016 Official journal of the American Society of Gene & Cell Therapy. Published by Elsevier Inc. All rights reserved.
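    The abundance and persistence measures mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the Galaxy Workflows themselves: the function names, the definition of persistence as "number of rounds in which a sequence appears", and the filter thresholds are all assumptions made for illustration.

```python
from collections import Counter


def summarize_rounds(rounds):
    """Summarize aptamer reads across selection rounds.

    rounds: list of lists of sequences, one inner list per selection round.
    Returns {sequence: (per_round_counts, persistence)} where persistence is
    assumed here to mean the number of rounds in which the sequence was seen.
    """
    per_round = [Counter(reads) for reads in rounds]
    unique = set().union(*per_round)  # database of unique sequences
    summary = {}
    for seq in unique:
        counts = [c[seq] for c in per_round]
        persistence = sum(1 for n in counts if n > 0)
        summary[seq] = (counts, persistence)
    return summary


def filter_sequences(summary, min_total=2, min_persistence=2):
    """Keep sequences meeting hypothetical abundance and persistence cutoffs."""
    return sorted(
        seq
        for seq, (counts, persistence) in summary.items()
        if sum(counts) >= min_total and persistence >= min_persistence
    )
```

    Calling `summarize_rounds` on three small rounds and then `filter_sequences` on the result keeps only sequences that recur across rounds; real SELEX datasets would need normalization by round depth, which this sketch omits.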

  15. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    Directory of Open Access Journals (Sweden)

    Tu Jing

    2012-01-01

    Full Text Available Abstract Background: Multiplexing capacity has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Results: Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions: By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand.
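    The combinatorial idea in this abstract (4 forward barcodes and 8 reverse barcodes addressing 4 × 8 = 32 libraries) can be sketched as follows. The barcode sequences, their length, and their placement at the two ends of a single read are hypothetical simplifications; the abstract does not describe how the SOLiD runs actually read out the two barcodes.

```python
from itertools import product

# Hypothetical barcode sets; the real PBS barcodes are not given in the abstract.
FORWARD = ["ACGT", "CGTA", "GTAC", "TACG"]
REVERSE = ["AAAC", "AACA", "ACAA", "CAAA", "CCCA", "CCAC", "CACC", "ACCC"]


def build_library_map(forward, reverse):
    """Map every (forward, reverse) barcode pair to a library index."""
    return {pair: i for i, pair in enumerate(product(forward, reverse))}


def assign_read(read, library_map, f_len=4, r_len=4):
    """Return the library index for a read, or None if either barcode is unknown.

    Assumes (for illustration only) that the forward barcode is the read prefix
    and the reverse barcode is the read suffix.
    """
    pair = (read[:f_len], read[-r_len:])
    return library_map.get(pair)
```

    With these sets, `build_library_map(FORWARD, REVERSE)` has 32 entries, so 12 barcode oligos address 32 libraries; reads whose two barcodes do not both match are left unassigned, which mirrors the abstract's report that only about 64% of reads were assigned to both barcodes.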

  16. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    Directory of Open Access Journals (Sweden)

    Elena Marmesat

    Full Text Available The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.

  17. A Systematic Assessment of Accuracy in Detecting Somatic Mosaic Variants by Deep Amplicon Sequencing: Application to NF2 Gene.

    Directory of Open Access Journals (Sweden)

    Elisa Contini

    Full Text Available The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where a matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF), followed by a tailored statistical analysis. This method allowed us to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in the NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design.
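    To make the notion of a minimum detectable VAF concrete, here is a minimal Python sketch. It is not the authors' tailored statistical analysis, which the abstract does not specify; it assumes a simple binomial model in which background errors occur independently at a fixed per-read rate, and the function names and the significance level are illustrative.

```python
from math import comb


def vaf(alt_reads, depth):
    """Variant allele fraction: variant-supporting reads divided by total depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); adequate for modest depths only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_detectable_vaf(depth, error_rate, alpha=1e-3):
    """Smallest VAF whose read count is unlikely (< alpha) under background noise.

    Assumption: background noise is binomial with a known per-read error rate.
    """
    for k in range(depth + 1):
        if binom_tail(k, depth, error_rate) < alpha:
            return k / depth
    return None
```

    Under this assumed model, a site covered by 100 reads with a 1% background error rate needs at least 6 variant reads (VAF 0.06) before the observation falls below a 0.001 chance of being noise; deeper coverage lowers that threshold.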

  18. Kaleidaseq: a Web-based tool to monitor data flow in a high throughput sequencing facility.

    Science.gov (United States)

    Dedhia, N N; McCombie, W R

    1998-03-01

    Tracking data flow in high throughput sequencing is important in maintaining a consistent number of successfully sequenced samples, making decisions on scheduling the flow of sequencing steps, resolving problems at various steps and tracking the status of different projects. This is especially critical when the laboratory is handling a multitude of projects. We have built a Web-based data flow tracking package, called Kaleidaseq, which allows us to monitor the flow and quality of sequencing samples through the steps of preparation of library plates, plaque-picking, preparation of templates, conducting sequencing reactions, loading of samples on gels, base-calling the traces, and calculating the quality of the sequenced samples. Kaleidaseq's suite of displays allows for outstanding monitoring of the production sequencing process. The online display of current information that Kaleidaseq provides on both project status and process queues sorted by project enables accurate real-time assessment of the necessary samples that must be processed to complete the project. This information allows the process manager to allocate future resources optimally and schedule tasks according to scientific priorities. Quality of the sequenced samples can be tracked on a daily basis, which allows the sequencing laboratory to maintain a steady performance level and quickly resolve dips in quality. Kaleidaseq has a simple easy-to-use interface that allows access to all major functions and process queues from one Web page. This software package is modular and designed to allow additional processing steps and new monitoring variables to be added and tracked with ease. Access to the underlying relational database is through the Perl DBI interface, which allows for the use of different relational databases. Kaleidaseq is available for free use by the academic community from http://www.cshl.org/kaleidaseq.

  19. End-to-End Optimization of High-Throughput DNA Sequencing.

    Science.gov (United States)

    O'Reilly, Eliza; Baccelli, Francois; De Veciana, Gustavo; Vikalo, Haris

    2016-10-01

    At the core of Illumina's high-throughput DNA sequencing platforms lies a biophysical surface process, bridge amplification, that results in a random geometry of clusters of homogeneous short DNA fragments typically hundreds of base pairs long. The statistical properties of this random process and the lengths of the fragments are critical as they affect the information that can be subsequently extracted, that is, density of successfully inferred DNA fragment reads. The ensembles of overlapping DNA fragment reads are then used to computationally reconstruct the much longer target genome sequence. The success of the reconstruction in turn depends on having a sufficiently large ensemble of DNA fragments that are sufficiently long. In this article using stochastic geometry, we model and optimize the end-to-end flow cell synthesis and target genome sequencing process, linking and partially controlling the statistics of the physical processes to the success of the final computational step. Based on a rough calibration of our model, we provide, for the first time, a mathematical framework capturing the salient features of the sequencing platform that serves as a basis for optimizing cost, performance, and/or sensitivity analysis to various parameters.

  20. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing.

    Science.gov (United States)

    Gamba, Cristina; Hanghøj, Kristian; Gaunitz, Charleen; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Bradley, Daniel G; Orlando, Ludovic

    2016-03-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms. © 2015 John Wiley & Sons Ltd.

  1. SAMQA: error classification and validation of high-throughput sequenced read data

    Directory of Open Access Journals (Sweden)

    Bressler Ryan

    2011-08-01

    Full Text Available Abstract Background: Advances in high-throughput sequencing technologies and growth in data sizes have highlighted the need for scalable tools to perform quality assurance testing. These tests are necessary to ensure that data meet a minimum standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. Results: SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA) project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Due to an observed linearithmic speedup through the use of a high-performance computing (HPC) framework for the majority of tasks, poor quality data was identified prior to secondary analysis in significantly less time on the HPC framework than the same data run using alternative parallelization strategies on a single server. Conclusions: The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds of gigabytes of samples regardless of coverage or sample type.
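    As an illustration of what per-read technical checks against the SAM specification can look like, the following Python sketch validates a single alignment line. It is not SAMQA's code and covers only a few rules that the SAM specification states (11 mandatory fields, FLAG and MAPQ ranges, CIGAR/SEQ/QUAL length agreement); the function name and error messages are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # CIGAR operations that consume query bases


def _in_range(text, low, high):
    """True if text parses as an integer within [low, high]."""
    try:
        return low <= int(text) <= high
    except ValueError:
        return False


def check_sam_record(line):
    """Return a list of error strings for one SAM alignment line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    _, flag, _, _, mapq, cigar, _, _, _, seq, qual = fields[:11]
    errors = []
    if not _in_range(flag, 0, 65535):
        errors.append("FLAG out of range")
    if not _in_range(mapq, 0, 255):
        errors.append("MAPQ out of range")
    if cigar != "*":
        ops = CIGAR_RE.findall(cigar)
        if "".join(n + op for n, op in ops) != cigar:
            errors.append("malformed CIGAR")
        elif seq != "*":
            query_len = sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
            if query_len != len(seq):
                errors.append("CIGAR length does not match SEQ length")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        errors.append("SEQ and QUAL lengths differ")
    return errors
```

    Because each record is checked independently, a function of this shape can be applied to reads in parallel, which is the property that makes the kind of HPC distribution described in the abstract possible.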

  2. A beginners guide to SNP calling from high-throughput DNA-sequencing data.

    Science.gov (United States)

    Altmann, André; Weber, Peter; Bader, Daniel; Preuss, Michael; Binder, Elisabeth B; Müller-Myhsok, Bertram

    2012-10-01

    High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.

  3. Species tree estimation of North American chorus frogs (Hylidae: Pseudacris) with parallel tagged amplicon sequencing.

    Science.gov (United States)

    Barrow, Lisa N; Ralicki, Hannah F; Emme, Sandra A; Lemmon, Emily Moriarty

    2014-06-01

    The field of phylogenetics is changing rapidly with the application of high-throughput sequencing to non-model organisms. Cost-effective use of this technology for phylogenetic studies, which often include a relatively small portion of the genome but several taxa, requires strategies for genome partitioning and sequencing multiple individuals in parallel. In this study we estimated a multilocus phylogeny for the North American chorus frog genus Pseudacris using anonymous nuclear loci that were recently developed using a reduced representation library approach. We sequenced 27 nuclear loci and three mitochondrial loci for 44 individuals on 1/3 of an Illumina MiSeq run, obtaining 96.5% of the targeted amplicons at less than 20% of the cost of traditional Sanger sequencing. We found heterogeneity among gene trees, although four major clades (Trilling Frog, Fat Frog, crucifer, and West Coast) were consistently supported, and we resolved the relationships among these clades for the first time with strong support. We also found discordance between the mitochondrial and nuclear datasets that we attribute to mitochondrial introgression and a possible selective sweep. Bayesian concordance analysis in BUCKy and species tree analysis in (*)BEAST produced largely similar topologies, although we identify taxa that require additional investigation in order to clarify taxonomic and geographic range boundaries. Overall, we demonstrate the utility of a reduced representation library approach for marker development and parallel tagged sequencing on an Illumina MiSeq for phylogenetic studies of non-model organisms. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    Science.gov (United States)

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  5. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  6. A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection.

    Science.gov (United States)

    Khan, Arifa S; Ng, Siemon H S; Vandeputte, Olivier; Aljanahi, Aisha; Deyati, Avisek; Cassart, Jean-Pol; Charlebois, Robert L; Taliaferro, Lanyn P

    2017-01-01

    The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory. Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials

  7. Identification and characterization of small non-coding RNAs from Chinese fir by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Wan Li-Chuan

    2012-08-01

    Full Text Available Abstract Background: Small non-coding RNAs (sRNAs) play key roles in plant development, growth and responses to biotic and abiotic stresses. At least four classes of sRNAs have been well characterized in plants, including repeat-associated siRNAs (rasiRNAs), microRNAs (miRNAs), trans-acting siRNAs (tasiRNAs) and natural antisense transcript-derived siRNAs. Chinese fir (Cunninghamia lanceolata) is one of the most important coniferous evergreen tree species in China. No sRNA from Chinese fir has been described to date. Results: To obtain sRNAs in Chinese fir, we sequenced a sRNA library generated from seeds, seedlings, leaves, stems and calli, using Illumina high throughput sequencing technology. A comprehensive set of sRNAs were acquired, including conserved and novel miRNAs, rasiRNAs and tasiRNAs. With BLASTN and MIREAP we identified a total of 115 conserved miRNAs comprising 40 miRNA families and one novel miRNA with precursor sequence. The expressions of 16 conserved and one novel miRNAs and one tasiRNA were detected by RT-PCR. Utilizing real time RT-PCR, we revealed that four conserved and one novel miRNAs displayed developmental stage-specific expression patterns in Chinese fir. In addition, 209 unigenes were predicted to be targets of 30 Chinese fir miRNA families, of which five target genes were experimentally verified by 5' RACE, including a squamosa promoter-binding protein gene, a pentatricopeptide (PPR) repeat-containing protein gene, a BolA-like family protein gene, AGO1 and a gene of unknown function. We also demonstrated that the DCL3-dependent rasiRNA biogenesis pathway, which had been considered absent in conifers, existed in Chinese fir. Furthermore, the miR390-TAS3-ARF regulatory pathway was elucidated. Conclusions: We unveiled a complex population of sRNAs in Chinese fir through high throughput sequencing. This provides an insight into the composition and function of sRNAs in Chinese fir and sheds new light on land plant sRNA evolution.

  8. The Gut Microbiotassay: a high-throughput qPCR approach combinable with next generation sequencing to study gut microbial diversity

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Stockmarr, Anders

    2013-01-01

    Background The intestinal microbiota is a complex and diverse ecosystem that plays a significant role in maintaining the health and well-being of the mammalian host. During the last decade focus has increased on the importance of intestinal bacteria. Several molecular methods can be applied...... to describe the composition of the microbiota. This study used a new approach, the Gut Microbiotassay: an assembly of 24 primer sets targeting the main phyla and taxonomically related subgroups of the intestinal microbiota, to be used with the high-throughput qPCR chip ‘Access Array 48.48′, AA48.48, (Fluidigm...... with and without diarrhoea. The PCR amplicons from the 2304 reaction chambers were harvested from the AA48.48, purified, and sequenced using 454-technology. Results The Gut Microbiotassay was able to detect significant differences in the quantity and composition of the microbiota according to gut sections...

  9. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    Directory of Open Access Journals (Sweden)

    Seung Hak Yang

    2015-09-01

    Full Text Available The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.
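    The abstract compares diversity between time points without naming the index used. As one common possibility, and purely as an assumption, the Shannon index H = -Σ p_i ln p_i can be computed from per-taxon read counts as sketched below; the function name is illustrative.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)
```

    A sample dominated by a single phylum gives H near 0, while a sample whose reads are spread evenly over more phyla gives a larger H, which is the sense in which a community can be "most diverse" at one sampling week.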

  10. [Study on Microbial Diversity of Peri-implantitis Subgingival by High-throughput Sequencing].

    Science.gov (United States)

    Li, Zhi-jie; Wang, Shao-guo; Li, Yue-hong; Tu, Dong-xiang; Liu, Shi-yun; Nie, Hong-bing; Li, Zhi-qiang; Zhang, Ju-mei

    2015-07-01

    To study subgingival microbial diversity in peri-implantitis with high-throughput sequencing, and investigate the microbiological etiology of peri-implantitis. Subgingival plaques were sampled from patients with peri-implantitis (D group) and non-peri-implantitis subjects (N group). The microbiological diversity of the subgingival plaques was detected by sequencing the V4 region of 16S rRNA with the Illumina MiSeq platform. The diversity of the community structure was analyzed using Mothur software. A total of 156 507 gene sequences were detected in nine samples and 4 402 operational taxonomic units (OTUs) were found. Selenomonas, Pseudomonas, and Fusobacterium were dominant bacteria in D group, while Fusobacterium, Veillonella and Streptococcus were dominant bacteria in N group. Differences between peri-implantitis and non-peri-implantitis bacterial communities were observed at all phylogenetic levels by LEfSe, and were also found in the PCoA test. The occurrence of peri-implantitis is related not only to periodontitis pathogenic microbes but also to changes in oral microbial community structure. Treponema, Herbaspirillum, Butyricimonas and Phaeobacte may be closely related to the occurrence and development of peri-implantitis.

  11. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, high-performance analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
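    The exact search step mentioned in this abstract is, in Burrows-Wheeler-based aligners, usually a backward search over an FM-index. The Python sketch below is a textbook version for a short string, not the stored procedures from the paper and not BWA's implementation (which uses compressed, sampled structures); the suffix array is built naively and all names are illustrative.

```python
def build_fm_index(text):
    """Build a naive FM-index (suffix array, C table, Occ table) for text."""
    text += "$"  # sentinel: assumed absent from text and smaller than every symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    alphabet = sorted(set(text))
    # C[c]: number of symbols in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def exact_search(pattern, index):
    """Return sorted start positions of pattern in the indexed text."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one symbol to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

    Each pattern symbol costs two table lookups regardless of reference length, which is why exact search maps naturally onto iterative database procedures, whereas inexact matching needs branching over mismatches, typically expressed recursively.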

  12. In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR.

    Science.gov (United States)

    Kuksa, Pavel P; Leung, Yuk Yee; Vandivier, Lee E; Anderson, Zachary; Gregory, Brian D; Wang, Li-San

    2017-01-01

    RNA molecules are often altered post-transcriptionally by the covalent modification of their nucleotides. These modifications are known to modulate the structure, function, and activity of RNAs. When reverse transcribed into cDNA during RNA sequencing library preparation, atypical (modified) ribonucleotides that affect Watson-Crick base pairing will interfere with reverse transcriptase (RT), resulting in cDNA products with mis-incorporated bases or prematurely terminated RNA products. These interactions with RT can therefore be inferred from mismatch patterns in the sequencing reads, and are distinguishable from simple base-calling errors, single-nucleotide polymorphisms (SNPs), or RNA editing sites. Here, we describe a computational protocol for the in silico identification of modified ribonucleotides from RT-based RNA-seq read-out using the High-throughput Analysis of Modified Ribonucleotides (HAMR) software. HAMR can identify these modifications transcriptome-wide with single nucleotide resolution, and also differentiate between different types of modifications to predict modification identity. Researchers can use HAMR to identify and characterize RNA modifications using RNA-seq data from a variety of common RT-based sequencing protocols such as Poly(A), total RNA-seq, and small RNA-seq.

  13. A quantitative SMRT cell sequencing method for ribosomal amplicons.

    Science.gov (United States)

    Jones, Bethan M; Kustka, Adam B

    2017-04-01

    Advances in sequencing technologies continue to provide unprecedented opportunities to characterize microbial communities. For example, the Pacific Biosciences Single Molecule Real-Time (SMRT) platform has emerged as a unique approach harnessing DNA polymerase activity to sequence template molecules, enabling long reads at low costs. With the aim to simultaneously classify and enumerate in situ microbial populations, we developed a quantitative SMRT (qSMRT) approach that involves the addition of exogenous standards to quantify ribosomal amplicons derived from environmental samples. The V7-9 regions of 18S SSU rDNA were targeted and quantified from protistan community samples collected in the Ross Sea during the Austral summer of 2011. We used three standards of different length and optimized conditions to obtain accurate quantitative retrieval across the range of expected amplicon sizes, a necessary criterion for analyzing taxonomically diverse 18S rDNA molecules from natural environments. The ability to concurrently identify and quantify microorganisms in their natural environment makes qSMRT a powerful, rapid and cost-effective approach for defining ecosystem diversity and function. Copyright © 2017 Elsevier B.V. All rights reserved.
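
    The idea of quantifying amplicons against exogenous standards can be sketched in Python as follows. This is an illustration under a simple proportionality assumption (copies per read is the same for standards and sample amplicons); the abstract does not give the authors' actual calibration, and the function names and all numbers below are hypothetical.

```python
def copies_per_read(standards):
    """Average copies-per-read factor from (copies_added, reads_recovered) pairs."""
    ratios = [added / reads for added, reads in standards if reads > 0]
    if not ratios:
        raise ValueError("no standard was recovered; cannot calibrate")
    return sum(ratios) / len(ratios)


def estimate_copies(taxon_reads, standards):
    """Scale per-taxon read counts to estimated amplicon copies."""
    factor = copies_per_read(standards)
    return {taxon: reads * factor for taxon, reads in taxon_reads.items()}


# Hypothetical example: three standards of different length, each recovered
# at 1 read per 10 copies added.
standards = [(1000, 100), (2000, 200), (500, 50)]
print(estimate_copies({"diatom": 30, "dinoflagellate": 12}, standards))
```

    Using several standards of different length, as the abstract describes, lets one check that recovery does not depend strongly on amplicon size before applying a single factor.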

  14. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

    Science.gov (United States)

    Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

  15. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  16. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Full Text Available Abstract Background Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P  Conclusions We have presented a simple and high throughput method of
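
    The profile construction described in this record (counting the reads that align to each contig of a reference metagenome) can be sketched in Python as below. The alignment itself is assumed to have been done already; the function names, contig identifiers and read assignments are hypothetical, and the Pearson correlation is just one plausible way to compare two profiles, not necessarily the statistic used in the study.

```python
from collections import Counter
from math import sqrt


def metagenome_profile(aligned_contigs, contigs):
    """Count reads per contig; `aligned_contigs` holds one contig ID per aligned read."""
    counts = Counter(aligned_contigs)
    return [counts.get(contig, 0) for contig in contigs]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


contigs = ["contig_1", "contig_2", "contig_3"]
sample_a = metagenome_profile(["contig_1", "contig_2", "contig_2", "contig_3",
                               "contig_3", "contig_3"], contigs)
sample_b = metagenome_profile(["contig_1"] * 2 + ["contig_2"] * 4 + ["contig_3"] * 6,
                              contigs)
print(sample_a, sample_b, pearson(sample_a, sample_b))
```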

  17. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains.

    Science.gov (United States)

    Nistelberger, H M; Smith, O; Wales, N; Star, B; Boessenkool, S

    2016-11-24

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one lightly-charred maize cob. Even with target enrichment, this sample failed to yield adequate data required to address fundamental questions in archaeology and biology. We further reanalysed part of an existing dataset on charred plant material, and found all purported endogenous DNA sequences were likely to be spurious. We suggest these technologies are not suitable for use with charred archaeobotanicals and urge great caution when interpreting data obtained by HTS of these remains.

  18. Genotyping by PCR and High-Throughput Sequencing of Commercial Probiotic Products Reveals Composition Biases.

    Directory of Open Access Journals (Sweden)

    Wesley Morovic

    2016-11-01

    Full Text Available Recent advances in microbiome research have brought renewed focus on beneficial bacteria, many of which are available in food and dietary supplements. Although probiotics have historically been defined as microorganisms that convey health benefits when ingested in sufficient viable amounts, this description now includes the stipulation of well defined strains, encompassing definitive taxonomy for consumer consideration and regulatory oversight. Here, we evaluated 52 commercial dietary supplements covering a range of labeled species, and determined their content using plate counting and targeted genotyping. Additionally, strain identities were assessed using methods recently published by the United States Pharmacopeial Convention. We also determined the relative abundance of individual bacteria by high-throughput sequencing (HTS) of the 16S rRNA sequence using paired-end 2x250bp Illumina MiSeq technology. Using multiple methods, we tested the hypothesis that products do contain the quantitative amount of labeled bacteria, and the qualitative list of labeled microbial species. We found that 17 samples (33%) were below label claim for CFU prior to their expiration dates. A multiplexed-PCR scheme showed that only 30/52 (58%) of the products contained a correctly labeled classification, with issues encompassing incorrect taxonomy, missing species and un-labeled species. The HTS revealed that many blended products consisted predominantly of Lactobacillus acidophilus and Bifidobacterium animalis subsp. lactis. These results highlight the need for reliable methods to qualitatively determine the correct taxonomy and quantitatively ascertain the relative amounts of mixed microbial populations in commercial probiotic products.

  19. Perchlorate reduction by hydrogen autotrophic bacteria and microbial community analysis using high-throughput sequencing.

    Science.gov (United States)

    Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong

    2016-02-01

    Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and of being harmless to drinking water. However, reported information about the microbial community structure has so far been comparatively limited, and changes in biodiversity and in the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated by hydrogen aeration from activated sludge. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the order of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant perchlorate-reducing bacteria (PRB) in the mixed culture.
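
    For readers unfamiliar with the kinetic model named above, the Michaelis-Menten rate law q = q(max) * S / (K(s) + S) can be evaluated with a few lines of Python. The parameter values below are the lower ends of the ranges reported in the abstract; the function name and the substrate concentrations are illustrative assumptions, not data from the study.

```python
def specific_reduction_rate(substrate_mg_per_l, q_max, k_s):
    """Michaelis-Menten rate: q = q_max * S / (K_s + S).

    q_max in mg ClO4-/(gVSS h), k_s and substrate in mg/l.
    """
    if substrate_mg_per_l < 0:
        raise ValueError("substrate concentration must be non-negative")
    return q_max * substrate_mg_per_l / (k_s + substrate_mg_per_l)


Q_MAX = 2.521  # mg ClO4-/(gVSS h), lower end of the reported range
K_S = 5.44     # mg/l, lower end of the reported range

# Illustrative perchlorate concentrations (mg/l), chosen only for the example.
for s in (1.0, K_S, 50.0):
    print(s, specific_reduction_rate(s, Q_MAX, K_S))
```

    At S = K(s) the rate is exactly half of q(max), which is the defining property of the half-saturation constant.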

  20. High-throughput DNA Stretching in Continuous Elongational Flow for Genome Sequence Scanning

    Science.gov (United States)

    Meltzer, Robert; Griffis, Joshua; Safranovitch, Mikhail; Malkin, Gene; Cameron, Douglas

    2014-03-01

    Genome Sequence Scanning (GSS) identifies and compares bacterial genomes by stretching long (60 - 300 kb) genomic DNA restriction fragments and scanning for site-selective fluorescent probes. Practical application of GSS requires: 1) high throughput data acquisition, 2) efficient DNA stretching, 3) reproducible DNA elasticity in the presence of intercalating fluorescent dyes. GSS utilizes a pseudo-two-dimensional micron-scale funnel with convergent sheathing flows to stretch one molecule at a time in continuous elongational flow and center the DNA stream over diffraction-limited confocal laser excitation spots. Funnel geometry has been optimized to maximize throughput of DNA within the desired length range (>10 million nucleobases per second). A constant-strain detection channel maximizes stretching efficiency by applying a constant parabolic tension profile to each molecule, minimizing relaxation and flow-induced tumbling. The effect of intercalator on DNA elasticity is experimentally controlled by reacting one molecule of DNA at a time in convergent sheathing flows of the dye. Derivations of accelerating flow and non-linear tension distribution permit alignment of detected fluorescence traces to theoretical templates derived from whole-genome sequence data.

  1. Surveying the repair of ancient DNA from bones via high-throughput sequencing.

    Science.gov (United States)

    Mouttham, Nathalie; Klunk, Jennifer; Kuch, Melanie; Fourney, Ron; Poinar, Hendrik

    2015-07-01

    DNA damage in the form of abasic sites, chemically altered nucleotides, and strand fragmentation is the foremost limitation in obtaining genetic information from many ancient samples. Upon cell death, DNA continues to endure various chemical attacks such as hydrolysis and oxidation, but repair pathways found in vivo no longer operate. By incubating degraded DNA with specific enzyme combinations adopted from these pathways, it is possible to reverse some of the post-mortem nucleic acid damage prior to downstream analyses such as library preparation, targeted enrichment, and high-throughput sequencing. Here, we evaluate the performance of two available repair protocols on previously characterized DNA extracts from four mammoths. Both methods use endonucleases and glycosylases along with a DNA polymerase-ligase combination. PreCR Repair Mix increases the number of molecules converted to sequencing libraries, leading to an increase in endogenous content and a decrease in cytosine-to-thymine transitions due to cytosine deamination. However, the effects of Nelson Repair Mix on repair of DNA damage remain inconclusive.

  2. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

    Science.gov (United States)

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-12-15

    High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy. Contact: eduardo.eyras@upf.edu. Supplementary data are available at Bioinformatics online.

  3. Common fusion transcripts identified in colorectal cancer cell lines by high-throughput RNA sequencing.

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard Os; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%-28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated.

  4. Common Fusion Transcripts Identified in Colorectal Cancer Cell Lines by High-Throughput RNA Sequencing

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard OS; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%–28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated. PMID:24151535

  5. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Guzman, Frank; Almerão, Mauricio P; Körbes, Ana P; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  6. High-throughput massively parallel sequencing for fetal aneuploidy detection from maternal plasma.

    Directory of Open Access Journals (Sweden)

    Taylor J Jensen

    Full Text Available Circulating cell-free (ccf) fetal DNA comprises 3-20% of all the cell-free DNA present in maternal plasma. Numerous research and clinical studies have described the analysis of ccf DNA using next generation sequencing for the detection of fetal aneuploidies with high sensitivity and specificity. We sought to extend the utility of this approach by assessing semi-automated library preparation, higher sample multiplexing during sequencing, and improved bioinformatic tools to enable a higher throughput, more efficient assay while maintaining or improving clinical performance. Whole blood (10mL) was collected from pregnant female donors and plasma separated using centrifugation. Ccf DNA was extracted using column-based methods. Libraries were prepared using an optimized semi-automated library preparation method and sequenced on an Illumina HiSeq2000 sequencer in a 12-plex format. Z-scores were calculated for affected chromosomes using a robust method after normalization and genomic segment filtering. Classification was based upon a standard normal transformed cutoff value of z = 3 for chromosome 21 and z = 3.95 for chromosomes 18 and 13. Two parallel assay development studies using a total of more than 1900 ccf DNA samples were performed to evaluate the technical feasibility of automating library preparation and increasing the sample multiplexing level. These processes were subsequently combined and a study of 1587 samples was completed to verify the stability of the process-optimized assay. Finally, an unblinded clinical evaluation of 1269 euploid and aneuploid samples utilizing this high-throughput assay coupled to improved bioinformatic procedures was performed. We were able to correctly detect all aneuploid cases with extremely low false positive rates of 0.09%, <0.01%, and 0.08% for trisomies 21, 18, and 13, respectively. These data suggest that the developed laboratory methods in concert with improved bioinformatic approaches enable higher sample
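
    The z-score classification described in this record can be sketched in Python as follows. The cutoffs (z = 3 for chromosome 21, z = 3.95 for chromosomes 18 and 13) come from the abstract; everything else is an assumption made for illustration: the abstract does not specify the "robust method", so the sketch uses the median and the scaled median absolute deviation (MAD) of a euploid reference set, treats a sample as positive when z is at or above the cutoff, and uses made-up chromosome read fractions.

```python
from statistics import median

# Cutoffs reported in the abstract.
CUTOFFS = {"chr21": 3.0, "chr18": 3.95, "chr13": 3.95}


def robust_z(value, reference_values):
    """Z-score based on median and scaled MAD (an assumed robust estimator)."""
    med = median(reference_values)
    mad = median(abs(v - med) for v in reference_values)
    if mad == 0:
        raise ValueError("reference set has zero spread; z-score undefined")
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


def classify(chromosome, value, reference_values):
    """Return (z, is_positive); assumes 'at or above the cutoff' means positive."""
    z = robust_z(value, reference_values)
    return z, z >= CUTOFFS[chromosome]


# Hypothetical chr21 read fractions from euploid reference samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128, 0.0130]
print(classify("chr21", 0.0140, reference))
print(classify("chr21", 0.0131, reference))
```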

  7. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    Science.gov (United States)

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler Contact: sarah_almetwally4@mans.edu.eg Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
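
    To illustrate the data structure this record relies on, here is a minimal Python sketch of a Bloom filter used to store fixed-length substrings of a read. It is a plain textbook Bloom filter, not LightAssembler's cache-oblivious variant, and it does not reproduce the sampling or statistical test described in the abstract; the class name, the helper `substrings`, the hashing scheme and all sizes are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def substrings(sequence, length):
    """Yield every overlapping substring of the given length."""
    return (sequence[i:i + length] for i in range(len(sequence) - length + 1))


seen = BloomFilter(num_bits=1024, num_hashes=3)
for piece in substrings("ACGTACGTGA", 4):
    seen.add(piece)
print("ACGT" in seen)
```

    The memory footprint is fixed by `num_bits` regardless of how many substrings are inserted, which is why such filters are attractive for encoding large assembly graphs; the price is a small probability that an absent item is reported as present.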

  8. Flow cytometry community fingerprinting and amplicon sequencing for the assessment of landfill leachate cellulolytic bioaugmentation.

    Science.gov (United States)

    Kinet, R; Dzaomuho, P; Baert, J; Taminiau, B; Daube, G; Nezer, C; Brostaux, Y; Nguyen, F; Dumont, G; Thonart, P; Delvigne, F

    2016-08-01

    Flow cytometry (FCM) is a high throughput single cell technology that is becoming widely used for studying phenotypic and genotypic diversity among microbial communities. In this work, the technology is applied to the assessment of a bioaugmentation treatment intended to enhance the cellulolytic potential of landfill leachate. The experimental results reveal a relevant increase in leachate cellulolytic potential due to bioaugmentation. Cytometric monitoring of microbial dynamics during these assays was then carried out. The flowFP package is used to establish fingerprints of microbial samples from the initial 2D cytometry histograms. This procedure highlights variation in the microbial communities over the course of the assays. Cytometric and 16S rRNA gene sequencing fingerprinting methods are then compared. The two approaches give the same evidence about microbial dynamics throughout the digestion assay. There is, however, a lack of significant correlation between cytometric and amplicon sequencing fingerprints at the genus or species level. The same phenotypic profiles of the microbiota during the assays matched several 16S rRNA gene sequencing profiles. Flow cytometry fingerprinting can thus be considered a promising routine on-site method suitable for the detection of stability/variation/disturbance of complex microbial communities involved in bioprocesses. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Pei-Xiang Ni

    2015-01-01

    Full Text Available Background: Identifying pathogens in patients with unexplained clinical symptoms, such as fever of unknown origin, remains a dilemma that challenges not only the diagnostic and therapeutic process itself but also expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived from five cases from the intensive care unit. This methodological approach detects viruses and corrects false positive detections of pathogens in a much shorter period. Through our method, followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus in one case, and in another case we could determine that the culture-based identification of Streptococcus viridans was a false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  10. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Ying-Hong Gu

    Full Text Available BACKGROUND: Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, highly sensitive method for virus detection and diagnosis. METHODOLOGY/PRINCIPAL FINDINGS: We report the polyadenylated RNA virome of three sweet potato cultivars using a high throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggests that these tissues should be optimum samples for virus detection. CONCLUSIONS/SIGNIFICANCE: A total of 15 viruses were presumed to be present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  11. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Science.gov (United States)

    Gu, Ying-Hong; Tao, Xiang; Lai, Xian-Jun; Wang, Hai-Yan; Zhang, Yi-Zheng

    2014-01-01

    Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, highly sensitive method for virus detection and diagnosis. We report the polyadenylated RNA virome of three sweet potato cultivars using a high throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggests that these tissues should be optimum samples for virus detection. A total of 15 viruses were presumed to be present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  12. Improved detection of artifactual viral minority variants in high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Matthijs Rudolf Albert Welkers

    2015-01-01

    Full Text Available High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina Hiseq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after ‘best practice’ quality control (QC), within the plasmid pool, 1 minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to 3 clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs).
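
    The two characteristics named in this abstract (support concentrated in one read orientation, and mismatches unevenly distributed along the reads) can be illustrated with a minimal sketch. This is not the authors' published code: the `VariantSupport` record, the function names, the four-bin positional measure, and both thresholds are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VariantSupport:
    """Hypothetical summary of the reads supporting one minority variant."""
    forward_reads: int                 # supporting reads in forward orientation
    reverse_reads: int                 # supporting reads in reverse orientation
    read_length: int                   # length of the sequencing reads
    read_positions: list[int] = field(default_factory=list)  # 0-based mismatch position in each read


def strand_fraction(v: VariantSupport) -> float:
    """Fraction of supporting reads that come from the dominant orientation."""
    total = v.forward_reads + v.reverse_reads
    return max(v.forward_reads, v.reverse_reads) / total if total else 0.0


def positional_spread(v: VariantSupport, n_bins: int = 4) -> float:
    """Fraction of equal-width read segments that contain a supporting mismatch."""
    if not v.read_positions:
        return 0.0
    bins = {min(p * n_bins // v.read_length, n_bins - 1) for p in v.read_positions}
    return len(bins) / n_bins


def looks_artifactual(v: VariantSupport,
                      max_strand_fraction: float = 0.9,   # assumed threshold
                      min_spread: float = 0.5) -> bool:   # assumed threshold
    """Flag a variant seen almost only on one strand or in one part of the reads."""
    return strand_fraction(v) > max_strand_fraction or positional_spread(v) < min_spread
```

    In practice the thresholds would need to be calibrated against a control such as the plasmid pool described above; the values here are placeholders.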

  13. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Full Text Available Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN; http://solgenomics.net). Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  14. Annotation of primate miRNAs by high throughput sequencing of small RNA libraries

    Directory of Open Access Journals (Sweden)

    Dannemann Michael

    2012-03-01

    Full Text Available Background: In addition to genome sequencing, accurate functional annotation of genomes is required in order to carry out comparative and evolutionary analyses between species. Among primates, the human genome is the most extensively annotated. Human miRNA gene annotation is based on multiple lines of evidence including evidence for expression as well as prediction of the characteristic hairpin structure. In contrast, most miRNA genes in non-human primates are annotated based on homology without any expression evidence. We have sequenced small-RNA libraries from chimpanzee, gorilla, orangutan and rhesus macaque from multiple individuals and tissues. Using patterns of miRNA expression in conjunction with a model of miRNA biogenesis we used these high-throughput sequencing data to identify novel miRNAs in non-human primates. Results: We predicted 47 new miRNAs in chimpanzee, 240 in gorilla, 55 in orangutan and 47 in rhesus macaque. The algorithm we used was able to predict 64% of the previously known miRNAs in chimpanzee, 94% in gorilla, 61% in orangutan and 71% in rhesus macaque. We therefore added evidence for expression in between one and five tissues to miRNAs that were previously annotated based only on homology to human miRNAs. We increased from 60 to 175 the number of miRNAs that are located in orthologous regions in humans and the four non-human primate species studied here. Conclusions: In this study we provide expression evidence for homology-based annotated miRNAs and predict de novo miRNAs in four non-human primate species. We increased the number of annotated miRNA genes and provided evidence for their expression in four non-human primates. Similar approaches using different individuals and tissues would improve annotation in non-human primates and allow for further comparative studies in the future.

  15. Diversity and Structure of Diazotrophic Communities in Mangrove Rhizosphere, Revealed by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Yanying Zhang

    2017-10-01

    Full Text Available Diazotrophic communities make an essential contribution to productivity by providing new nitrogen. However, knowledge of the roles that both mangrove tree species and geochemical parameters play in shaping mangrove rhizosphere diazotrophic communities is still elusive. Here, a comprehensive examination of the diversity and structure of microbial communities in the rhizospheres of three mangrove species, Rhizophora apiculata, Avicennia marina, and Ceriops tagal, was undertaken using high-throughput sequencing of the 16S rRNA and nifH genes. Our results revealed a great diversity of both the total microbial composition and the diazotrophic composition specifically in the mangrove rhizosphere. Deltaproteobacteria and Gammaproteobacteria were both ubiquitous and dominant, comprising an average of 45.87 and 86.66% of total microbial and diazotrophic communities, respectively. Sulfate-reducing bacteria belonging to the Desulfobacteraceae and Desulfovibrionaceae were the dominant diazotrophs. Community statistical analyses suggested that both mangrove tree species and additional environmental variables played important roles in shaping total microbial and potential diazotroph communities in mangrove rhizospheres. In contrast to the total microbial community investigated by analysis of 16S rRNA gene sequences, most of the dominant diazotrophic groups identified by nifH gene sequences were significantly different among mangrove species. The dominant diazotrophs of the family Desulfobacteraceae were positively correlated with total phosphorus, but negatively correlated with the nitrogen to phosphorus ratio. The Pseudomonadaceae were positively correlated with the concentration of available potassium, suggesting that diazotrophs potentially play an important role in biogeochemical cycles, such as those of nitrogen, phosphorus, sulfur, and potassium, in the mangrove ecosystem.

  16. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data.

    Science.gov (United States)

    Vainshtein, Yevhen; Rippe, Karsten; Teif, Vladimir B

    2017-02-14

    Biomedical applications of high-throughput sequencing methods generate a vast amount of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence/absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them. Here we introduce the NucTools suite of Perl scripts as well as MATLAB- and R-based visualization programs for a nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of the changes of integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors or architectural proteins, and epigenetic marks like histone modifications or DNA methylation. The applications of NucTools are demonstrated for the comparison of several datasets for nucleosome occupancy in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). The typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features such as for example binding of a transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). 
As potential limitations and problems we discuss how inter-replicate variability of

  17. Preselection of shotgun clones by oligonucleotide fingerprinting: an efficient and high throughput strategy to reduce redundancy in large-scale sequencing projects

    National Research Council Canada - National Science Library

    Radelof, U; Hennig, S; Seranski, P; Steinfath, M; Ramser, J; Reinhardt, R; Poustka, A; Francis, F; Lehrach, H

    1998-01-01

    .... To reduce the overall effort and cost of those projects and to accelerate the sequencing throughput, we have developed an efficient, high throughput oligonucleotide fingerprinting protocol to select...

  18. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.

    Science.gov (United States)

    Caboche, Ségolène; Audebert, Christophe; Lemoine, Yves; Hot, David

    2014-04-05

    The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. 
This benchmark
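
    The correctness criterion described in this abstract (start position, end position, and the numbers of indels and substitutions) can be sketched as a simple comparison between the simulated truth and the reported alignment. This is only an illustration and not CuReSimEval itself: the `Alignment` record, the function name, and the two tolerance parameters are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical description of where and how a read aligns."""
    start: int          # reference position of the first aligned base
    end: int            # reference position of the last aligned base
    indels: int         # number of insertions plus deletions
    substitutions: int  # number of mismatched bases


def is_correctly_mapped(expected: Alignment,
                        observed: Alignment,
                        position_tolerance: int = 0,  # assumed parameter
                        edit_tolerance: int = 0) -> bool:  # assumed parameter
    """Accept a mapping only if start, end, indel count and substitution
    count all agree with the simulated truth, within the given tolerances."""
    return (abs(observed.start - expected.start) <= position_tolerance
            and abs(observed.end - expected.end) <= position_tolerance
            and abs(observed.indels - expected.indels) <= edit_tolerance
            and abs(observed.substitutions - expected.substitutions) <= edit_tolerance)
```

    Checking the end position and edit counts in addition to the start position rejects alignments that begin in the right place but are soft-clipped or gapped differently from the simulated read.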

  19. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  20. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Kyle Logue

    2016-03-01

    Full Text Available Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources.

  1. High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L.).

    Directory of Open Access Journals (Sweden)

    Livia Donaire

    Full Text Available Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.

  2. Forensic soil DNA analysis using high-throughput sequencing: a comparison of four molecular markers.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Cooper, Alan

    2014-11-01

    Soil analyses, such as mineralogy, geophysics, texture and colour, are commonly used in forensic casework to link a suspect to a crime scene. However, DNA analysis can also be applied to characterise the vast diversity of organisms present in soils. DNA metabarcoding and high-throughput sequencing (HTS) now offer a means to improve discrimination between forensic soil samples by identifying individual taxa and exploring non-culturable microbial species. Here, we compare the small-scale reproducibility and resolution of four molecular markers targeting different taxa (bacterial 16S rRNA, eukaryotic 18S rRNA, plant trnL intron and fungal internal transcribed spacer I (ITS1) rDNA) to distinguish two sample sites. We also assess the background DNA level associated with each marker and examine the effects of filtering Operational Taxonomic Units (OTUs) detected in extraction blank controls. From this study, we show that non-bacterial taxa in soil, particularly fungi, can provide the greatest resolution between the sites, whereas plant markers may be problematic for forensic discrimination. ITS and 18S markers exhibit reliable amplification, and both show high discriminatory power with low background DNA levels. The 16S rRNA marker showed comparable discriminatory power post filtering; however, it presented the highest level of background DNA. The discriminatory power of all markers was increased by applying OTU filtering steps, with the greatest improvement observed by the removal of any sequences detected in extraction blanks. This study demonstrates the potential use of multiple DNA markers for forensic soil analysis using HTS, and identifies some of the standardisation and evaluation steps necessary before this technique can be applied in casework. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  3. Influence of artifact removal on rare species recovery in natural complex communities using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Aibin Zhan

    Full Text Available Large-scale high-throughput sequencing techniques are rapidly becoming popular methods to profile complex communities and have generated deep insights into community biodiversity. However, several technical problems, especially sequencing artifacts such as nucleotide calling errors, could artificially inflate biodiversity estimates. Sequence filtering for artifact removal is a conventional method for deleting error-prone sequences from high-throughput sequencing data. As rare species represented by low-abundance sequences in datasets may be sensitive to the artifact removal process, the influence of artifact removal on rare species recovery has not been well evaluated in natural complex communities. Here we employed both internal (reliable operational taxonomic units selected from communities themselves) and external (indicator species spiked into communities) references to evaluate the influence of artifact removal on rare species recovery using 454 pyrosequencing of complex plankton communities collected from both freshwater and marine habitats. Multiple analyses revealed three clear patterns: (1) rare species were eliminated during the sequence filtering process at all tested filtering stringencies, (2) more rare taxa were eliminated as filtering stringencies increased, and (3) elimination of rare species intensified as the biomass of a species in a community was reduced. Our results suggest that caution be applied when processing high-throughput sequencing data, especially for rare taxa detection for conservation of species at risk and for rapid response programs targeting non-indigenous species. Establishment of both internal and external references proposed here provides a practical strategy to evaluate the artifact removal process.

  4. Extracellular DNA amplicon sequencing reveals high levels of benthic eukaryotic diversity in the central Red Sea

    KAUST Repository

    Pearman, John K.

    2015-11-01

    The present study aims to characterize the benthic eukaryotic biodiversity patterns at a coarse taxonomic level in three areas of the central Red Sea (a lagoon, an offshore area in Thuwal and a shallow coastal area near Jeddah) based on extracellular DNA. High-throughput amplicon sequencing targeting the V9 region of the 18S rRNA gene was undertaken for 32 sediment samples. High levels of alpha-diversity were detected with 16,089 operational taxonomic units (OTUs) being identified. The majority of the OTUs were assigned to Metazoa (29.2%), Alveolata (22.4%) and Stramenopiles (17.8%). Stramenopiles (Diatomea) and Alveolata (Ciliophora) were frequent in a lagoon and in shallower coastal stations, whereas metazoans (Arthropoda: Maxillopoda) were dominant in deeper offshore stations. Only 24.6% of total OTUs were shared among all areas. Beta-diversity was generally lower between the lagoon and Jeddah (nearshore) than between either of those and the offshore area, suggesting a nearshore–offshore biodiversity gradient. The current approach allowed for a broad-range of benthic eukaryotic biodiversity to be analysed with significantly less labour than would be required by other traditional taxonomic approaches. Our findings suggest that next generation sequencing techniques have the potential to provide a fast and standardised screening of benthic biodiversity at large spatial and temporal scales.
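
    The shared-OTU percentage and the between-area comparison reported in this abstract can be illustrated on presence/absence data. The abstract does not state which beta-diversity measure was used, so the Jaccard dissimilarity below is an assumed stand-in, and the function names and example OTU labels are hypothetical.

```python
def shared_fraction(otus_by_area: dict[str, set[str]]) -> float:
    """Fraction of all observed OTUs that occur in every area."""
    sets = list(otus_by_area.values())
    if not sets:
        return 0.0
    all_otus = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(all_otus) if all_otus else 0.0


def jaccard_dissimilarity(a: set[str], b: set[str]) -> float:
    """1 minus (shared OTUs / total OTUs) for two areas; 0 means identical membership."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


# Toy presence/absence data, invented purely for illustration.
areas = {
    "lagoon": {"otu_a", "otu_b", "otu_c"},
    "jeddah": {"otu_b", "otu_c", "otu_d"},
    "offshore": {"otu_c", "otu_e"},
}
print(shared_fraction(areas))
print(jaccard_dissimilarity(areas["lagoon"], areas["jeddah"]))
print(jaccard_dissimilarity(areas["lagoon"], areas["offshore"]))
```

    In this toy data the lagoon and nearshore sets overlap more with each other than either does with the offshore set, mirroring the nearshore–offshore pattern the abstract describes.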

  5. High-throughput 16S rRNA gene sequencing reveals alterations of intestinal microbiota in myalgic encephalomyelitis/chronic fatigue syndrome patients.

    Science.gov (United States)

    Frémont, Marc; Coomans, Danny; Massart, Sebastien; De Meirleir, Kenny

    2013-08-01

    Human intestinal microbiota plays an important role in the maintenance of host health by providing energy, nutrients, and immunological protection. Intestinal dysfunction is a frequent complaint in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) patients, and previous reports suggest that dysbiosis, i.e. the overgrowth of abnormal populations of bacteria in the gut, is linked to the pathogenesis of the disease. We used high-throughput 16S rRNA gene sequencing to investigate the presence of specific alterations in the gut microbiota of ME/CFS patients from Belgium and Norway. 43 ME/CFS patients and 36 healthy controls were included in the study. Bacterial DNA was extracted from stool samples, PCR amplification was performed on 16S rRNA gene regions, and PCR amplicons were sequenced using a Roche FLX 454 sequencer. The composition of the gut microbiota was found to differ between Belgian controls and Norwegian controls: Norwegians showed higher percentages of specific Firmicutes populations (Roseburia, Holdemania) and lower proportions of most Bacteroidetes genera. A highly significant separation could be achieved between Norwegian controls and Norwegian patients: patients presented increased proportions of Lactonifactor and Alistipes, as well as a decrease in several Firmicutes populations. In Belgian subjects the patient/control separation was less pronounced; however, some abnormalities observed in Norwegian patients were also found in Belgian patients. These results show that intestinal microbiota is altered in ME/CFS. High-throughput sequencing is a useful tool to diagnose dysbiosis in patients and could help design treatments based on gut microbiota modulation (antibiotics, pre- and probiotic supplementation). Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.

  6. Novel Sequencing-based Strategies for High-Throughput Discovery of Genetic Mutations Underlying Inherited Antibody Deficiency Disorders

    OpenAIRE

    Wang, Hong-Ying; Jain, Ashish

    2011-01-01

    Human inherited antibody deficiency disorders are generally caused by mutations in genes involved in the pathways regulating B-cell class switch recombination; DNA damage repair; and B-cell development, differentiation, and survival. Sequencing a large set of candidate genes involved in these pathways appears to be a highly efficient way to identify novel mutations. Herein we review several high-throughput sequencing approaches as well as recent improvements in target gene enrichment technolo...

  7. High-throughput next-generation sequencing to genotype six classical HLA loci from 96 donors in a single MiSeq run.

    Science.gov (United States)

    Ehrenberg, P K; Geretz, A; Sindhu, R K; Vayntrub, T; Fernández Viña, M A; Apps, R; Michael, N L; Thomas, R

    2017-11-01

    Next generation sequencing (NGS) methods have been established as an efficient approach for HLA typing because unlike traditional Sanger sequencing, they provide unambiguous results at a reasonable cost. We previously developed a multi-locus index method to genotype four HLA loci (A, B, C, and DRB1) on the Illumina MiSeq platform. We have now expanded this method to include two additional loci, HLA-DPB1 and DQB1. Contiguous full-length amplicons from 5'UTR through 3'UTR regions were generated using one long-range PCR reaction per locus for each of the six loci from 96 individuals of different ethnicities. The six amplicons from each donor were pooled, enzymatically fragmented and given a donor-specific index. This approach enabled sequencing of 576 loci from 96 individuals in a single MiSeq run. Donor-specific sequence reads were demultiplexed, and allele calls were generated from FASTQ files using commercially available software. Comparison to HLA genotypes generated from Sanger sequence-based typing (SBT) identified no discordances among any of the alleles analyzed in this study. Importantly, this method was able to resolve 22 DPB1 and 20 DQB1 alleles that were ambiguous with the SBT method. Furthermore, a novel allele in each of these two loci was identified, with the DQB1*05:01:24 allele having a frequency of greater than five percent. This method was subsequently validated against a blinded panel of 22 samples from the 17th International HLA and Immunogenetics Workshop. The flexibility of the method is further highlighted by successful genotyping of eight loci comprising all classical HLA loci for a subset of the samples. We now present a high-throughput, high-resolution, scalable NGS HLA typing method to accurately and efficiently genotype all classical HLA class I and II loci. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
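    The pooling and demultiplexing step described above can be illustrated with a minimal sketch. It is not the software used in the study (the abstract only mentions commercially available software); the read format, index sequences and donor IDs below are hypothetical, and only exact index matches are considered.

```python
from collections import defaultdict


def demultiplex(reads, index_to_donor):
    """Group reads by donor using an exact match on the index sequence.

    reads: iterable of (index_sequence, read_sequence) tuples (hypothetical format).
    index_to_donor: dict mapping index sequence -> donor ID (hypothetical).
    Reads whose index is not in the table go to the "undetermined" bin.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        donor = index_to_donor.get(index_seq, "undetermined")
        bins[donor].append(read_seq)
    return dict(bins)


# Arithmetic from the abstract: six loci per donor, 96 donors per MiSeq run.
LOCI_PER_DONOR = 6
DONORS_PER_RUN = 96
loci_per_run = LOCI_PER_DONOR * DONORS_PER_RUN

# Toy data, invented for illustration only.
example_reads = [
    ("ACGT", "TTGACC"),
    ("GGCA", "CCATGA"),
    ("ACGT", "GATTAC"),
    ("NNNN", "AAAAAA"),
]
example_index = {"ACGT": "donor_01", "GGCA": "donor_02"}
example_bins = demultiplex(example_reads, example_index)
```

    Because all six amplicons from a donor share one index, a single lookup per read is enough to recover the donor; assigning reads to loci would be a separate alignment step that this sketch does not attempt.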

  8. Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Chao Cheng

    2011-11-01

    Full Text Available We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream in the hierarchy distinguish themselves by being expressed more uniformly across tissues, having more interacting partners, and being more likely to be essential. We found an over-representation of notable network motifs, including a feed-forward loop (FFL) in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data becomes available in the near future, our methods of data integration have various potential applications.

  9. Polyadenylated Sequencing Primers Enable Complete Readability of PCR Amplicons Analyzed by Dideoxynucleotide Sequencing

    Directory of Open Access Journals (Sweden)

    Martin Beránek

    2012-01-01

    Full Text Available Dideoxynucleotide DNA sequencing is one of the principal procedures in molecular biology. Loss of an initial part of nucleotides behind the 3' end of the sequencing primer limits the readability of sequenced amplicons. We present a method which extends the readability by using sequencing primers modified by polyadenylated tails attached to their 5' ends. Performing a polymerase chain reaction, we amplified eight amplicons of six human genes (AMELX, APOE, HFE, MBL2, SERPINA1 and TGFB1) ranging from 106 bp to 680 bp. Polyadenylation of the sequencing primers minimized the loss of bases in all amplicons. Complete sequences of shorter products (AMELX 106 bp, SERPINA1 121 bp, HFE 208 bp, APOE 244 bp, MBL2 317 bp) were obtained. In addition, in the case of TGFB1 products (366 bp, 432 bp, and 680 bp, respectively), the lengths of sequencing readings were significantly longer if adenylated primers were used. Thus, single strand dideoxynucleotide sequencing with adenylated primers enables complete or near complete readability of short PCR amplicons.

  10. Efficient strategy for the molecular diagnosis of intellectual disability using targeted high-throughput sequencing.

    Science.gov (United States)

    Redin, Claire; Gérard, Bénédicte; Lauer, Julia; Herenger, Yvan; Muller, Jean; Quartier, Angélique; Masurel-Paulet, Alice; Willems, Marjolaine; Lesca, Gaétan; El-Chehadeh, Salima; Le Gras, Stéphanie; Vicaire, Serge; Philipps, Muriel; Dumas, Michaël; Geoffroy, Véronique; Feger, Claire; Haumesser, Nicolas; Alembik, Yves; Barth, Magalie; Bonneau, Dominique; Colin, Estelle; Dollfus, Hélène; Doray, Bérénice; Delrue, Marie-Ange; Drouin-Garraud, Valérie; Flori, Elisabeth; Fradin, Mélanie; Francannet, Christine; Goldenberg, Alice; Lumbroso, Serge; Mathieu-Dramard, Michèle; Martin-Coignard, Dominique; Lacombe, Didier; Morin, Gilles; Polge, Anne; Sukno, Sylvie; Thauvin-Robinet, Christel; Thevenon, Julien; Doco-Fenzy, Martine; Genevieve, David; Sarda, Pierre; Edery, Patrick; Isidor, Bertrand; Jost, Bernard; Olivier-Faivre, Laurence; Mandel, Jean-Louis; Piton, Amélie

    2014-11-01

    Intellectual disability (ID) is characterised by an extreme genetic heterogeneity. Several hundred genes have been associated with monogenic forms of ID, considerably complicating molecular diagnostics. Trio-exome sequencing was recently proposed as a diagnostic approach, yet remains costly for a general implementation. We report the alternative strategy of targeted high-throughput sequencing of 217 genes in which mutations had been reported in patients with ID or autism as the major clinical concern. We analysed 106 patients with ID of unknown aetiology following array-CGH analysis and other genetic investigations. Ninety per cent of these patients were males, and 75% sporadic cases. We identified 26 causative mutations: 16 in X-linked genes (ATRX, CUL4B, DMD, FMR1, HCFC1, IL1RAPL1, IQSEC2, KDM5C, MAOA, MECP2, SLC9A6, SLC16A2, PHF8) and 10 de novo in autosomal-dominant genes (DYRK1A, GRIN1, MED13L, TCF4, RAI1, SHANK3, SLC2A1, SYNGAP1). We also detected four possibly causative mutations (eg, in NLGN3) requiring further investigations. We present detailed reasoning for assigning causality for each mutation, and associated patients' clinical information. Some genes were hit more than once in our cohort, suggesting they correspond to more frequent ID-associated conditions (KDM5C, MECP2, DYRK1A, TCF4). We highlight some unexpected genotype to phenotype correlations, with causative mutations being identified in genes associated with defined syndromes in patients deviating from the classic phenotype (DMD, TCF4, MECP2). We also bring additional supportive (HCFC1, MED13L) or unsupportive (SHROOM4, SRPX2) evidence for the implication of previous candidate genes or mutations in cognitive disorders. With a diagnostic yield of 25%, targeted sequencing appears relevant as a first intention test for the diagnosis of ID, but importantly will also contribute to a better understanding regarding the specific contribution of the many genes implicated in ID and autism. Published by the

  11. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Science.gov (United States)

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
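    The pairing of SRA IDs with PMIDs described above can be sketched as a simple text scan. This is an illustration only, not the authors' pipeline: the accession pattern (S/E/D, then R, then a record-type letter, then six or more digits) is an assumption about typical SRA-style accessions, and the article texts, accessions and PMIDs below are invented.

```python
import re

# Assumed accession shape: archive letter (S, E or D) + "R" + record-type
# letter (A, P, X, R or S) + at least six digits, e.g. "SRP000001".
SRA_ID_PATTERN = re.compile(r"\b[SED]R[APXRS]\d{6,}\b")


def extract_pairs(articles):
    """Return sorted, de-duplicated (SRA ID, PMID) pairs.

    articles: dict mapping a PMID string to the article's full text
    (hypothetical input format).
    """
    pairs = set()
    for pmid, text in articles.items():
        for sra_id in SRA_ID_PATTERN.findall(text):
            pairs.add((sra_id, pmid))
    return sorted(pairs)


# Invented example texts; neither the PMIDs nor the accessions are real citations.
example_articles = {
    "10000001": "Reads were deposited under SRP000001 (see also SRP000001).",
    "10000002": "No sequencing data were generated in this study.",
    "10000003": "Data are available as DRA000123 and ERP000456.",
}
example_pairs = extract_pairs(example_articles)
```

    Using a set before sorting means an accession mentioned several times in one article still yields a single pair, which matches the idea of counting distinct SRA ID-PMID pairs.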

  12. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  13. Obtaining representative community profiles of anaerobic digesters through optimisation of 16S rRNA amplicon sequencing protocols

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; McIlroy, Simon Jon; Karst, Søren Michael

    RNA gene amplicon sequencing is rapid, cheap, high throughput, and has high taxonomic resolution. However, biases are introduced in multiple steps of this approach, including non-representative DNA extraction and uneven taxonomic coverage of selected PCR primers, potentially giving a skewed view...... of the community composition. As such sample specific optimisation and standardisation of DNA extraction, as well as PCR primer selection, are essential to minimising the potential for such biases. The aim of this study was to develop a protocol for optimized community profiling of anaerobic digesters. The Fast......DNA SPIN kit was selected and the mechanical lysis parameters optimised for extraction of genomic DNA from mesophilic and thermophilic anaerobic digester samples. Different primer sets were compared for targeting the archaea and bacteria, both together and individually. Shotgun sequencing...

  14. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential...... display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds....

  15. Deep amplicon sequencing reveals mixed phytoplasma infection within single grapevine plants

    DEFF Research Database (Denmark)

    Nicolaisen, Mogens; Contaldo, Nicoletta; Makarova, Olga

    2011-01-01

    The diversity of phytoplasmas within single plants has not yet been fully investigated. In this project, deep amplicon sequencing was used to generate 50,926 phytoplasma sequences from 11 phytoplasma-infected grapevine samples from a PCR amplicon in the 5' end of the 16S region. After clustering ...

  16. Characterization of Microbial Community in Lascaux Cave by High Throughput Sequencing

    Science.gov (United States)

    Alonso, Lise; Dubost, Audrey; Luis, Patricia; Pommier, Thomas; Moënne-Loccoz, Yvan

    2017-04-01

    The Lascaux Cave in South-West France is an archeological landmark renowned for its Paleolithic paintings dating back c. 18,000 years. Extensive tourist traffic and repeated chemical treatments have resulted in the development of microbial stains on cave walls, which is a major issue in terms of art conservation. Therefore, it is of prime importance to better understand the microbial ecology of Lascaux Cave. Like many other caves, Lascaux is quite heterogeneous in terms of the nature and surface properties of rock walls within cave rooms, as well as the succession of rooms/galleries from the entrance to deeper areas of the cave. Lascaux Cave displays an additional level of heterogeneity related to the presence of discontinuous stains on certain types of cave walls. We compared the microbial community (i.e. both prokaryotic and eukaryotic microbial populations) colonizing cave walls of different rooms/galleries, in and outside stains and in different cave layers, in successive years. Quantitative PCR analysis of cave wall samples gave in the order of 10^2 copies of 18S rRNA genes and 10^5 copies of 16S rRNA genes per ng of DNA, indicating significant colonization of all cave walls by micro-eukaryotes and especially bacteria. Illumina metagenomic analysis of cave wall samples was carried out based on four ribosomal DNA markers targeting bacteria, archaea, fungi, and other micro-eukaryotes. The results showed that the four microbial communities were highly diverse in and outside stains, as several hundred genera of microorganisms were identified in each. Proteobacteria were more prominent within stains whereas Bacteroidetes and Sordariomycetes were more prominent outside stains. High-throughput sequencing also showed that the nature/surface properties of cave walls were the main factor determining the structure and composition of microbial communities, ahead of the other heterogeneity factors studied i.e. location within the cave, presence of stain and sampling

  17. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Full Text Available Viral infections cause many different diseases stemming both from well-characterized viral pathogens and from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.

  18. A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites

    DEFF Research Database (Denmark)

    Uren, Anthony G; Mikkers, Harald; Kool, Jaap

    2009-01-01

    sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA...... optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites....

  19. Using high-throughput sequencing of ITS2 to describe Symbiodinium metacommunities in St. John, US Virgin Islands

    OpenAIRE

    Cunning, Ross; Gates, Ruth D; Edmunds, Peter J.

    2017-01-01

    Symbiotic microalgae (Symbiodinium spp.) strongly influence the performance and stress-tolerance of their coral hosts, making the analysis of Symbiodinium communities in corals (and metacommunities on reefs) advantageous for many aspects of coral reef research. High-throughput sequencing of ITS2 nrDNA offers unprecedented scale in describing these communities, yet high intragenomic variability at this locus complicates the resolution of biologically meaningful diversity. Here, we demonstrate ...

  20. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs.

    Science.gov (United States)

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2014-05-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs, including snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. High-throughput sequencing of fecal DNA to identify insects consumed by wild Weddell's saddleback tamarins (Saguinus weddelli, Cebidae, Primates) in Bolivia.

    Science.gov (United States)

    Mallott, E K; Malhi, R S; Garber, P A

    2015-03-01

    The genus Saguinus represents a successful radiation of over 20 species of small-bodied New World monkeys. Studies of the tamarin diet indicate that insects and small vertebrates account for ∼16-45% of total feeding and foraging time, and represent an important source of lipids, protein, and metabolizable energy. Although tamarins are reported to commonly consume large-bodied insects such as grasshoppers and walking sticks (Orthoptera), little is known concerning the degree to which smaller or less easily identifiable arthropod prey comprises an important component of their diet. To better understand tamarin arthropod feeding behavior, fecal samples from 20 wild Bolivian saddleback tamarins (members of five groups) were collected over a 3-week period in June 2012, and analyzed for the presence of arthropod DNA. DNA was extracted using a Qiagen stool extraction kit, and universal insect primers were created and used to amplify a ∼280 bp section of the COI mitochondrial gene. Amplicons were sequenced on the Roche 454 sequencing platform using high-throughput sequencing techniques. An analysis of these samples indicated the presence of 43 taxa of arthropods including 10 orders, 15 families, and 12 identified genera. Many of these taxa had not been previously identified in the tamarin diet. These results highlight molecular analysis of fecal DNA as an important research tool for identifying arthropod feeding patterns in primates, and reveal broad diversity in the taxa, foraging microhabitats, and size of arthropods consumed by tamarin monkeys. © 2014 Wiley Periodicals, Inc.

  2. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. The number and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR junctions sequenced with an HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  3. Evaluation of a High-Throughput Repetitive-Sequence-Based PCR System for DNA Fingerprinting of Mycobacterium tuberculosis and Mycobacterium avium Complex Strains

    Science.gov (United States)

    Cangelosi, Gerard A.; Freeman, Robert J.; Lewis, Kaeryn N.; Livingston-Rosanoff, Devon; Shah, Ketan S.; Milan, Sparrow Joy; Goldberg, Stefan V.

    2004-01-01

    Repetitive-sequence-based PCR (rep-PCR) is useful for generating DNA fingerprints of diverse bacterial and fungal species. Rep-PCR amplicon fingerprints represent genomic segments lying between repetitive sequences. A commercial system that electrophoretically separates rep-PCR amplicons on microfluidic chips, and provides computer-generated readouts of results has been adapted for use with Mycobacterium species. The ability of this system to type M. tuberculosis and M. avium complex (MAC) isolates was evaluated. M. tuberculosis strains (n = 56) were typed by spoligotyping with rep-PCR as a high-resolution adjunct. Results were compared with those generated by a standard approach of spoligotyping with IS6110-targeted restriction fragment length polymorphism (IS6110-RFLP) as the high-resolution adjunct. The sample included 11 epidemiologically and genotypically linked outbreak isolates and a population-based sample of 45 isolates from recent immigrants to Seattle, Wash., from the African Horn countries of Somalia, Eritrea, and Ethiopia. Twenty isolates exhibited unique spoligotypes and were not analyzed further. Of the 36 outbreak and African Horn isolates with nonunique spoligotypes, 23 fell into four clusters identified by IS6110-RFLP and rep-PCR, with 97% concordance observed between the two methods. Both approaches revealed extensive strain heterogeneity within the African Horn sample, consistent with a predominant pattern of reactivation of latent infections in this immigrant population. Rep-PCR exhibited 89% concordance with IS1245-RFLP typing of 28 M. avium subspecies avium strains. For M. tuberculosis as well as M. avium subspecies avium, the discriminative power of rep-PCR equaled or exceeded that of RFLP. Rep-PCR also generated DNA fingerprints from M. intracellulare (n = 8) and MACx (n = 2) strains. It shows promise as a fast, unified method for high-throughput genotypic fingerprinting of multiple Mycobacterium species. PMID:15184453

  4. Optimisation of 16S rDNA amplicon sequencing protocols for microbial community profiling of anaerobic digesters

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; McIlroy, Simon Jon; Larsen, Poul

    integrity, with optimal parameters involving four times harsher lysis conditions than recommended for the commercial kit, and appeared to be similar for both the mesophilic and thermophilic reactor biomass samples. The community profiling was found to be greatly influenced by the selected PCR primers......RNA gene amplicon sequencing is rapid, cheap, high throughput, and has high taxonomic resolution. However, biases are introduced in multiple steps of this approach, including non-representative DNA extraction and uneven taxonomic coverage of selected PCR primers, potentially giving a skewed view...... of the community composition. As such sample specific optimisation and standardisation of DNA extraction, as well as PCR primer selection, are essential to minimising the potential for such biases. The aim of this study was to develop a protocol for optimized community profiling of anaerobic digesters. The Fast...

  5. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approaches and clone library analysis, the present result suggests that Illumina sequencing dramatically accelerates the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
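    The phylum-level percentages quoted above are relative abundances, i.e. read counts normalised by the total number of reads. The sketch below shows that calculation and checks that the reported percentages add up to 100%. The per-phylum counts are hypothetical toy values chosen to reproduce the reported proportions; the abstract gives only the percentages and the overall total of 40,297 sequences.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalise")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Percentages as reported in the abstract.
reported_percent = {
    "Ascomycota": 78.0,
    "Basidiomycota": 17.3,
    "Zygomycota": 1.5,
    "Chytridiomycota": 0.8,
    "unassigned": 2.4,
}
reported_total = round(sum(reported_percent.values()), 1)

# Hypothetical counts (per 1,000 reads), for illustration only.
toy_counts = {
    "Ascomycota": 780,
    "Basidiomycota": 173,
    "Zygomycota": 15,
    "Chytridiomycota": 8,
    "unassigned": 24,
}
toy_percent = relative_abundance(toy_counts)
```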

  6. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which...... dissect the bioma involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy...

  7. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    Science.gov (United States)

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
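
    The Mendelian-violation signal mentioned in this abstract can be illustrated with a short sketch. This is not the authors' probabilistic method: it makes hard genotype calls and ignores sequencing error, and the function names and example genotypes are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed from one maternal and one paternal allele."""
    # Genotypes are unordered allele pairs, e.g. ("A", "G"); sort them so order does not matter.
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def candidate_de_novo_sites(trio_genotypes):
    """Yield site identifiers where the child's genotype violates Mendelian inheritance."""
    for site, (child, mother, father) in trio_genotypes.items():
        if not is_mendelian_consistent(child, mother, father):
            yield site


# Hypothetical genotype calls: (child, mother, father) per site.
example = {
    "site1": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent with inheritance
    "site2": (("A", "T"), ("A", "A"), ("A", "A")),  # T is absent from both parents
}

if __name__ == "__main__":
    print(list(candidate_de_novo_sites(example)))
```

    A hard filter like this flags every genotyping error as a candidate mutation, which is the kind of confounding the abstract's posterior-probability framework is meant to handle.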

  8. Multiplexed Spliced-Leader Sequencing: A high-throughput, selective method for RNA-seq in Trypanosomatids.

    Science.gov (United States)

    Cuypers, Bart; Domagalska, Malgorzata A; Meysman, Pieter; Muylder, Géraldine de; Vanaerschot, Manu; Imamura, Hideo; Dumetz, Franck; Verdonckt, Thomas Wolf; Myler, Peter J; Ramasamy, Gowthaman; Laukens, Kris; Dujardin, Jean-Claude

    2017-06-16

    High throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5' end of each mRNA by trans-splicing. This makes it possible to discriminate Trypanosomatid RNA from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high resolution splice site detection. SL-seq is therefore ideal for fast and massively parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.

  9. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    Science.gov (United States)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  10. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

    OpenAIRE

    PETRILLO MAURO; ANGERS ALEXANDRE; HENRIKSSON PETER; BONFINI Laura; PATAK DENNSTEDT Alexandre; KREYSA JOACHIM

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico...

  11. A new perspective on studying burial environment before archaeological excavation: analyzing bacterial community distribution by high-throughput sequencing

    OpenAIRE

    Jinjin Xu; Yanfei Wei; Hanqing Jia; Lin Xiao; Decai Gong

    2017-01-01

    Burial conditions play a crucial role in archaeological heritage preservation. In particular, microorganisms are considered a leading cause of the degradation and disappearance of historic materials. In this article, we analyzed bacterial diversity and community structure from M1 of Wangshanqiao using 16S rRNA gene amplicon sequencing. The results indicated that microbial communities in burial conditions were diverse among four different samples. The samples from the robber hole...

  12. The Microbiome and Metabolites in Fermented Pu-erh Tea as Revealed by High-Throughput Sequencing and Quantitative Multiplex Metabolite Analysis.

    Directory of Open Access Journals (Sweden)

    Yongjie Zhang

    Full Text Available Pu-erh is a tea produced in Yunnan, China by microbial fermentation of fresh Camellia sinensis leaves by two processes, the traditional raw fermentation and the faster, ripened fermentation. We characterized fungal and bacterial communities in leaves and both Pu-erhs by high-throughput, rDNA-amplicon sequencing and we characterized the profile of bioactive extrolite mycotoxins in Pu-erh teas by quantitative liquid chromatography-tandem mass spectrometry. We identified 390 fungal and 629 bacterial OTUs from leaves and both Pu-erhs. Major findings are: 1) fungal diversity drops and bacterial diversity rises due to raw or ripened fermentation, 2) fungal and bacterial community composition changes significantly between fresh leaves and both raw and ripened Pu-erh, 3) aging causes significant changes in the microbial community of raw, but not ripened, Pu-erh, and 4) ripened and well-aged raw Pu-erh have similar microbial communities that are distinct from those of young, raw Pu-erh tea. Twenty-five toxic metabolites, mainly of fungal origin, were detected, with patulin and asperglaucide dominating and at levels supporting the Chinese custom of discarding the first preparation of Pu-erh and using the wet tea to then brew a pot for consumption.

  13. Investigation of bacterial and fungal diversity in tarag using high-throughput sequencing.

    Science.gov (United States)

    Sun, Zhihong; Liu, Wenjun; Bao, Qiuhua; Zhang, Jiachao; Hou, Qiangchuan; Kwok, Laiyu; Sun, Tiansong; Zhang, Heping

    2014-10-01

    This is the first study on the bacterial and fungal community diversity in 17 tarag samples (naturally fermented dairy products) through a metagenomic approach involving high-throughput pyrosequencing. Our results revealed the presence of a total of 47 bacterial and 43 fungal genera in all tarag samples, in which Lactobacillus and Galactomyces were the predominant genera of bacteria and fungi, respectively. The number of some microbial genera, such as Lactococcus, Acetobacter, Saccharomyces, Trichosporon, and Kluyveromyces, among others, was found to vary between different samples. Altogether, our results showed that the microbial flora in different samples may be stratified by geographic region. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  14. Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.

    Science.gov (United States)

    Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun

    2018-02-27

    Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of the multiple-beads-in-droplet and insufficient throughput because of the low concentration of beads for avoiding multiple-beads-in-a-droplet remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl⁻¹, which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.

  15. Isolation of cultivation-resistant oomycetes, first detected as amplicon sequences, from roots of herbicide-terminated winter rye

    Science.gov (United States)

    The dynamics of microbial communities associated with dying cover crops are of interest because of potential impacts on disease in a subsequent crop, and because of the importance of microbial activity on plant residue to soil organic matter dynamics and nutrient cycling. High throughput amplicon se...

  16. High-throughput sequencing and graph-based cluster analysis facilitate microsatellite development from a highly complex genome.

    Science.gov (United States)

    Shah, Abhijeet B; Schielzeth, Holger; Albersmeier, Andreas; Kalinowski, Joern; Hoffman, Joseph I

    2016-08-01

    Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired-end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild-caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to successfully amplify. It also reveals how advances in high-throughput sequencing and graph-based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.

  17. Allelome.PRO, a pipeline to define allele-specific genomic features from high-throughput sequencing data.

    Science.gov (United States)

    Andergassen, Daniel; Dotter, Christoph P; Kulinski, Tomasz M; Guenzl, Philipp M; Bammer, Philipp C; Barlow, Denise P; Pauler, Florian M; Hudson, Quanah J

    2015-12-02

    Detecting allelic biases from high-throughput sequencing data requires an approach that maximises sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the 'allelome' by automatically categorising the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers, or transcription factor binding sites and therefore provides a powerful tool to identify candidate cis regulatory elements genome wide. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. PCR primers to study the diversity of expressed fungal genes encoding lignocellulolytic enzymes in soils using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Florian Barbi

    Full Text Available Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11, encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively, as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly designed primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may

  19. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    Science.gov (United States)

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
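
    Two figures in this abstract lend themselves to a quick check: the target count (48 samples × 53 amplicons = 2,544) and the Pearson correlation used to compare the two platforms. The sketch below computes both in plain Python. The methylation values are invented for illustration and are not data from the study, and `pearson` is a hypothetical helper rather than part of TABSAT.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Target count stated in the abstract: 48 samples x 53 amplicons each.
n_targets = 48 * 53

# Hypothetical methylation fractions for the same CpG sites on two platforms.
sequencing = [0.10, 0.35, 0.50, 0.80, 0.95]
array_450k = [0.12, 0.30, 0.55, 0.78, 0.90]

if __name__ == "__main__":
    print(n_targets)
    print(round(pearson(sequencing, array_450k), 3))
```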

  1. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    Genetic polymorphisms in P. falciparum can be used to indicate the parasite's susceptibility to antimalarial drugs as well as its geographical origin. Both of these factors are key to monitoring development and spread of antimalarial drug resistance. In this study, we combine multiplex PCR, custom...... designed dual indexing and MiSeq sequencing for high throughput SNP-profiling of 457 malaria infections from Guinea-Bissau, at the cost of 10 USD per sample. By amplifying and sequencing 15 genetic fragments, we cover 20 resistance-conferring SNPs occurring in pfcrt, pfmdr1, pfdhfr, pfdhps, as well...... as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance...

  2. Efficient error correction for next-generation sequencing of viral amplicons

    Directory of Open Access Journals (Sweden)

    Skums Pavel

    2012-06-01

    Full Text Available Abstract Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
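
    The general idea behind a frequency threshold can be sketched as follows. This is a generic illustration, not the published ET algorithm, whose calibration details are not given in the abstract. The function name, the threshold value, and the reads are all hypothetical.

```python
from collections import Counter


def filter_haplotypes(reads, min_fraction):
    """Keep haplotypes whose share of reads is at least min_fraction; return renormalized frequencies."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Haplotypes below the threshold are treated as likely sequencing errors and dropped.
    kept = {seq: n for seq, n in counts.items() if n / total >= min_fraction}
    kept_total = sum(kept.values())
    # Re-estimate frequencies over the surviving reads only.
    return {seq: n / kept_total for seq, n in kept.items()}


# Hypothetical amplicon reads: two abundant haplotypes and one rare variant.
reads = ["ACGT"] * 60 + ["ACGA"] * 38 + ["ACCT"] * 2

if __name__ == "__main__":
    for haplotype, freq in filter_haplotypes(reads, min_fraction=0.05).items():
        print(haplotype, round(freq, 3))
```

    A single fixed cutoff cannot account for the homopolymer and position effects the abstract describes, which is presumably why the authors derive their threshold empirically.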

  3. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Full Text Available Abstract Background Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample, typically micrograms, are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.

  4. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2010-03-01

    Full Text Available Abstract Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

  5. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  6. Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2012-01-01

    Full Text Available Abstract Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P ® MassARRAY. No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving
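
    The variant counts reported in this abstract are internally consistent, as a short arithmetic check shows. The sketch uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts taken from the abstract.
snps, indels = 4135, 893
total_variants = snps + indels

by_location = {
    "5' and 3' UTR": 952,
    "intronic": 3612,
    "exonic": 464,
}

# Percentage of all variants in each location, rounded to one decimal place.
percentages = {
    name: round(100 * n / total_variants, 1) for name, n in by_location.items()
}

if __name__ == "__main__":
    print(total_variants, sum(by_location.values()))
    for name, pct in percentages.items():
        print(name, pct)
```

    The three location counts sum to the total of SNPs plus indels, and the computed shares round to the 19%, 72% and 9% quoted in the text.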

  7. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach.

    Directory of Open Access Journals (Sweden)

    Shota Nakamura

    Full Text Available With the severe acute respiratory syndrome epidemic of 2003 and renewed attention on avian influenza viral pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a "next-generation" parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu) infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1-0.25 ml of nasopharyngeal aspirates (N = 3) and fecal specimens (N = 5), and more than 10 µg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298-32,335 (average 24,738) reads in a single 7.5 h run. In nasopharyngeal samples, although whole genome analysis was not available because the majority (>90%) of reads were host genome-derived, 20-460 Flu-reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, resulting in gain of 484-15,260 reads of norovirus sequence (78-98% of the whole genome was covered), except for one specimen that was under-detectable by RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without advance genetic information. Although its cost and technological availability make it unlikely that this system will very soon be the diagnostic standard worldwide, this system could be useful for the earlier discovery of novel emerging viruses and bioterrorism, which are difficult to detect with conventional procedures.

  8. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...... a broad taxonomic range and most sequences were from nematode taxa that have previously been found to be abundant in soil such as Tylenchida, Rhabditida, Dorylaimida, Triplonchida and Araeolaimida. Conclusions: Our amplification and sequencing strategy for assessing nematode diversity was able to collect...

  9. High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping.

    Science.gov (United States)

    Collins, John E; Wali, Neha; Sealy, Ian M; Morris, James A; White, Richard J; Leonard, Steven R; Jackson, David K; Jones, Matthew C; Smerdon, Nathalie C; Zamora, Jorge; Dooley, Christopher M; Carruthers, Samantha N; Barrett, Jeffrey C; Stemple, Derek L; Busch-Nentwich, Elisabeth M

    2015-08-05

    We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements. Multiplex sequencing of transcript 3' ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts. This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.

  10. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    Science.gov (United States)

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. © The Author(s) 2015. Published by Oxford University Press.

  11. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task, but the previous version allows only a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as multithreaded software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  12. The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data.

    Science.gov (United States)

    Skrzypek, Marek S; Binkley, Jonathan; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2017-01-04

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology, by curating the scientific literature in real time, and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that was used in the construction of Assembly 22. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Full Text Available Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions.

  14. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions. © 2014 by the authors; licensee MDPI, Basel, Switzerland.

  15. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing

    Directory of Open Access Journals (Sweden)

    Olafsdottir Gudbjorg

    2011-04-01

    Full Text Available Abstract Background: Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar) derived from across the species' North Atlantic range was selectively amplified with a novel combination of standard PCR and pyro-sequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences). A unique combination of barcoded primers and a partitioned sequencing plate was employed to designate each sequence read to its original sample. The sequence reads were aligned according to the S. salar mitochondrial reference sequence (NC_001960.1), with the objective of identifying single nucleotide polymorphisms (SNPs). They were validated if they met the following three stringent criteria: (i) sequence reads were produced from both DNA strands; (ii) SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii) SNPs occurred in more than one individual. Results: Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6%) were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. Conclusion: The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining
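
    The three validation criteria listed in this record lend themselves to a simple filter. The following is a minimal sketch, not the authors' pipeline: the `SnpCall` record, its field names and the reading of criterion (ii) as "at least 90% of the reads covering the position in one individual support the variant" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical per-individual evidence for one candidate SNP."""
    position: int
    individual: str
    forward_reads: int  # variant-supporting reads on the forward strand
    reverse_reads: int  # variant-supporting reads on the reverse strand
    total_reads: int    # all reads covering the position in this individual


def passes_read_criteria(call, min_fraction=0.9):
    """Criteria (i) and (ii): both strands seen, and >= 90% of reads agree."""
    both_strands = call.forward_reads > 0 and call.reverse_reads > 0
    support = call.forward_reads + call.reverse_reads
    return (both_strands and call.total_reads > 0
            and support / call.total_reads >= min_fraction)


def validated_positions(calls):
    """Criterion (iii): keep positions confirmed in more than one individual."""
    individuals = {}
    for call in calls:
        if passes_read_criteria(call):
            individuals.setdefault(call.position, set()).add(call.individual)
    return sorted(pos for pos, inds in individuals.items() if len(inds) > 1)


calls = [
    SnpCall(100, "fish_A", 10, 9, 20),  # 95% support, both strands
    SnpCall(100, "fish_B", 5, 5, 10),   # 100% support, both strands
    SnpCall(200, "fish_A", 10, 0, 10),  # one strand only: fails (i)
    SnpCall(200, "fish_B", 5, 5, 10),
    SnpCall(300, "fish_A", 4, 4, 10),   # 80% support: fails (ii)
    SnpCall(300, "fish_B", 5, 5, 10),
]
print(validated_positions(calls))
```

    In this toy input only position 100 satisfies all three criteria; positions 200 and 300 are each confirmed in a single individual and are dropped.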

  16. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform

    DEFF Research Database (Denmark)

    Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen; Rockenbauer, Eszter

    2011-01-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR...

  17. DNA Sudoku—harnessing high-throughput sequencing for multiplexed specimen analysis

    Science.gov (United States)

    Erlich, Yaniv; Chang, Kenneth; Gordon, Assaf; Ronen, Roy; Navon, Oron; Rooks, Michelle; Hannon, Gregory J.

    2009-01-01

    Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals. PMID:19447965
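
    The idea of encoding specimen identity in the pooling pattern rather than in a per-specimen barcode can be illustrated with a small example. The sketch below uses one simple combinatorial design, modular pooling with pairwise coprime moduli; the abstract does not specify the design, so this choice, the moduli (5, 7, 9) and the function names `pools_for` and `decode` are assumptions. It also assumes a single specimen carries the sequence of interest.

```python
from math import prod

# Pairwise coprime pooling windows (hypothetical choice).
# They distinguish 5 * 7 * 9 = 315 specimens using only 5 + 7 + 9 = 21 pools.
MODULI = (5, 7, 9)


def pools_for(specimen, moduli=MODULI):
    """Return the (modulus, residue) pools that a specimen is added to."""
    return [(m, specimen % m) for m in moduli]


def decode(pool_hits, moduli=MODULI):
    """Return all specimen indices consistent with one positive pool per modulus."""
    residues = dict(pool_hits)
    return [s for s in range(prod(moduli))
            if all(s % m == residues[m] for m in moduli)]


hits = pools_for(123)  # pools in which the sequence from specimen 123 shows up
print(hits, decode(hits))
```

    Because the moduli are pairwise coprime, each specimen below 315 has a unique combination of pools, so 21 barcoded pools stand in for 315 individual barcodes. The sketch does not handle several specimens sharing the same sequence, which makes the pattern ambiguous.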

  18. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

    Science.gov (United States)

    't Hoen, Peter A C; Friedländer, Marc R; Almlöf, Jonas; Sammeth, Michael; Pulyakhina, Irina; Anvar, Seyed Yahya; Laros, Jeroen F J; Buermans, Henk P J; Karlberg, Olof; Brännvall, Mathias; den Dunnen, Johan T; van Ommen, Gert-Jan B; Gut, Ivo G; Guigó, Roderic; Estivill, Xavier; Syvänen, Ann-Christine; Dermitzakis, Emmanouil T; Lappalainen, Tuuli

    2013-11-01

    RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.

  19. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance......-conferring SNPs in pfK13 are absent from the studied area of Guinea-Bissau, while the pfmdr1 86 N allele is found at a high prevalence. The mitochondrial barcodes are unanimous and accommodate a West African origin of the parasites. With this method, very reliable high throughput surveillance of antimalarial drug...

  20. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Science.gov (United States)

    Youngblut, Nicholas D; Barnett, Samuel E; Buckley, Daniel H

    2018-01-01

    Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  1. metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

    DEFF Research Database (Denmark)

    Louvel, Guillaume; Der Sarkissian, Clio; Hanghøj, Kristian Ebbesen

    2016-01-01

    Micro-organisms account for most of the Earth's biodiversity and yet remain largely unknown. The complexity and diversity of microbial communities present in clinical and environmental samples can now be robustly investigated in record times and prices thanks to recent advances in high......-throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automatizing routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa......, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT within a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), but also from archaeological specimens. We present...

  2. Ammonium inhibition through the decoupling of acidification process and methanogenesis in anaerobic digester revealed by high throughput sequencing.

    Science.gov (United States)

    Zhang, Miao; Lin, Qiang; Rui, Junpeng; Li, Jiabao; Li, Xiangzhen

    2017-02-01

    This study aimed to reveal the shifts of microbial communities along ammonium gradients, and the relationship between microbial community composition and anaerobic digestion performance, using a high-throughput sequencing technique. Methane production declined with increasing ammonium concentration, and was inhibited above 4 g l⁻¹. The volatile fatty acids, especially acetate, accumulated with elevated ammonium. Prokaryotic populations showed different responses to the ammonium concentration: Clostridium, Tepidimicrobium, Sporanaerobacter, Peptostreptococcus, Sarcina and Peptoniphilus showed good tolerance to ammonium ions. However, Syntrophomonas, with poor tolerance to ammonium, may be inhibited during anaerobic digestion. During methanogenesis, Methanosarcina was the dominant methanogen. Excessive ammonium inhibited methane production, probably by decoupling the linkage between the acidification process and methanogenesis, and finally resulted in different performance in anaerobic digestion.

  3. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches

    Science.gov (United States)

    He, Guo-qing; Liu, Tong-jie; Sadiq, Faizan A.; Gu, Jing-si; Zhang, Guo-hua

    2017-01-01

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches. PMID:28378567

  4. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches.

    Science.gov (United States)

    He, Guo-Qing; Liu, Tong-Jie; Sadiq, Faizan A; Gu, Jing-Si; Zhang, Guo-Hua

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches.

  5. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    National Research Council Canada - National Science Library

    Yang, Lin; Yang, Hui-lin; Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    .... A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla...

  6. High-throughput polymorphism detection and genotyping in Brassica napus using next-generation RAD sequencing

    Directory of Open Access Journals (Sweden)

    Bus Anja

    2012-06-01

    Full Text Available Abstract Background: The complex genome of rapeseed (Brassica napus) is not well understood despite the economic importance of the species. Good knowledge of sequence variation is needed for genetics approaches and breeding purposes. We used a diversity set of B. napus representing eight different germplasm types to sequence genome-wide distributed restriction-site associated DNA (RAD) fragments for polymorphism detection and genotyping. Results: More than 113,000 RAD clusters with more than 20,000 single nucleotide polymorphisms (SNPs) and 125 insertions/deletions were detected and characterized. About one third of the RAD clusters and polymorphisms mapped to the Brassica rapa reference sequence. An even distribution of RAD clusters and polymorphisms was observed across the B. rapa chromosomes, which suggests that there might be an equal distribution over the Brassica oleracea chromosomes, too. The representation of Gene Ontology (GO) terms for unigenes with RAD clusters and polymorphisms revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category. Conclusions: Considering the decreasing costs for next-generation sequencing, the results of our study suggest that RAD sequencing is not only a simple and cost-effective method for high-density polymorphism detection but also an alternative to SNP genotyping from transcriptome sequencing or SNP arrays, even for species with complex genomes such as B. napus.

  7. Isolation and characterization of antigen-specific alpaca (Lama pacos) VHH antibodies by biopanning followed by high-throughput sequencing.

    Science.gov (United States)

    Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji

    2015-09-01

    The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10⁷ from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequency of each sequence before and after biopanning, we found the sequences whose frequencies were increased by biopanning. The top 100 of these sequences were subjected to phylogenic tree analysis. In total, 75% of them belonged to N-group VHHs, but the others were phylogenically apart from N-group VHHs (non N-group). Two of three VHHs selected from non N-group VHHs showed sufficient antigen binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
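
    The comparison of sequence frequencies before and after biopanning can be sketched as an enrichment ratio. The code below is an illustration under stated assumptions, not the authors' method: the function names, the pseudocount of 1 (added so that sequences unseen in one library do not cause division by zero) and the toy read counts are all hypothetical.

```python
from collections import Counter


def enrichment(before, after, pseudocount=1.0):
    """Ratio of relative frequency after panning to relative frequency before."""
    sequences = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(sequences)
    total_after = sum(after.values()) + pseudocount * len(sequences)
    return {
        seq: ((after.get(seq, 0) + pseudocount) / total_after)
        / ((before.get(seq, 0) + pseudocount) / total_before)
        for seq in sequences
    }


def top_enriched(before, after, n=100):
    """Return up to n sequences whose frequency increased, most enriched first."""
    scores = enrichment(before, after)
    increased = [seq for seq in scores if scores[seq] > 1]
    return sorted(increased, key=lambda seq: (-scores[seq], seq))[:n]


# Toy read counts per VHH sequence (hypothetical labels and numbers).
before = Counter({"VHH_A": 10, "VHH_B": 80, "VHH_C": 10})
after = Counter({"VHH_A": 60, "VHH_B": 30, "VHH_C": 10})
print(top_enriched(before, after))
```

    Only `VHH_A` rises in relative frequency in this toy data, so it is the only sequence returned. In a real library the ranked list would then be passed on to clustering or tree building, as the abstract describes for the top 100 sequences.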

  8. Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods.

    Science.gov (United States)

    Piñol, J; Mir, G; Gomez-Polo, P; Agustí, N

    2015-07-01

    The quantification of the biological diversity in environmental samples using high-throughput DNA sequencing is hindered by the PCR bias caused by variable primer-template mismatches of the individual species. In some dietary studies, there is the added problem that samples are enriched with predator DNA, so often a predator-specific blocking oligonucleotide is used to alleviate the problem. However, specific blocking oligonucleotides could coblock nontarget species to some degree. Here, we accurately estimate the extent of the PCR biases induced by universal and blocking primers on a mock community prepared with DNA of twelve species of terrestrial arthropods. We also compare universal and blocking primer biases with those induced by variable annealing temperature and number of PCR cycles. The results show that reads of all species were recovered after PCR enrichment at our control conditions (no blocking oligonucleotide, 45 °C annealing temperature and 40 cycles) and high-throughput sequencing. They also show that the four factors considered biased the final proportions of the species to some degree. Among these factors, the number of primer-template mismatches of each species had a disproportionate effect (up to five orders of magnitude) on the amplification efficiency. In particular, the number of primer-template mismatches explained most of the variation (~3/4) in the amplification efficiency of the species. The effect of blocking oligonucleotide concentration on nontarget species relative abundance was also significant, but less important (below one order of magnitude). Considering the results reported here, the quantitative potential of the technique is limited, and only qualitative results (the species list) are reliable, at least when targeting the barcoding COI region. © 2014 John Wiley & Sons Ltd.
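
    Since the number of primer-template mismatches is the main explanatory variable in this record, a short sketch of how such mismatches might be counted is given here. It assumes that the primer and the binding site are supplied in the same orientation and have equal length, and it treats IUPAC degenerate bases in the primer as matching any base they encode; the function name and the example sequences are hypothetical and are not the primers used in the study.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(base not in IUPAC[code]
               for code, base in zip(primer.upper(), site.upper()))


# Hypothetical degenerate primer against two hypothetical binding sites.
print(count_mismatches("ACGTRYN", "ACGTACA"))  # perfect match under degeneracy
print(count_mismatches("ACGTRYN", "ACCTGGA"))  # mismatches at positions 3 and 6
```

    Tabulating this count per species against the observed read proportions is one way to examine the kind of relationship the abstract reports.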

  9. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Directory of Open Access Journals (Sweden)

    Stefano Pirrò

    Full Text Available Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body, regulating human gene expression. However, knowledge about miRNA in Moringa is scarce. Furthermore, in order to test the hypothesis on the pharmacological potential properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA, which in turn allowed the miRNAs to be related to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and we provided new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  10. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Science.gov (United States)

    Pirrò, Stefano; Zanella, Letizia; Kenzo, Maurice; Montesano, Carla; Minutolo, Antonella; Potestà, Marina; Sobze, Martin Sanou; Canini, Antonella; Cirilli, Marco; Muleo, Rosario; Colizzi, Vittorio; Galgani, Andrea

    2016-01-01

    Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body, regulating human gene expression. However, knowledge about miRNA in Moringa is scarce. To test the hypothesis on the potential pharmacological properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA, which in turn allowed us to relate the miRNAs to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and we provide new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  11. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Jared W Wenger

    2010-05-01

    Full Text Available Fermentation of xylose is a fundamental requirement for the efficient production of ethanol from lignocellulosic biomass sources. Although native Saccharomyces cerevisiae strains aggressively ferment hexoses, it has long been thought that they cannot grow fermentatively or non-fermentatively on xylose. Population surveys have uncovered a few naturally occurring strains that are weakly xylose-positive, and some S. cerevisiae have been genetically engineered to ferment xylose, but no strain, either natural or engineered, has yet been reported to ferment xylose as efficiently as glucose. Here, we used a medium-throughput screen to identify Saccharomyces strains that can increase in optical density when xylose is presented as the sole carbon source. We identified 38 strains that have this xylose utilization phenotype, including strains of S. cerevisiae, other sensu stricto members, and hybrids between them. All the S. cerevisiae xylose-utilizing strains we identified are wine yeasts, and for those that could produce meiotic progeny, the xylose phenotype segregates as a single gene trait. We mapped this gene by Bulk Segregant Analysis (BSA) using tiling microarrays and high-throughput sequencing. The gene is a putative xylitol dehydrogenase, which we name XDH1, and is located in the subtelomeric region of the right end of chromosome XV in a region not present in the S288c reference genome. We further characterized the xylose phenotype by performing gene expression microarrays and by genetically dissecting the endogenous Saccharomyces xylose pathway. We have demonstrated that natural S. cerevisiae yeasts are capable of utilizing xylose as the sole carbon source, characterized the genetic basis for this trait as well as the endogenous xylose utilization pathway, and demonstrated the feasibility of BSA using high-throughput sequencing.

  12. Metagenomic Analysis of Slovak Bryndza Cheese Using Next-Generation 16S rDNA Amplicon Sequencing

    Directory of Open Access Journals (Sweden)

    Planý Matej

    2016-06-01

    Full Text Available Knowledge about the diversity and taxonomic structure of the microbial population present in traditional fermented foods plays a key role in starter culture selection, safety improvement and quality enhancement of the end product. The aim of this study was to investigate the composition of microbial consortia in Slovak bryndza cheese. For this purpose, we used a culture-independent approach based on 16S rDNA amplicon sequencing on a next-generation sequencing platform. Results obtained by the analysis of three commercial (produced on an industrial scale in the winter season) and one traditional (artisanal, most valued, produced in May) Slovak bryndza cheese samples were compared. A diverse prokaryotic microflora composed mostly of the genera Lactococcus, Streptococcus, Lactobacillus, and Enterococcus was identified. Lactococcus lactis subsp. lactis and Lactococcus lactis subsp. cremoris were the dominant taxa in all tested samples. The second most abundant species, detected in all bryndza cheeses, were Lactococcus fujiensis and Lactococcus taiwanensis, identified independently by two different approaches using different reference 16S rRNA gene databases (Greengenes and NCBI, respectively). They have been detected in bryndza cheese samples in substantial amounts for the first time. The narrowest microbial diversity was observed in a sample made with a starter culture from pasteurised milk. Metagenomic analysis by high-throughput sequencing of 16S rRNA genes seems to be a powerful tool for studying the structure of the microbial population in cheeses.

  13. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Full Text Available Background: Eight diverse sorghum (Sorghum bicolor (L.) Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation are evaluated. Results: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For the most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
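    The read quantities quoted in this abstract can be put on a common scale with a little arithmetic. The sketch below is illustrative only: it assumes that "eight genome equivalents (6 Gb)" means total aligned bases divided by genome size, which implies a genome of roughly 750 Mb, and the function names are hypothetical rather than taken from the paper.

```python
def total_bases(n_reads: int, read_length: int) -> int:
    """Total number of sequenced bases for a run of equal-length reads."""
    return n_reads * read_length


def fold_coverage(bases: int, genome_size_bp: int) -> float:
    """Average genome-wide depth, i.e. sequenced bases per genome base."""
    return bases / genome_size_bp


# Assumption: 6 Gb of aligned sequence equals eight genome equivalents,
# so one genome equivalent is 6 Gb / 8 = 750 Mb.
genome_size_bp = 6_000_000_000 // 8

# The abstract's recommendation: 3 million 50-base reads per accession.
bases = total_bases(3_000_000, 50)
coverage = fold_coverage(bases, genome_size_bp)

print(f"genome size: {genome_size_bp / 1e6:.0f} Mb")
print(f"sequenced bases per accession: {bases / 1e6:.0f} Mb")
print(f"average genome-wide coverage: {coverage:.2f}x")
```

    Under these assumptions the recommended quantity is about 150 Mb per accession, or roughly 0.2x averaged over the whole genome. Because a restriction-site library concentrates reads at a subset of sites, the depth at the sampled sites would be higher than this genome-wide average.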

  14. Genomic Methods Take the Plunge: Recent Advances in High-Throughput Sequencing of Marine Mammals.

    Science.gov (United States)

    Cammen, Kristina M; Andrews, Kimberly R; Carroll, Emma L; Foote, Andrew D; Humble, Emily; Khudyakov, Jane I; Louis, Marie; McGowen, Michael R; Olsen, Morten Tange; Van Cise, Amy M

    2016-11-01

    The dramatic increase in the application of genomic techniques to non-model organisms (NMOs) over the past decade has yielded numerous valuable contributions to evolutionary biology and ecology, many of which would not have been possible with traditional genetic markers. We review this recent progression with a particular focus on genomic studies of marine mammals, a group of taxa that represent key macroevolutionary transitions from terrestrial to marine environments and for which available genomic resources have recently undergone notable rapid growth. Genomic studies of NMOs utilize an expanding range of approaches, including whole genome sequencing, restriction site-associated DNA sequencing, array-based sequencing of single nucleotide polymorphisms and target sequence probes (e.g., exomes), and transcriptome sequencing. These approaches generate different types and quantities of data, and many can be applied with limited or no prior genomic resources, thus overcoming one traditional limitation of research on NMOs. Within marine mammals, such studies have thus far yielded significant contributions to the fields of phylogenomics and comparative genomics, as well as enabled investigations of fitness, demography, and population structure. Here we review the primary options for generating genomic data, introduce several emerging techniques, and discuss the suitability of each approach for different applications in the study of NMOs. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Molecular diet analysis of two African free-tailed bats (Molossidae) using high throughput sequencing

    DEFF Research Database (Denmark)

    Bohmann, Kristine; Monadjem, Ara; Noer, Christina Lehmkuhl

    2011-01-01

    Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI...

  16. High-throughput physical map anchoring via BAC-pool sequencing

    Czech Academy of Sciences Publication Activity Database

    Cviková, Kateřina; Cattonaro, F.; Alaux, M.; Stein, N.; Mayer, K.F.X.; Doležel, Jaroslav; Bartoš, Jan

    2015-01-01

    Vol. 15, Apr 11 (2015). ISSN 1471-2229. R&D Projects: GA ČR GA13-08786S; GA MŠk(CZ) LO1204. Institutional support: RVO:61389030. Keywords: Physical map; Contig anchoring; Next generation sequencing. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.631 (2015).

  17. Profiling of Ribose Methylations in RNA by High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Birkedal, Ulf; Christensen-Dalsgaard, Mikkel; Krogh, Nicolai

    2015-01-01

    Ribose methylations are the most abundant chemical modifications of ribosomal RNA and are critical for ribosome assembly and fidelity of translation. Many aspects of ribose methylations have been difficult to study due to lack of efficient mapping methods. Here, we present a sequencing-based meth...

  18. A Torrent of data: mapping chromatin organization using 5C and high-throughput sequencing.

    Science.gov (United States)

    Fraser, James; Ethier, Sylvain D; Miura, Hisashi; Dostie, Josée

    2012-01-01

    The study of three-dimensional genome organization is an exciting research area, which has benefited from the rapid development of high-resolution molecular mapping techniques over the past decade. These methods are derived from the chromosome conformation capture (3C) technique and are each aimed at improving some aspect of 3C. All 3C technologies use formaldehyde fixation and proximity-based ligation to capture chromatin contacts in cell populations and consider in vivo spatial proximity more or less inversely proportional to the frequency of measured interactions. The 3C-carbon copy (5C) method is among the most quantitative of these approaches. 5C is extremely robust and can be used to study chromatin organization at various scales. Here, we present a modified 5C analysis protocol adapted for sequencing with an Ion Torrent Personal Genome Machine™ (PGM™). We explain how Torrent 5C libraries are produced and sequenced. We also describe the statistical and computational methods we developed to normalize and analyze raw Torrent 5C sequence data. The Torrent 5C protocol should facilitate the study of in vivo chromatin architecture at high resolution because it benefits from high accuracy, greater speed, low running costs, and the flexibility of in-house next-generation sequencing. Copyright © 2012 Elsevier Inc. All rights reserved.

  19. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    Full Text Available With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
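    The universal hitting set definition in this abstract can be checked directly for very small parameters. The following is a minimal brute-force sketch of that definition only; it is not the DOCKS heuristic, the function name is hypothetical, and it enumerates every L-long sequence, so it is practical only for toy values of k and L.

```python
from itertools import product


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Return True if every L-long sequence over `alphabet` contains
    at least one k-mer from `kmers` (brute force, exponential in L)."""
    if not 0 < k < L:
        raise ValueError("the definition requires 0 < k < L")
    kmer_set = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # Slide a window of width k across the sequence and look for a hit.
        if not any(seq[i:i + k] in kmer_set for i in range(L - k + 1)):
            return False  # this sequence avoids every k-mer in the set
    return True


# Toy example over a two-letter alphabet with k = 2 and L = 3.
print(is_universal_hitting_set({"AA", "CC", "AC"}, 2, 3, alphabet="AC"))
print(is_universal_hitting_set({"AA", "CC"}, 2, 3, alphabet="AC"))
```

    In the toy example, three of the four possible 2-mers are enough to hit every 3-long sequence, whereas the pair {AA, CC} misses the sequence ACA. This illustrates why a UHS can be smaller than the full k-mer set, which is the property a compact UHS exploits.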

  20. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    Science.gov (United States)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes because of their relatively small genome sizes, their ability to grow in culture, and their role as a natural model system for measuring expression of calcification-related genes in two life stages. As Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi under nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although more than 100,000 EST sequences for E. huxleyi are available from these projects in public databases, those data are often insufficient to identify the exact position of the transcription start site (TSS) and so to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies for producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts, and therefore the positions of core promoters, the Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. In addition, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although few data are available for closely related organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  1. Genome-Wide Assessment of the Binding Effects of Artificial Transcriptional Activators by High-Throughput Sequencing.

    Science.gov (United States)

    Chandran, Anandhakumar; Syed, Junetha; Li, Yue; Sato, Shinsuke; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-10-17

    One of the major goals in DNA-based personalized medicine is the development of sequence-specific small molecules to target the genome. SAHA-PIPs belong to this class of small molecules. In the context of the complex eukaryotic genome, the differential biological effects of SAHA-PIPs are unclear. This question can be addressed by identifying the binding regions across the genome; however, it is a challenge to enrich small-molecule-bound DNA without chemical crosslinking. Here, we developed a method that employs high-throughput sequencing to map the binding area of small molecules throughout the chromatinized human genome. Analysis of the sequenced data confirmed the presence of specific binding sites for SAHA-PIPs from the enriched sequence reads. Mapping the binding sites and enriched regions on the human genome clarifies the reason for the distinct biological effects of SAHA-PIP. This approach will be useful for identifying the function of other small molecules on a large scale. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Melon Transcriptome Characterization: Simple Sequence Repeats and Single Nucleotide Polymorphisms Discovery for High Throughput Genotyping across the Species

    Directory of Open Access Journals (Sweden)

    José Miguel Blanca

    2011-07-01

    Full Text Available Melon (Cucumis melo L.) ranks among the highest-valued fruit crops worldwide. Some genomic tools are available for this crop, including a Sanger transcriptome. We report the generation of 689,054 high-quality expressed sequence tags (ESTs) from two 454 sequencing runs, using normalized and nonnormalized complementary DNA (cDNA) libraries prepared from four genotypes belonging to the two subspecies and the main commercial types. 454 ESTs were combined with the available Sanger ESTs and de novo assembled into 53,252 unigenes. Over 63% of the unigenes were functionally annotated with Gene Ontology (GO) terms and 21% had known orthologs of Arabidopsis thaliana (L.) Heynh. Annotation distribution followed tendencies similar to those reported for Arabidopsis, suggesting that the dataset represents a fairly complete melon transcriptome. Furthermore, we identified a set of 3298 unigenes with microsatellite motifs and 14,417 sequences with single nucleotide variants, of which 11,655 single nucleotide polymorphisms met criteria for use with high-throughput genotyping platforms, and 453 could be detected as cleaved amplified polymorphic sequences (CAPS). A set of markers was validated, 90% of them being polymorphic in a number of variable accessions. This transcriptome provides an invaluable new tool for biological research, more so when it includes transcripts not described previously. It is being used for genome annotation and has provided a large collection of markers that will speed up the process of breeding new melon varieties.

  3. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.

    Science.gov (United States)

    Kao, Wei-Chun; Song, Yun S

    2011-03-01

    Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. Unfortunately, however, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naiveBayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.

  4. [High throughput-targeted sequencing panel for exploring radiosensitivity associated genes in esophageal squamous cell carcinoma].

    Science.gov (United States)

    Qiao, Y; Hu, C X; Song, D A; Li, S Q; Zhou, L H; Jiang, X D

    2017-08-23

    Objective: To explore radiosensitivity-associated genes in esophageal squamous cell carcinoma using a targeted sequencing panel. Methods: Peripheral blood was collected from 22 esophageal squamous cell carcinoma (ESCC) patients who received radiotherapy alone. The genomic DNA (gDNA) of peripheral blood was extracted and used to create a library of gDNA restriction fragments. The gDNA restriction fragments were hybridized to the HaloPlex probe capture library, which comprises 356 cancer genes selected from the 2011 updated edition of the Catalogue of Somatic Mutations in Cancer (COSMIC) database. The sequencing data were aligned with the Genome Analysis Toolkit (GATK, version 3.0) and Picard. The single nucleotide polymorphism and insertion-deletion (SNP/InDel) variations were annotated using online databases. Pathway enrichment was analyzed by Ingenuity Pathway Analysis (IPA). Moreover, according to the short-term curative effect, the 22 patients were divided into two groups: the radiation-sensitive group (CR + PR) and the radiation-resistant group (PD + SD). The nonsynonymous mutation sites were statistically analyzed and the genes associated with radiosensitivity of ESCC were screened. Results: More than 97% of sequencing reads were aligned to the human genome reference sequence and more than 90% of sequencing reads were target sequences. SNP/InDel database annotation results showed that the mutations of the 22 cases were mainly distributed in exons, and the mutation types were mainly missense and synonymous single nucleotide variants (SNVs). There were 23 genes with high-frequency mutations associated with esophageal cancer. Pathway enrichment by IPA showed that 3 pathways were associated with the development of esophageal cancer: the role of BRCA1 in DNA damage response pathway, the DNA double-strand break repair by non-homologous end joining pathway, and the ATM signaling pathway. According to the curative effect, five genes including mismatch repair system component (PMS1

  5. Exploring the environmental diversity of kinetoplastid flagellates in the high-throughput DNA sequencing era

    Directory of Open Access Journals (Sweden)

    Claudia Masini d’Avila-Levy

    2015-01-01

    Full Text Available The class Kinetoplastea encompasses both free-living and parasitic species from a wide range of hosts. Several representatives of this group are responsible for severe human diseases and for economic losses in agriculture and livestock. While this group encompasses over 30 genera, most of the available information has been derived from the vertebrate pathogenic genera Leishmania and Trypanosoma. Recent studies of the previously neglected groups of Kinetoplastea indicated that the actual diversity is much higher than previously thought. This article discusses the known segment of kinetoplastid diversity and how gene-directed Sanger sequencing and next-generation sequencing methods can help to deepen our knowledge of these interesting protists.

  6. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Science.gov (United States)

    2012-01-01

    Background: MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results: Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions: This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes. PMID:22394504

  7. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing.

    Science.gov (United States)

    Peláez, Pablo; Trejo, Minerva S; Iñiguez, Luis P; Estrada-Navarrete, Georgina; Covarrubias, Alejandra A; Reyes, José L; Sanchez, Federico

    2012-03-06

    MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.
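    Comparing miRNA read frequencies between organs generally requires putting libraries of different sizes on a common scale. The abstract does not say how this was done, so the sketch below shows one common convention, reads per million (RPM), purely as an assumption. The function name and all counts are made up for illustration and are not data from the study.

```python
def reads_per_million(counts):
    """Scale raw read counts so that each library sums to one million.

    `counts` maps a miRNA family name to its raw read count in one library.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {family: count * 1_000_000 / total for family, count in counts.items()}


# Hypothetical raw counts per miRNA family in two organ libraries.
libraries = {
    "root": {"miR156": 500, "miR166": 1500, "miR399": 0},
    "leaf": {"miR156": 3000, "miR166": 6000, "miR399": 1000},
}

for organ, counts in libraries.items():
    rpm = reads_per_million(counts)
    summary = ", ".join(f"{family}={value:.0f}" for family, value in rpm.items())
    print(f"{organ}: {summary}")
```

    With this scaling, a family that has more raw reads in a deeper library can still show a lower relative frequency, which is why raw counts alone are not comparable across organs.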

  8. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Peláez Pablo

    2012-03-01

    Full Text Available Background: MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results: Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions: This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.

  9. An integrated multiple capillary array electrophoresis system for high-throughput DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Lu, X.

    1998-03-27

    A capillary array electrophoresis system was chosen to perform DNA sequencing because of several advantages such as rapid heat dissipation, multiplexing capabilities, gel matrix filling simplicity, and the mature nature of the associated manufacturing technologies. There are two major concerns for multiple capillary systems. One concern is inter-capillary cross-talk, and the other is excitation and detection efficiency. Cross-talk is eliminated through proper optical coupling, good focusing, and immersing the capillary array in index-matching fluid. A side-entry excitation scheme with orthogonal detection was established for large capillary arrays. Two 100-capillary array formats were used for DNA sequencing. One format is a cylindrical capillary with 150 µm o.d. and 75 µm i.d., and the other is a square capillary with a 300 µm outer edge and a 75 µm inner edge. This project is focused on the development of excitation and detection of DNA as well as performing DNA sequencing. The DNA injection schemes are discussed for the cases of single and bundled capillaries. An individual sampling device was designed. Base-calling was performed for one capillary from the capillary array with an accuracy of 98%.

  10. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases.

    Science.gov (United States)

    Qin, Yidan; Yao, Jun; Wu, Douglas C; Nottingham, Ryan M; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from RNA in 1-mL plasma samples from a healthy individual and revealed RNA fragments mapping to a diverse population of protein-coding genes and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. © 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  11. Utility of high-throughput DNA sequencing in the study of the human papillomaviruses.

    Science.gov (United States)

    Escobar-Escamilla, Noé; Ramírez-González, José Ernesto; Castro-Escarpulli, Graciela; Díaz-Quiñonez, José Alberto

    2017-12-27

    The Papillomaviridae family is probably the most diverse group of viruses that affect vertebrates. The study of the relationship between infection by certain types of human papillomavirus (HPV) and the development of neoplastic epithelial lesions is of particular interest because of the high prevalence of HPV-related carcinomas in populations of developing countries. To understand the mechanisms of infection and their association with different clinical manifestations, molecular tools play an important role in the description of new types of HPV, the characterization of effector properties of the viral factors, the specific diagnosis and monitoring of HPV types, and the alteration patterns at the genetic level in the host. Technological advances in the field of DNA sequencing have led to the development of different next-generation sequencing systems, making it possible to obtain large amounts of data and broadening the applications for studying viral diseases. In this review, we summarize the main approaches, and their perspectives, in which massively parallel sequencing has proved a useful tool in research on HPV infection.

  12. Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data.

    Science.gov (United States)

    Zhu, Sha Joe; Almagro-Garcia, Jacob; McVean, Gil

    2018-01-01

    The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analyzing and interpreting such data is challenging because of the high rate of multiple infections present. We have developed a statistical method and implementation for deconvolving multiple genome sequences present in an individual with mixed infections. The software package DEploid uses haplotype structure within a reference panel of clonal isolates as a prior for haplotypes present in a given sample. It estimates the number of strains, their relative proportions and the haplotypes presented in a sample, allowing researchers to study multiple infection in malaria with an unprecedented level of detail. The open source implementation DEploid is freely available at https://github.com/mcveanlab/DEploid under the conditions of the GPLv3 license. An R version is available at https://github.com/mcveanlab/DEploid-r. Contact: joe.zhu@bdi.ox.ac.uk or gil.mcvean@bdi.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  13. The High Throughput Sequence Annotation Service (HT-SAS) – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Full Text Available Abstract. Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results: To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine its accuracy and usefulness. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. Conclusion: Using HT-SAS we created new, or enriched already established, annotations for over 20% of proteins from Plasmodium berghei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources, and for researchers working with protein datasets, especially from poorly characterized organisms.
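
    The over-representation test mentioned in the Results can be illustrated with a minimal Python sketch. The abstract does not say which approximation HT-SAS uses, so an exact binomial upper tail stands in for it here; the function names, the background frequencies, and the 0.01 cutoff are all hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(word_counts, n_abstracts, background_freq, alpha=0.01):
    """List words seen in more abstracts than the background rate predicts.

    word_counts:     word -> number of abstracts in the protein's pool containing it
    n_abstracts:     size of the pool of abstracts gathered for the protein
    background_freq: word -> fraction of all abstracts containing it (assumed known)
    alpha:           illustrative significance cutoff, not taken from the paper
    """
    hits = []
    for word, count in word_counts.items():
        p_value = binomial_upper_tail(count, n_abstracts, background_freq[word])
        if p_value < alpha:
            hits.append(word)
    return sorted(hits)


if __name__ == "__main__":
    counts = {"kinase": 12, "protein": 15}
    background = {"kinase": 0.05, "protein": 0.70}
    print(overrepresented_words(counts, 20, background))
```

    In this toy pool of 20 abstracts, a common word such as "protein" is not flagged because its count is close to what the background rate predicts, while a rarer word that appears in most of the pool is.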

  14. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    Science.gov (United States)

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Sample Preservation, DNA or RNA Extraction and Data Analysis for High-Throughput Phytoplankton Community Sequencing

    Directory of Open Access Journals (Sweden)

    Anita Mäki

    2017-09-01

    Full Text Available Phytoplankton is the basis for aquatic food webs and mirrors the water quality. Conventionally, phytoplankton analysis has been done using time-consuming and partly subjective microscopic observations, but next generation sequencing (NGS) technologies provide promising potential for rapid automated examination of environmental samples. Because many phytoplankton species have tough cell walls, methods for cell lysis and DNA or RNA isolation need to be efficient to allow unbiased nucleic acid retrieval. Here, we analyzed how two phytoplankton preservation methods, three commercial DNA extraction kits and their improvements, three RNA extraction methods, and two data analysis procedures affected the results of the NGS analysis. A mock community was pooled from phytoplankton species with variation in nucleus size and cell wall hardness. Although the study showed potential for studying Lugol-preserved sample collections, it demonstrated critical challenges in DNA-based phytoplankton analysis overall. The 18S rRNA gene sequencing output was highly affected by the variation in the rRNA gene copy numbers per cell, while sample preservation and nucleic acid extraction methods formed another source of variation. On top of this, sequence-specific variation in the data quality introduced unexpected bioinformatics bias when the sliding-window method was used for the quality trimming of the Ion Torrent data. While DNA-based analyses did not correlate with biomasses or cell numbers of the mock community, rRNA-based analyses were less affected by different RNA extraction procedures and matched the biomasses, dry weight and carbon contents better, and are therefore recommended for quantitative phytoplankton analyses.

  16. Quantitative insertion-site sequencing (QIseq) for high throughput phenotyping of transposon mutants.

    Science.gov (United States)

    Bronner, Iraad F; Otto, Thomas D; Zhang, Min; Udenze, Kenneth; Wang, Chengqi; Quail, Michael A; Jiang, Rays H Y; Adams, John H; Rayner, Julian C

    2016-07-01

    Genetic screening using random transposon insertions has been a powerful tool for uncovering biology in prokaryotes, where whole-genome saturating screens have been performed in multiple organisms. In eukaryotes, such screens have proven more problematic, in part because of the lack of a sensitive and robust system for identifying transposon insertion sites. We here describe quantitative insertion-site sequencing, or QIseq, which uses custom library preparation and Illumina sequencing technology and is able to identify insertion sites from both the 5' and 3' ends of the transposon, providing an inbuilt level of validation. The approach was developed using piggyBac mutants in the human malaria parasite Plasmodium falciparum but should be applicable to many other eukaryotic genomes. QIseq proved accurate, confirming known sites in >100 mutants, and sensitive, identifying and monitoring sites over a >10,000-fold dynamic range of sequence counts. Applying QIseq to uncloned parasites shortly after transfections revealed multiple insertions in mixed populations and suggests that >4000 independent mutants could be generated from relatively modest scales of transfection, providing a clear pathway to genome-scale screens in P. falciparum. QIseq was also used to monitor the growth of pools of previously cloned mutants and reproducibly differentiated between deleterious and neutral mutations in competitive growth. Among the mutants with fitness defects was a mutant with a piggyBac insertion immediately upstream of the kelch protein K13 gene associated with artemisinin resistance, implying mutants in this gene may have competitive fitness costs. QIseq has the potential to enable the scale-up of piggyBac-mediated genetics across multiple eukaryotic systems. © 2016 Bronner et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Improving transcriptome assembly through error correction of high-throughput sequence reads.

    Science.gov (United States)

    Macmanes, Matthew D; Eisen, Michael B

    2013-01-01

    The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, depend on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show, using a simulated and an empirical dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  18. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, depend on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show, using a simulated and an empirical dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  19. High-throughput sequencing reveals inbreeding depression in a natural population.

    Science.gov (United States)

    Hoffman, Joseph I; Simpson, Fraser; David, Patrice; Rijks, Jolianne M; Kuiken, Thijs; Thorne, Michael A S; Lacy, Robert C; Dasmahapatra, Kanchon K

    2014-03-11

    Proxy measures of genome-wide heterozygosity based on approximately 10 microsatellites have been used to uncover heterozygosity fitness correlations (HFCs) for a wealth of important fitness traits in natural populations. However, effect sizes are typically very small and the underlying mechanisms remain contentious, as a handful of markers usually provides little power to detect inbreeding. We therefore used restriction site associated DNA (RAD) sequencing to accurately estimate genome-wide heterozygosity, an approach transferrable to any organism. As a proof of concept, we first RAD sequenced oldfield mice (Peromyscus polionotus) from a known pedigree, finding strong concordance between the inbreeding coefficient and heterozygosity measured at 13,198 single-nucleotide polymorphisms (SNPs). When applied to a natural population of harbor seals (Phoca vitulina), a weak HFC for parasite infection based on 27 microsatellites strengthened considerably with 14,585 SNPs, the deviance explained by heterozygosity increasing almost fivefold to a remarkable 49%. These findings arguably provide the strongest evidence to date of an HFC being due to inbreeding depression in a natural population lacking a pedigree. They also suggest that under some circumstances heterozygosity may explain far more variation in fitness than previously envisaged.
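
    As a rough illustration of the quantity this abstract is built around, the Python sketch below computes multilocus heterozygosity as the share of genotyped loci that are heterozygous. The abstract does not state which estimator the authors used, so this simple proportion, the function name, and the tuple-based genotype encoding are assumptions made for the example.

```python
def multilocus_heterozygosity(genotypes):
    """Return the fraction of called loci whose two alleles differ.

    genotypes: list of (allele1, allele2) tuples, with None marking a locus
               that was not genotyped (hypothetical encoding for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotyped loci")
    heterozygous = sum(1 for allele1, allele2 in called if allele1 != allele2)
    return heterozygous / len(called)


if __name__ == "__main__":
    individual = [("A", "A"), ("A", "G"), None, ("C", "T")]
    print(multilocus_heterozygosity(individual))
```

    With only about 10 microsatellites this proportion can take few distinct values per individual, whereas thousands of SNPs give a much finer-grained estimate, which is consistent with the gain in power the abstract reports.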

  20. High-throughput sequencing-based genome-wide identification of microRNAs expressed in developing cotton seeds.

    Science.gov (United States)

    Wang, YanMei; Ding, Yan; Yu, DingWei; Xue, Wei; Liu, JinYuan

    2015-08-01

    MicroRNAs (miRNAs) have been shown to play critical regulatory roles in gene expression in cotton. Although a large number of miRNAs have been identified in cotton fibers, the functions of miRNAs in seed development remain unexplored. In this study, a small RNA library was constructed from cotton seeds sampled at 15 days post-anthesis (DPA) and was subjected to high-throughput sequencing. A total of 95 known miRNAs were detected to be expressed in cotton seeds. The expression pattern of these identified miRNAs was profiled and 48 known miRNAs were differentially expressed between cotton seeds and fibers at 15 DPA. In addition, 23 novel miRNA candidates were identified in 15-DPA seeds. Putative targets for 21 novel and 87 known miRNAs were successfully predicted and 900 expressed sequence tag (EST) sequences were proposed to be candidate target genes, which are involved in various metabolic and biological processes, suggesting a complex regulatory network in developing cotton seeds. Furthermore, miRNA-mediated cleavage of three important transcripts in vivo was validated by RLM-5' RACE. This study is the first to show the regulatory network of miRNAs that are involved in developing cotton seeds and provides a foundation for future studies on the specific functions of these miRNAs in seed development.

  1. Bacterial diversity of the American sand fly Lutzomyia intermedia using high-throughput metagenomic sequencing.

    Science.gov (United States)

    Monteiro, Carolina Cunha; Villegas, Luis Eduardo Martinez; Campolina, Thais Bonifácio; Pires, Ana Clara Machado Araújo; Miranda, Jose Carlos; Pimenta, Paulo Filemon Paolucci; Secundino, Nagila Francinete Costa

    2016-08-31

    Parasites of the genus Leishmania cause a broad spectrum of diseases, collectively known as leishmaniasis, in humans worldwide. American cutaneous leishmaniasis is a neglected disease transmitted by sand fly vectors including Lutzomyia intermedia, a proven vector. The female sand fly can acquire or deliver Leishmania spp. parasites while feeding on a blood meal, which is required for nutrition, egg development and survival. The microbiota composition and abundance varies by food source, life stages and physiological conditions. The sand fly microbiota can affect parasite life-cycle in the vector. We performed a metagenomic analysis for microbiota composition and abundance in Lu. intermedia, from an endemic area in Brazil. The adult insects were collected using CDC light traps, morphologically identified, carefully sterilized, dissected under a microscope and the females separated into groups according to their physiological condition: (i) absence of blood meal (unfed = UN); (ii) presence of blood meal (blood-fed = BF); and (iii) presence of developed ovaries (gravid = GR). Then, they were processed for metagenomics with Illumina Hiseq Sequencing in order to be sequence analyzed and to obtain the taxonomic profiles of the microbiota. Bacterial metagenomic analysis revealed differences in microbiota composition based upon the distinct physiological stages of the adult insect. Sequence identification revealed two phyla (Proteobacteria and Actinobacteria), 11 families and 15 genera; 87 % of the bacteria were Gram-negative, while only one family and two genera were identified as Gram-positive. The genera Ochrobactrum, Bradyrhizobium and Pseudomonas were found across all of the groups. The metagenomic analysis revealed that the microbiota of the Lu. intermedia female sand flies are distinct under specific physiological conditions and consist of 15 bacterial genera. The Ochrobactrum, Bradyrhizobium and Pseudomonas were the common genera. Our results detailing

  2. A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST).

    Directory of Open Access Journals (Sweden)

    Daniel J Reiss

    Full Text Available BACKGROUND: In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java) so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. METHODOLOGY: Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. PRINCIPAL FINDINGS: Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5 × 10^-12, 2.9 × 10^-46, and 1.2 × 10^-73) and informational content (11.0, 11.9, and 12.5 bits over 15 bp) of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp) consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp). CONCLUSIONS/SIGNIFICANCE: TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.

  3. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing.

    Science.gov (United States)

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-04-30

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes.

  4. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    Science.gov (United States)

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP), a disorder which predisposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non-pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using the AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene and five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  5. Origin, diversity and maturation of human antiviral antibodies analyzed by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Ponraj Prabakaran

    2012-08-01

    Full Text Available Our understanding of how antibodies are generated and function could help develop effective vaccines and antibody-based therapeutics against viruses such as HIV-1, SARS Coronavirus (CoV), and Hendra and Nipah viruses (henipaviruses). Although broadly neutralizing antibodies (bnAbs) against HIV-1 were observed in patients, elicitation of such bnAbs remains a major challenge when compared to other viral targets. We previously hypothesized that HIV-1 could have evolved a strategy to evade the immune system due to absent or very weak binding of germline antibodies to the conserved epitopes that may not be sufficient to initiate and/or maintain an effective immune response. To further explore our hypothesis, we used the 454 sequence analysis of a large naïve library of human IgM antibodies which had been used for selecting antibodies against the SARS Coronavirus (CoV) receptor-binding domain (RBD), and soluble G proteins (sG) of Hendra and Nipah viruses (henipaviruses). We found that the human IgM repertoires from the 454 sequencing have diverse germline usages, recombination patterns, junction diversity and a lower extent of somatic mutation. In this study, we identified germline intermediates of antibodies specific to HIV-1 and other viruses as observed in normal individuals, and compared their genetic diversity and somatic mutation level along with available structural and functional data. Further computational analysis will provide a framework for understanding the underlying genetic and molecular determinants related to maturation pathways of antiviral bnAbs that could be useful for applying novel approaches to the design of effective vaccine immunogens and antibody-based therapeutics.

  6. DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data.

    Science.gov (United States)

    Nettling, Martin; Thieme, Nils; Both, Andreas; Grosse, Ivo

    2014-02-04

    New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads), calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphism increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions require also specialized or distributed hardware leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource saving requests, and (iii) running on a single standard computer hardware. Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing relating single lookups as range request, which are needed permanently for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as test environment. DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets up to several billion

  7. A bioinformatics approach for determining sample identity from different lanes of high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Rachel L Goldfeder

    Full Text Available The ability to generate whole genome data is rapidly becoming commoditized. For example, a mammalian sized genome (∼3Gb) can now be sequenced using approximately ten lanes on an Illumina HiSeq 2000. Since lanes from different runs are often combined, verifying that each lane in a genome's build is from the same sample is an important quality control. We sought to address this issue in a post hoc bioinformatic manner, instead of using upstream sample or "barcode" modifications. We rely on the inherent small differences between any two individuals to show that genotype concordance rates can be effectively used to test if any two lanes of HiSeq 2000 data are from the same sample. As proof of principle, we use recent data from three different human samples generated on this platform. We show that the distributions of concordance rates are non-overlapping when comparing lanes from the same sample versus lanes from different samples. Our method proves to be robust even when different numbers of reads are analyzed. Finally, we provide a straightforward method for determining the gender of any given sample. Our results suggest that examining the concordance of detected genotypes from lanes purported to be from the same sample is a relatively simple approach for confirming that combined lanes of data are of the same identity and quality.
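
    The concordance check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not describe how genotypes were called or compared, so the function name, the dictionary layout keyed by (chromosome, position), and the example genotypes are all hypothetical.

```python
def concordance_rate(lane_a, lane_b):
    """Return the fraction of shared sites at which two lanes agree.

    lane_a, lane_b: dicts mapping (chromosome, position) -> genotype string.
    Only sites genotyped in both lanes are compared.
    """
    shared_sites = lane_a.keys() & lane_b.keys()
    if not shared_sites:
        raise ValueError("the two lanes share no genotyped sites")
    matches = sum(1 for site in shared_sites if lane_a[site] == lane_b[site])
    return matches / len(shared_sites)


if __name__ == "__main__":
    lane1 = {("chr1", 100): "A/G", ("chr1", 250): "C/C", ("chr2", 40): "T/T"}
    lane2 = {("chr1", 100): "A/G", ("chr1", 250): "C/T", ("chr2", 40): "T/T",
             ("chr3", 7): "G/G"}
    print(concordance_rate(lane1, lane2))
```

    Restricting the comparison to sites called in both lanes keeps the rate comparable when the lanes have different read depths; any threshold for declaring two lanes the same sample would have to come from observed same-sample and different-sample distributions, as in the study.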

  8. A high-throughput sequencing ecotoxicology study of freshwater bacterial communities and their responses to tebuconazole.

    Science.gov (United States)

    Pascault, Noémie; Roux, Simon; Artigas, Joan; Pesce, Stéphane; Leloup, Julie; Tadonleke, Rémy D; Debroas, Didier; Bouchez, Agnès; Humbert, Jean-François

    2014-12-01

    The pollution of lakes and rivers by pesticides is a growing problem worldwide. However, the impacts of these substances on microbial communities are still poorly understood, partly because next-generation sequencing (NGS) has rarely been used in an ecotoxicology context to study bacterial communities despite its interest for accessing rare taxa. Microcosm experiments were carried out to evaluate the effects of tebuconazole (TBZ) on the structure and composition of bacterial communities from two types of freshwater ecosystem (lakes and rivers) with differing histories of pollutant contamination (pristine vs. previously exposed sites). Pyrosequencing revealed that bacterial diversity was higher in the river than in the lakes and in previously exposed sites than in pristine sites. Lakes and river stations shared very few OTUs, and differences at the phylum level were identified between these ecosystems (i.e. the relative importance of Actinobacteria and Gammaproteobacteria). Despite differences between these ecosystems and their contamination history, no significant effect of TBZ on bacterial community structure or composition was observed. Compared to functional parameters that displayed variable responses, we demonstrated that a combination of classical methods and NGS is necessary to investigate the ecotoxicological responses of microbial communities to pollutants. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  9. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    Science.gov (United States)

    Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    Douchi is a type of Chinese traditional fermented food that is an important source of protein and is used in flavouring ingredients. The end product is affected by the microbial community present during fermentation, but exactly how microbes influence the fermentation process remains poorly understood. We used an Illumina MiSeq approach to investigate bacterial and fungal community diversity during both douchi-koji making and fermentation. A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla. Firmicutes, Actinobacteria and Proteobacteria were the dominant bacterial phyla, while Ascomycota and Zygomycota were the dominant fungal phyla. At the genus level, Staphylococcus and Weissella were the dominant bacteria, while Aspergillus and Lichtheimia were the dominant fungi. Principal coordinate analysis showed structural separation between the composition of bacteria in koji making and fermentation. However, multivariate analysis of variance based on unweighted UniFrac distances did identify distinct differences (p fermentation. This is the first investigation to integrate douchi fermentation and koji making and fermentation processes through this technological approach. The results provide insight into the microbiome of the douchi fermentation process, and reveal a structural separation that may be stratified by the environment during the production of this traditional fermented food. PMID:27992473

  10. Next generation sequencing-based multigene panel for high throughput detection of food-borne pathogens.

    Science.gov (United States)

    Ferrario, Chiara; Lugli, Gabriele Andrea; Ossiprandi, Maria Cristina; Turroni, Francesca; Milani, Christian; Duranti, Sabrina; Mancabelli, Leonardo; Mangifesta, Marta; Alessandri, Giulia; van Sinderen, Douwe; Ventura, Marco

    2017-09-01

    Contamination of food by chemicals or pathogenic bacteria may cause particular illnesses that are linked to food consumption, commonly referred to as foodborne diseases. Bacteria are present in/on various food products, such as fruits, vegetables and ready-to-eat products. Bacteria that cause foodborne diseases are known as foodborne pathogens (FBPs). Accurate detection methods that are able to reveal the presence of FBPs in food matrices are in constant demand, in order to ensure safe foods with a minimal risk of causing foodborne diseases. Here, a multiplex PCR-based Illumina sequencing method for FBP detection in food matrices was developed. Starting from 25 bacterial targets and 49 selected PCR primer pairs, a primer collection called foodborne pathogen panel (FPP) consisting of 12 oligonucleotide pairs was developed. The FPP allows a more rapid and reliable identification of FBPs compared to classical cultivation methods. Furthermore, FPP permits sensitive and specific FBP detection in about two days from food sample acquisition to bioinformatics-based identification. The FPP is able to simultaneously identify eight different bacterial pathogens, i.e. Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Salmonella enterica subsp. enterica serovar enteritidis, Escherichia coli, Shigella sonnei, Staphylococcus aureus and Yersinia enterocolitica, in a given food matrix at a threshold contamination level of 10^1 cell/g. Moreover, this novel detection method may represent an alternative and/or a complementary approach to PCR-based techniques, which are routinely used for FBP detection, and could be implemented in (parts of) the food chain as a quality check. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Neuropeptidergic Signaling in the American Lobster Homarus americanus: New Insights from High-Throughput Nucleotide Sequencing.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan; Lameyer, Tess J; Pascual, Micah G; Shea, Devlin N; Stanhope, Meredith E; Schulz, David J; Dickinson, Patsy S

    2015-01-01

    Peptides are the largest and most diverse class of molecules used for neurochemical communication, playing key roles in the control of essentially all aspects of physiology and behavior. The American lobster, Homarus americanus, is a crustacean of commercial and biomedical importance; lobster growth and reproduction are under neuropeptidergic control, and portions of the lobster nervous system serve as models for understanding the general principles underlying rhythmic motor behavior (including peptidergic neuromodulation). While a number of neuropeptides have been identified from H. americanus, and the effects of some have been investigated at the cellular/systems levels, little is currently known about the molecular components of neuropeptidergic signaling in the lobster. Here, a H. americanus neural transcriptome was generated and mined for sequences encoding putative peptide precursors and receptors; 35 precursor- and 41 receptor-encoding transcripts were identified. We predicted 194 distinct neuropeptides from the deduced precursor proteins, including members of the adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin C, bursicon, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone (CHH), CHH precursor-related peptide, diuretic hormone 31, diuretic hormone 44, eclosion hormone, FLRFamide, GSEFLamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, proctolin, pyrokinin, SIFamide, sulfakinin and tachykinin-related peptide families. While some of the predicted peptides are known H. americanus isoforms, most are novel identifications, more than doubling the extant lobster neuropeptidome. The deduced receptor proteins are the first descriptions of H. americanus neuropeptide receptors, and include ones for most of the peptide groups mentioned earlier, as well as those for ecdysis-triggering hormone, red pigment concentrating hormone

  12. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approaches for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. Protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromosomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical samples without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixture was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  13. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari Nyyssönen

    2013-09-01

    Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain as hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26 %) cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes, multiple Proteobacteria sub-phyla in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically-confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.

  14. Transcriptomics of in vitro immune-stimulated hemocytes from the Manila clam Ruditapes philippinarum using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Rebeca Moreira

    BACKGROUND: The Manila clam (Ruditapes philippinarum) is a worldwide cultured bivalve species with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. METHODOLOGY AND PRINCIPAL FINDINGS: High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs, and 44.7% of the nucleotide sequences translated into protein were annotated successfully. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes such as apoptosis, the Toll-like signaling pathway and the complement cascade. We have found sequences from molecules never described in bivalves before, especially in the complement pathway, where almost all the components are present. CONCLUSIONS: This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum.

  15. Genome-wide analysis of microRNAs in rubber tree (Hevea brasiliensis L.) using high-throughput sequencing.

    Science.gov (United States)

    Lertpanyasampatha, Manassawe; Gao, Lei; Kongsawadworakul, Panida; Viboonjun, Unchera; Chrestin, Hervé; Liu, Renyi; Chen, Xuemei; Narangajavana, Jarunya

    2012-08-01

    MicroRNAs (miRNAs) are short RNAs with essential roles in gene regulation in various organisms including higher plants. In contrast to the vast information on miRNAs from many economically important plants, almost nothing has been reported on the identification or analysis of miRNAs from rubber tree (Hevea brasiliensis L.), the most important natural rubber-producing crop. To identify miRNAs and their target genes in rubber tree, high-throughput sequencing combined with a computational approach was performed. Four small RNA libraries were constructed for deep sequencing from mature and young leaves of two rubber tree clones, PB 260 and PB 217, which provide high and low latex yield, respectively. 115 miRNAs belonging to 56 known miRNA families were identified, and northern hybridization validated miRNA expression and revealed developmental stage-dependent and clone-specific expression for some miRNAs. We took advantage of the newly released rubber tree genome assembly and predicted 20 novel miRNAs. Further, computational analysis uncovered potential targets of the known and novel miRNAs. Predicted target genes included not only transcription factors but also genes involved in various biological processes including stress responses, primary and secondary metabolism, and signal transduction. In particular, genes with roles in rubber biosynthesis are predicted targets of miRNAs. This study provides a basic catalog of miRNAs and their targets in rubber tree to facilitate future improvement and exploitation of rubber tree.

  16. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes.

    Science.gov (United States)

    Ewing, Adam D; Kazazian, Haig H

    2010-09-01

    Using high-throughput sequencing, we devised a technique to determine the insertion sites of virtually all members of the human-specific L1 retrotransposon family in any human genome. Using diagnostic nucleotides, we were able to locate the approximately 800 L1Hs copies corresponding specifically to the pre-Ta, Ta-0, and Ta-1 L1Hs subfamilies, with over 90% of sequenced reads corresponding to human-specific elements. We find that any two individual genomes differ at an average of 285 sites with respect to L1 insertion presence or absence. In total, we assayed 25 individuals, 15 of which are unrelated, at 1139 sites, including 772 shared with the reference genome and 367 nonreference L1 insertions. We show that L1Hs profiles recapitulate genetic ancestry, and determine the chromosomal distribution of these elements. Using these data, we estimate that the rate of L1 retrotransposition in humans is between 1/95 and 1/270 births, and the number of dimorphic L1 elements in the human population with gene frequencies greater than 0.05 is between 3000 and 10,000.

  17. High throughput sequencing reveals the diversity of TRB-CDR3 repertoire in patients with psoriasis vulgaris.

    Science.gov (United States)

    Cao, Xiaofang; Wa, Qingbiao; Wang, Qidi; Li, Lin; Liu, Xin; An, Lisha; Cai, Ruikun; Du, Meng; Qiu, Yue; Han, Jian; Wang, Chunlin; Wang, Xingyu; Guo, Changlong; Lu, Yonghong; Ma, Xu

    2016-11-01

    Psoriasis is a T cell-mediated chronic inflammatory skin disease with inflammatory cell infiltrates in the dermis and epidermis. Previous studies suggested that there are some expanded T-cell receptor (TCR) clones in psoriatic skin. However, the effect of psoriasis on the immunological characteristics of TCR in circulating blood has not been reported. To address this, we performed high-throughput sequencing to reveal the immunological characteristics of TCR beta chain (TRB) in both psoriasis patients and healthy controls. Our results revealed that the TRB-CDR3 region of psoriasis patients had distinctive immunological characteristics compared with that of healthy controls, including V gene usage, nt of N addition. In addition, three types of TRB-CDR3 peptides were found highly relevant to psoriasis. Our findings show the comprehensive characteristics of psoriasis on the TRB-CDR3 repertoire of circulating blood at sequence-level resolution. These findings may contribute to a better understanding of the pathogenesis of psoriasis and open opportunities to explore potential therapeutic targets. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. High-Throughput Sequencing of Viable Microbial Communities in Raw Pork Subjected to a Fast Cooling Process.

    Science.gov (United States)

    Yang, Chao; Che, You; Qi, Yan; Liang, Peixin; Song, Cunjiang

    2017-01-01

    This study aimed to investigate the effect of the fast cooling process on the microbiological community in chilled fresh pork during storage. We established a culture-independent method to study viable microbes in raw pork. Tray-packaged fresh pork and chilled fresh pork were completely spoiled after 18 and 49 d in aseptic bags at 4 °C, respectively. 16S/18S ribosomal RNAs were reverse transcribed to cDNA to characterize the activity of viable bacteria/fungi in the 2 types of pork. Both cDNA and total DNA were analyzed by high-throughput sequencing, which revealed that viable Bacteroides sp. were the most active genus in rotten pork, although viable Myroides sp. and Pseudomonas sp. were also active. Moreover, viable fungi were only detected in chilled fresh pork. The sequencing results revealed that the fast cooling process could suppress the growth of microbes present initially in the raw meat to extend its shelf life. Our results also suggested that fungi associated with pork spoilage could not grow well in aseptic tray-packaged conditions. © 2016 Institute of Food Technologists®.

  19. ImmuneDB: a system for the analysis and exploration of high-throughput adaptive immune receptor sequencing data.

    Science.gov (United States)

    Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri

    2017-01-15

    As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry leading database, MySQL, to provide fast analysis and avoid the complexities of using error prone flat-files. ImmuneDB is freely available at http://immunedb.com. A demo of the ImmuneDB web interface is available at: http://immunedb.com/demo. Contact: Uh25@drexel.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Discovery of J Chain in African Lungfish (Protopterus dolloi, Sarcopterygii) Using High Throughput Transcriptome Sequencing: Implications in Mucosal Immunity

    Science.gov (United States)

    Tacchi, Luca; Larragoite, Erin; Salinas, Irene

    2013-01-01

    J chain is a small polypeptide responsible for immunoglobulin (Ig) polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi) by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM+ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective. PMID:23967082

  1. Soil DNA metabarcoding and high-throughput sequencing as a forensic tool: considerations, potential limitations and recommendations.

    Science.gov (United States)

    Young, J M; Austin, J J; Weyrich, L S

    2017-02-01

    Analysis of physical evidence is typically a deciding factor in forensic casework by establishing what transpired at a scene or who was involved. Forensic geoscience is an emerging multi-disciplinary science that can offer significant benefits to forensic investigations. Soil is a powerful, nearly 'ideal' contact trace evidence, as it is highly individualistic, easy to characterise, has a high transfer and retention probability, and is often overlooked in attempts to conceal evidence. However, many real-life cases encounter close proximity soil samples or soils with low inorganic content, which cannot be easily discriminated based on current physical and chemical analysis techniques. The capability to improve forensic soil discrimination, and identify key indicator taxa from soil using the organic fraction is currently lacking. The development of new DNA sequencing technologies offers the ability to generate detailed genetic profiles from soils and enhance current forensic soil analyses. Here, we discuss the use of DNA metabarcoding combined with high-throughput sequencing (HTS) technology to distinguish between soils from different locations in a forensic context. Specifically, we provide recommendations for best practice, outline the potential limitations encountered in a forensic context and describe the future directions required to integrate soil DNA analysis into casework. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

    Science.gov (United States)

    Mu, John C; Mohiyuddin, Marghoob; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Abyzov, Alexej; Wong, Wing H; Lam, Hugo Y K

    2015-05-01

    VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. Contact: rd@bina.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  3. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    Science.gov (United States)

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data through loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive

  4. PathOS: a decision support system for reporting high throughput sequencing of cancers in clinical diagnostic laboratories.

    Science.gov (United States)

    Doig, Kenneth D; Fellowes, Andrew; Bell, Anthony H; Seleznev, Andrei; Ma, David; Ellul, Jason; Li, Jason; Doyle, Maria A; Thompson, Ella R; Kumar, Amit; Lara, Luis; Vedururu, Ravikiran; Reid, Gareth; Conway, Thomas; Papenfuss, Anthony T; Fox, Stephen B

    2017-04-24

    The increasing affordability of DNA sequencing has allowed it to be widely deployed in pathology laboratories. However, this has exposed many issues with the analysis and reporting of variants for clinical diagnostic use. Implementing a high-throughput sequencing (NGS) clinical reporting system requires a diverse combination of capabilities: statistical methods to identify variants, global variant databases, a validated bioinformatics pipeline, an auditable laboratory workflow, reproducible clinical assays and quality control monitoring throughout. These capabilities must be packaged in software that integrates the disparate components into a useable system. To meet these needs, we developed a web-based application, PathOS, which takes variant data from a patient sample through to a clinical report. PathOS has been used operationally in the Peter MacCallum Cancer Centre for two years for the analysis, curation and reporting of genetic tests for cancer patients, as well as the curation of large-scale research studies. PathOS has also been deployed in cloud environments allowing multiple institutions to use separate, secure and customisable instances of the system. Increasingly, the bottleneck of variant curation is limiting the adoption of clinical sequencing for molecular diagnostics. PathOS is focused on providing clinical variant curators and pathology laboratories with a decision support system needed for personalised medicine. While the genesis of PathOS has been within cancer molecular diagnostics, the system is applicable to NGS clinical reporting generally. The widespread availability of genomic sequencers has highlighted the limited availability of software to support clinical decision-making in molecular pathology. PathOS is a system that has been developed and refined in a hospital laboratory context to meet the needs of clinical diagnostics. The software is available as a set of Docker images and source code at https://github.com/PapenfussLab/PathOS .

  5. Bacterial metabarcoding by 16S rRNA gene ion torrent amplicon sequencing.

    Science.gov (United States)

    Fantini, Elio; Gianese, Giulio; Giuliano, Giovanni; Fiore, Alessia

    2015-01-01

    Ion Torrent is a next generation sequencing technology based on the detection of hydrogen ions produced during DNA chain elongation; this technology allows analyzing and characterizing genomes, genes, and species. Here, we describe an Ion Torrent procedure applied to the metagenomic analysis of 16S rRNA gene amplicons to study the bacterial diversity in food and environmental samples.

  6. The pig gut microbial diversity: Understanding the pig gut microbial ecology through the next generation high throughput sequencing.

    Science.gov (United States)

    Kim, Hyeun Bum; Isaacson, Richard E

    2015-06-12

    The importance of the gut microbiota of animals is widely acknowledged because of its pivotal roles in the health and well-being of animals. The genetic diversity of the gut microbiota contributes to the overall development and metabolic needs of the animal, and provides the host with many beneficial functions including production of volatile fatty acids, recycling of bile salts, production of vitamin K, cellulose digestion, and development of the immune system. Thus the intestinal microbiota of animals has been the subject of study for many decades. Although most of the older studies used culture-dependent methods, the recent advent of high-throughput sequencing of 16S rRNA genes has facilitated in-depth studies exploring microbial populations and their dynamics in the animal gut. These culture-independent DNA-based studies generate large amounts of data and as a result contribute to a more detailed understanding of the microbiota dynamics in the gut and the ecology of the microbial populations. Of equal importance is the ability to identify and quantify microbes that are difficult to grow or that have not been grown in the laboratory. Interpreting the data obtained from this type of study requires using basic principles of microbial diversity to understand the importance of the composition of microbial populations. In this review, we summarize the literature on culture-independent studies of the pig gut microbiota with an emphasis on its succession and alterations caused by diverse factors. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    2016-01-01

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960–1219 A.D.) and Qing dynasties (1636–1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities’ diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution. PMID:27658256
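The abstract above mentions "diversity indices" without naming them. As a minimal sketch, assuming the Shannon index is one of the indices of interest (an assumption, not something the abstract states), the calculation from per-taxon read counts could look like this. The function name and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts.

    `counts` is a sequence of read counts, one per taxon (hypothetical input;
    the abstract does not say which indices or inputs the authors used).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Two equally abundant taxa give H' = ln(2); a single taxon gives 0.
even_sample = [10, 10]
single_taxon = [5]
print(shannon_index(even_sample), shannon_index(single_taxon))
```

A higher value indicates a community that is both richer and more even; comparing values across the six samples would be one way to express the diversity differences the abstract describes.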

  8. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D.) and Qing dynasties (1636-1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  9. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Qiang Li

    Full Text Available The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D.) and Qing dynasties (1636-1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  10. Selection of DNA Aptamers for Ovarian Cancer Biomarker CA125 Using One-Pot SELEX and High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Delia J. Scoville

    2017-01-01

    Full Text Available CA125 is a mucin glycoprotein whose concentration in serum correlates with a woman’s risk of developing ovarian cancer and also indicates response to therapy in diagnosed patients. Accurate detection of this large, complex protein in patient samples is of great clinical relevance. We suggest that powerful new diagnostic tools may be enabled by the development of nucleic acid aptamers with affinity for CA125. Here, we report on our use of One-Pot SELEX to isolate single-stranded DNA aptamers with affinity for CA125, followed by high-throughput sequencing of the selected oligonucleotides. This data-rich approach, combined with bioinformatics tools, enabled the entire selection process to be characterized. Using fluorescence anisotropy and affinity probe capillary electrophoresis, the binding affinities of four aptamer candidates were evaluated. Two aptamers, CA125_1 and CA125_12, both without primers, were found to bind to clinically relevant concentrations of the protein target. Binding was differently influenced by the presence of Mg2+ ions, being required for binding of CA125_1 and abrogating binding of CA125_12. In conclusion, One-Pot SELEX was found to be a promising selection method that yielded DNA aptamers to a clinically important protein target.

  11. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Science.gov (United States)

    Rebijith, K B; Asokan, R; Hande, H Ranjitha; Krishna Kumar, N K

    Thrips palmi Karny (Thysanoptera: Thripidae) is the sole vector of Watermelon bud necrosis tospovirus, for which crop losses have been estimated at around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to its thigmokinetic behaviour and its development of high levels of resistance to insecticides. There is an urgent need for an effective future management strategy, and small RNAs, especially microRNAs, hold great promise as key players in growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classifications of the targets for these miRNAs revealed that the majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated a few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also serve as an invaluable tool for the management of thysanopteran insects in the future.

  12. Apple ring rot-responsive putative microRNAs revealed by high-throughput sequencing in Malus × domestica Borkh.

    Science.gov (United States)

    Yu, Xin-Yi; Du, Bei-Bei; Gao, Zhi-Hong; Zhang, Shi-Jie; Tu, Xu-Tong; Chen, Xiao-Yun; Zhang, Zhen; Qu, Shen-Chun

    2014-08-01

    MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by silencing target mRNAs via cleavage or translational inhibition. MiRNAs act as important regulators of plant development and stress response. To understand the role of miRNAs responsive to apple ring rot stress, we identified disease-responsive miRNAs using high-throughput sequencing in Malus × domestica Borkh. Four small RNA libraries were constructed from two control strains in M. domestica, crabapple (CKHu) and Fuji Naga-fu No. 6 (CKFu), and two disease stress strains, crabapple (DSHu) and Fuji Naga-fu No. 6 (DSFu). A total of 59 miRNA families were identified, and five miRNAs that might be responsive to apple ring rot infection were validated via qRT-PCR. Furthermore, we predicted 76 target genes potentially regulated by conserved miRNAs. Our study demonstrated that miRNAs were responsive to apple ring rot infection and may have important implications for apple disease resistance.

  13. Genome-wide identification of bone metastasis-related microRNAs in lung adenocarcinoma by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Lin Xie

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) are a class of small noncoding RNAs that regulate gene expression at the post-transcriptional level. They participate in a wide variety of biological processes, including apoptosis, proliferation and metastasis. The aberrant expression of miRNAs has been found to play an important role in many cancers. RESULTS: To understand the roles of miRNAs in the bone metastasis of lung adenocarcinoma, we constructed two small RNA libraries from blood of lung adenocarcinoma patients with and without bone metastasis. High-throughput sequencing combined with differential expression analysis identified that 7 microRNAs were down-regulated and 21 microRNAs were up-regulated in lung adenocarcinoma with bone metastasis. A total of 797 target genes of the differentially expressed microRNAs were identified using a bioinformatics approach. Functional annotation analysis indicated that a number of pathways might be involved in bone metastasis, survival of the primary origin and metastatic angiogenesis of lung adenocarcinoma. These include the MAPK, Wnt, and NF-kappaB signaling pathways, as well as pathways involving the matrix metalloproteinase, cytoskeletal protein and angiogenesis factors. CONCLUSIONS: This study provides some insights into the molecular mechanisms that underlie lung adenocarcinoma development, thereby aiding the diagnosis and treatment of the disease.

  14. Taxonomic analysis of the microbial community in stored sugar beets using high-throughput sequencing of different marker genes.

    Science.gov (United States)

    Liebe, Sebastian; Wibberg, Daniel; Winkler, Anika; Pühler, Alfred; Schlüter, Andreas; Varrelmann, Mark

    2016-02-01

    Post-harvest colonization of sugar beets accompanied by rot development is a serious problem due to sugar losses and negative impact on processing quality. Studies on the microbial community associated with rot development and factors shaping their structure are missing. Therefore, high-throughput sequencing was applied to describe the influence of environment, plant genotype and storage temperature (8°C and 20°C) on three different communities in stored sugar beets, namely fungi (internal transcribed spacers 1 and 2), Fusarium spp. (elongation factor-1α gene fragment) and oomycetes (internal transcribed spacers 1). The composition of the fungal community changed during storage mostly influenced by the storage temperature followed by a weak environmental effect. Botrytis cinerea was the prevalent species at 8°C whereas members of the fungal genera Fusarium and Penicillium became dominant at 20°C. This shift was independent of the plant genotype. Species richness within the genus Fusarium also increased during storage at both temperatures whereas the oomycetes community did not change. Moreover, oomycetes species were absent after storage at 20°C. The results of the present study clearly show that rot development during sugar beet storage is associated with pathogens well known as causal agents of post-harvest diseases in many other crops. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Directory of Open Access Journals (Sweden)

    Nicholas D Youngblut

    Full Text Available Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  16. A High-Throughput DNA-Sequencing Approach for Determining Sources of Fecal Bacteria in a Lake Superior Estuary.

    Science.gov (United States)

    Brown, Clairessa M; Staley, Christopher; Wang, Ping; Dalzell, Brent; Chun, Chan Lan; Sadowsky, Michael J

    2017-08-01

    Current microbial source-tracking (MST) methods, employed to determine sources of fecal contamination in waterways, use molecular markers targeting host-associated bacteria in animal or human feces. However, there is a lack of knowledge about fecal microbiome composition in several animals and imperfect marker specificity and sensitivity. To overcome these issues, a community-based MST method has been developed. Here, we describe a study done in the Lake Superior-Saint Louis River estuary using SourceTracker, a program that calculates the source contribution to an environment. High-throughput DNA sequencing of microbiota from a diverse collection of fecal samples obtained from 11 types of animals (wild, agricultural, and domesticated) and treated effluent (n = 233) was used to generate a fecal library to perform community-based MST. Analysis of 319 fecal and environmental samples revealed that the community compositions in water and fecal samples were significantly different, allowing for the determination of the presence of fecal inputs and identification of specific sources. SourceTracker results indicated that fecal bacterial inputs into the Lake Superior estuary were primarily attributed to wastewater effluent and, to a lesser extent, geese and gull wastes. These results suggest that a community-based MST method may be another useful tool for determining sources of aquatic fecal bacteria.

  17. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    K B Rebijith

    Full Text Available Thrips palmi Karny (Thysanoptera: Thripidae) is the sole vector of Watermelon bud necrosis tospovirus, for which crop losses have been estimated at around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to its thigmokinetic behaviour and its development of high levels of resistance to insecticides. There is an urgent need for an effective future management strategy, and small RNAs, especially microRNAs, hold great promise as key players in growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classifications of the targets for these miRNAs revealed that the majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated a few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also serve as an invaluable tool for the management of thysanopteran insects in the future.

  18. High throughput sequencing identifies an imprinted gene, Grb10, associated with the pluripotency state in nuclear transfer embryonic stem cells.

    Science.gov (United States)

    Li, Hui; Gao, Shuai; Huang, Hua; Liu, Wenqiang; Huang, Huanwei; Liu, Xiaoyu; Gao, Yawei; Le, Rongrong; Kou, Xiaochen; Zhao, Yanhong; Kou, Zhaohui; Li, Jia; Wang, Hong; Zhang, Yu; Wang, Hailin; Cai, Tao; Sun, Qingyuan; Gao, Shaorong; Han, Zhiming

    2017-07-18

    Somatic cell nuclear transfer and transcription factor mediated reprogramming are two widely used techniques for somatic cell reprogramming. Both fully reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells hold potential for regenerative medicine, and evaluation of the stem cell pluripotency state is crucial for these applications. Previous reports have shown that the Dlk1-Dio3 region is associated with pluripotency in induced pluripotent stem cells and that incomplete somatic cell reprogramming causes abnormally elevated levels of genomic 5-methylcytosine in induced pluripotent stem cells compared to nuclear transfer embryonic stem cells and embryonic stem cells. In this study, we compared pluripotency associated genes Rian and Gtl2 in the Dlk1-Dio3 region in exactly syngeneic nuclear transfer embryonic stem cells and induced pluripotent stem cells with the same genomic insertion. We also assessed 5-methylcytosine and 5-hydroxymethylcytosine levels and performed high-throughput sequencing in these cells. Our results showed that Rian and Gtl2 in the Dlk1-Dio3 region related to pluripotency in induced pluripotent stem cells did not correlate with the genes in nuclear transfer embryonic stem cells, and no significant differences in 5-methylcytosine and 5-hydroxymethylcytosine levels were observed between fully and partially reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells. Through syngeneic comparison, our study identifies for the first time that Grb10 is associated with the pluripotency state in nuclear transfer embryonic stem cells.

  19. Identification and characterization of novel and conserved microRNAs in radish (Raphanus sativus L.) using high-throughput sequencing.

    Science.gov (United States)

    Xu, Liang; Wang, Yan; Xu, Yuanyuan; Wang, Liangju; Zhai, Lulu; Zhu, Xianwen; Gong, Yiqin; Ye, Shan; Liu, Liwang

    2013-03-01

    MicroRNAs (miRNAs) are endogenous, non-coding, small RNAs that play significant regulatory roles in plant growth, development, and biotic and abiotic stress responses. To date, a great number of conserved and species-specific miRNAs have been identified in many important plant species such as Arabidopsis, rice and poplar. However, little is known about identification of miRNAs and their target genes in radish (Raphanus sativus L.). In the present study, a small RNA library from radish root was constructed and sequenced using the high-throughput Solexa sequencing. Through sequence alignment and secondary structure prediction, a total of 545 conserved miRNA families as well as 15 novel (with their miRNA* strand) and 64 potentially novel miRNAs were identified. Quantitative real-time PCR (qRT-PCR) analysis confirmed that both conserved and novel miRNAs were expressed in radish, and some of them were preferentially expressed in certain tissues. A total of 196 potential target genes were predicted for 42 novel radish miRNAs. Gene ontology (GO) analysis showed that most of the targets were involved in plant growth, development, metabolism and stress responses. This study represents a first large-scale identification and characterization of radish miRNAs and their potential target genes. These results could lead to the further identification of radish miRNAs and enhance our understanding of radish miRNA regulatory mechanisms in diverse biological and metabolic processes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  20. High-throughput sequencing of microRNAs in peripheral blood mononuclear cells: identification of potential weight loss biomarkers.

    Directory of Open Access Journals (Sweden)

    Fermín I Milagro

    Full Text Available INTRODUCTION: MicroRNAs (miRNAs) are being increasingly studied in relation to energy metabolism and body composition homeostasis. Indeed, the quantitative analysis of miRNA expression in different adiposity conditions may help to understand the intimate mechanisms participating in body weight control and to find new biomarkers with diagnostic or prognostic value in obesity management. OBJECTIVE: The aim of this study was the search for miRNAs in blood cells whose expression could be used as prognostic biomarkers of weight loss. METHODS: Ten Caucasian obese women were selected among the participants in a weight-loss trial that consisted of following an energy-restricted treatment. Weight loss was considered unsuccessful when <5% (non-responders) and successful when >5% (responders). At baseline, total miRNA isolated from peripheral blood mononuclear cells (PBMC) was sequenced with SOLiD v4. The miRNA sequencing data were validated by RT-PCR. RESULTS: Differential baseline expression of several miRNAs was found between responders and non-responders. Two miRNAs were up-regulated in the non-responder group (mir-935 and mir-4772) and three others were down-regulated (mir-223, mir-224 and mir-376b). Both mir-935 and mir-4772 showed relevant associations with the magnitude of weight loss, although the expression of other transcripts (mir-874, mir-199b, mir-766, mir-589 and mir-148b) also correlated with weight loss. CONCLUSIONS: This research addresses the use of high-throughput sequencing technologies in the search for miRNA expression biomarkers in obesity, by determining the miRNA transcriptome of PBMC. Basal expression of different miRNAs, particularly mir-935 and mir-4772, could be prognostic biomarkers and may forecast the response to a hypocaloric diet.

  1. Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

    Science.gov (United States)

    Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H

    2015-06-01

    The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Metabolomic and high-throughput sequencing analysis – modern approach for the assessment of biodeterioration of materials from historic buildings

    Directory of Open Access Journals (Sweden)

    Beata Gutarowska

    2015-09-01

    Full Text Available Preservation of cultural heritage is of paramount importance worldwide. Microbial colonization of construction materials, such as wood, brick, mortar and stone in historic buildings can lead to severe deterioration. The aim of the present study was to give modern insight into the phylogenetic diversity and activated metabolic pathways of microbial communities colonizing historic objects located in the former Auschwitz II-Birkenau concentration and extermination camp in Oświęcim, Poland. For this purpose we combined molecular, microscopic and chemical methods. Selected specimens were examined using Field Emission Scanning Electron Microscopy (FESEM), metabolomic analysis and high-throughput Illumina sequencing. FESEM imaging revealed the presence of complex microbial communities comprising diatoms, fungi and bacteria, mainly cyanobacteria and actinobacteria, on sample surfaces. Microbial diversity of brick specimens appeared higher than that of the wood and was dominated by algae and cyanobacteria, while wood was mainly colonized by fungi. DNA sequences documented the presence of 15 bacterial phyla representing 99 genera including Halomonas, Halorhodospira, Salinisphaera, Salinibacterium, Rubrobacter, Streptomyces, Arthrobacter and 9 fungal classes represented by 113 genera including Cladosporium, Acremonium, Alternaria, Engyodontium, Penicillium, Rhizopus and Aureobasidium. Most of the identified sequences were characteristic of organisms implicated in deterioration of wood and brick. Metabolomic data indicated the activation of numerous metabolic pathways, including those regulating the production of primary and secondary metabolites, for example, metabolites associated with the production of antibiotics, organic acids and deterioration of organic compounds. The study demonstrated that a combination of electron microscopy imaging with metabolomic and genomic techniques makes it possible to link the phylogenetic information and metabolic profiles of

  3. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    Directory of Open Access Journals (Sweden)

    John K Davies

    Full Text Available Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition, when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  4. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    Science.gov (United States)

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
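The sliding-window assignment described in this abstract can be illustrated with a small sketch. It is not the Virus-Genotyping Tools implementation: the phylogenetic step is skipped entirely, per-window bootstrap support values are taken as given input, and the function names, the window and step sizes, and the rule for combining windows are all hypothetical. Only the 70% bootstrap cut-off comes from the abstract.

```python
def sliding_windows(length, window, step):
    """Yield (start, end) coordinates of overlapping windows over an alignment."""
    if length <= window:
        yield (0, length)
        return
    start = 0
    while start + window < length:
        yield (start, start + window)
        start += step
    yield (length - window, length)  # last window sits flush with the end


def assign_genotype(support, threshold=70.0):
    """Return the best-supported genotype for one window, or None.

    `support` maps genotype name -> bootstrap support in percent
    (assumed to be computed elsewhere; not done in this sketch).
    """
    best = max(support, key=support.get)
    return best if support[best] > threshold else None


def summarize(calls):
    """Combine per-window calls (hypothetical rule, for illustration only)."""
    distinct = sorted({c for c in calls if c is not None})
    if not distinct:
        return "unassigned"
    if len(distinct) == 1:
        return distinct[0]
    return "recombinant: " + "/".join(distinct)


# Made-up support values for three windows of a query sequence.
window_support = [{"B": 92.0, "C": 5.0}, {"B": 55.0, "C": 40.0}, {"B": 8.0, "C": 88.0}]
calls = [assign_genotype(s) for s in window_support]
print(list(sliding_windows(1000, 400, 300)), calls, summarize(calls))
```

Windows whose best support does not exceed the threshold stay unassigned, which is why a sequence is only reported as a possible recombinant when at least two windows confidently disagree.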

  5. Identification and characterization of microRNAs in Humulus lupulus using high-throughput sequencing and their response to Citrus bark cracking viroid (CBCVd) infection

    Czech Academy of Sciences Publication Activity Database

    Mishra, Ajay Kumar; Duraisamy, Ganesh Selvaraj; Matoušek, Jaroslav; Radišek, S.; Javornik, B.; Jakše, J.

    2016-01-01

    Vol. 17, No. 919 (2016) ISSN 1471-2164 R&D Projects: GA MŠk(CZ) LH14255 Institutional support: RVO:60077344 Keywords: Humulus lupulus * High-throughput sequencing * Citrus bark cracking viroid Subject RIV: EB - Genetics; Molecular Biology Impact factor: 3.729, year: 2016

  6. Fast, accurate and easy-to-pipeline methods for amplicon sequence processing

    Science.gov (United States)

    Antonielli, Livio; Sessitsch, Angela

    2016-04-01

    Next generation sequencing (NGS) technologies have been established for years as an essential resource in microbiology. While metagenomic studies benefit from the continuously increasing throughput of the Illumina (Solexa) technology, the spread of third generation sequencing technologies (PacBio, Oxford Nanopore) is taking whole genome sequencing beyond the assembly of fragmented draft genomes, making it now possible to finish bacterial genomes even without short read correction. Besides (meta)genomic analysis, next-gen amplicon sequencing is still fundamental for microbial studies. Amplicon sequencing of the 16S rRNA gene and ITS (Internal Transcribed Spacer) remains a well-established, widespread method for a multitude of purposes concerning the identification and comparison of archaeal/bacterial (16S rRNA gene) and fungal (ITS) communities occurring in diverse environments. Numerous pipelines have been developed to process NGS-derived amplicon sequences, among which Mothur, QIIME and USEARCH are the most well-known and cited. The entire process from initial raw sequence data through read error correction, paired-end read assembly, primer stripping, quality filtering, clustering, OTU taxonomic classification and BIOM table rarefaction, as well as alternative "normalization" methods, will be addressed. An effective and accurate strategy will be presented using state-of-the-art bioinformatic tools, and the example of a straightforward one-script pipeline for 16S rRNA gene or ITS MiSeq amplicon sequencing will be provided. Finally, instructions on how to automatically retrieve nucleotide sequences from NCBI and thereby apply the pipeline to targets other than the 16S rRNA gene (Greengenes, SILVA) and ITS (UNITE) will be discussed.

  7. Temporal dynamics of soil microbial communities under different moisture regimes: high-throughput sequencing and bioinformatics analysis

    Science.gov (United States)

    Semenov, Mikhail; Zhuravleva, Anna; Semenov, Vyacheslav; Yevdokimov, Ilya; Larionova, Alla

    2017-04-01

    Recent climate scenarios predict not only continued global warming but also an increased frequency and intensity of extreme climatic events such as strong changes in temperature and precipitation regimes. Microorganisms are well known to be more sensitive to changes in environmental conditions than to other soil chemical and physical parameters. In this study, we determined the shifts in soil microbial community structure as well as indicative taxa in soils under three moisture regimes using high-throughput Illumina sequencing and a range of bioinformatics approaches for the assessment of sequence data. Incubation experiments were performed in soil-filled (Greyic Phaeozems Albic) rhizoboxes with maize and without plants. Three contrasting moisture regimes were simulated: 1) optimal wetting (OW), watering 2-3 times per week to maintain soil moisture of 20-25% by weight; 2) periodic wetting (PW), with alternating periods of wetting and drought; and 3) constant insufficient wetting (IW), with soil moisture permanently maintained at 12% by weight. Sampled fresh soils were homogenized, and the total DNA of three replicates was extracted using the FastDNA® SPIN kit for Soil. DNA replicates were combined in a pooled sample and the DNA was used for PCR with specific primers for the 16S V3 and V4 regions. In order to compare variability between different samples and replicates within a single sample, some DNA replicates were treated separately. The products were purified and submitted to Illumina MiSeq sequencing. Sequence data were evaluated by alpha-diversity (Chao1 and Shannon H' diversity indexes), beta-diversity (UniFrac and Bray-Curtis dissimilarity), heatmap, tagcloud, and plot-bar analyses using the MiSeq Reporter Metagenomics Workflow and R packages (phyloseq, vegan, tagcloud). The Shannon index varied in a rather narrow range (4.4-4.9) with the lowest values for microbial communities under PW treatment. The Chao1 index varied from 385 to 480, being a more flexible
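
    For readers unfamiliar with the indices named in this abstract, the following Python sketch shows how Shannon H', Chao1 and Bray-Curtis dissimilarity can be computed from per-OTU read counts. It is an illustration only: the study used the MiSeq Reporter workflow and R packages, the function names here are hypothetical, and the bias-corrected form of Chao1 is an assumption that may differ from the variant those tools report.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity between two samples with the same OTU order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


if __name__ == "__main__":
    sample_a = [50, 30, 10, 5, 2, 1, 1, 1]   # made-up counts for illustration
    sample_b = [20, 40, 15, 0, 10, 2, 0, 1]
    print(round(shannon(sample_a), 3))
    print(chao1(sample_a))
    print(round(bray_curtis(sample_a, sample_b), 3))
```

    Shannon H' weights OTUs by relative abundance, whereas Chao1 depends on the counts of singletons and doubletons, which is one reason the two indices can respond differently to the same treatment.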

  8. Understanding microalgal species composition and contributions in Antarctic glacial melt water through rbcL high throughput sequencing

    Science.gov (United States)

    Barretto, K. M.; Kalmbach, A. J.; de la Torre, J. R.; Falcón, L. I.; Carpenter, E. J.

    2016-02-01

    The McMurdo Dry Valleys (MDV) in Antarctica present unique research opportunities, both because of the understudied biogeochemical impact of their microbial communities, and their sensitivity to climate change. Despite harsh desiccation, pH, and salinity stress, summer glacial melt water supports life in the MDV in the form of algal mats. These mat communities are complex in structure, with a network of dominant cyanobacteria interspersed with heterotrophic diazotrophs, smaller photoautotrophs, and thick extracellular polymeric substances. Due to their complexity, standard microscopy yields a limited understanding of community assemblages. Our previous high throughput sequencing (HTS) approaches focusing on 16S rRNA have profiled communities with understudied photosynthetic phyla such as Acidobacteria, Gemmatimonadetes, and Chloroflexi. To characterize these phototrophic communities, we are interested in (1) understanding their temporal dynamics and how the dominant cyanobacterial species influence community composition, (2) modeling how pH, nutrients, soil wetness, and temperature act as multivariate drivers of community composition, and (3) establishing a pipeline for HTS of the rbcL gene - which encodes the large subunit of the ubiquitous photosynthetic protein RuBisCO. Our initial screening of community DNA from MDV algal mats has shown the presence of Form IA, IB, and IC cbbL (an rbcL ortholog), and Form ID rbcL - indicating a relatively high degree of photoautotrophic diversity. Soil wetness drives anoxic conditions and we see that it shifts overall microbial composition - we expect photoautotrophs to respond similarly. We also expect photoautotrophic assemblages to shift with pH and soil nutrients. Our deep sequencing efforts suggest an inconsistency between indexing primers and algal DNA that could underestimate cyanobacterial and overestimate eukaryotic abundance. Resolving these issues with new approaches will allow us to more fully understand the

  9. Short-term assessment of BCR repertoires of SLE patients after high dose glucocorticoid therapy with high-throughput sequencing.

    Science.gov (United States)

    Shi, Bin; Yu, Jiang; Ma, Long; Ma, Qingqing; Liu, Chunmei; Sun, Suhong; Ma, Rui; Yao, Xinsheng

    2016-01-01

    We analyzed and assessed BCR repertoires of SLE patients before and after high dose glucocorticoid therapy to address two fundamental questions: (1) How does the BCR repertoire of SLE patients change at the clone level after treatment? (2) How can a putative autoantibody clone set be screened from the BCR repertoire of SLE patients? The PBMCs of two SLE patients (P1 and P2) were collected at different time points, and DNA was extracted from these samples. High-throughput sequencing technology was applied to detect the BCR repertoire, and bioinformatic methods were used to analyze the sequence data. We found that these two patients lost usage of some IGHV3 family genes after treatment compared with before treatment. For IGHV-IGHJ gene pairing, no significant change was shown for either patient. In addition, analyses of the composition of H-CDR3 showed that the overall AA compositions of H-CDR3 at three time points in each SLE patient were very similar, and that AA usage at the same position in H-CDR3 sequences of the same length (14 AA) was similar. Antinuclear antibody tests of the SLE patients showed that the level of some antinuclear antibodies was reduced after treatment; however, there was no sign that the percentage of autoantibody clones in the BCR repertoires was reduced. High dose glucocorticoid treatment in the short term has little impact on the composition of the BCR repertoire of SLE patients. Treatment can reduce the amount of autoantibody at the protein level, but may not reduce the percentage of autoantibody clones in the BCR repertoire at the clonal level.

  10. Comparison of analysis tools for miRNA high throughput sequencing using nerve crush as a model

    Directory of Open Access Journals (Sweden)

    Raghu Prasad Rao Metpally

    2013-03-01

    Full Text Available Recent advances in sample preparation and analysis for next generation sequencing have made it possible to profile and discover new miRNAs in a high throughput manner. In the case of neurological disease and injury, these types of experiments have been more limited, possibly because tissues such as the brain and spinal cord are inaccessible for direct sampling in living patients, and indirect sampling of blood and cerebrospinal fluid is affected by low amounts of RNA. We used a mouse model to examine changes in miRNA expression in response to acute nerve crush. We assayed miRNA from both muscle tissue and blood plasma. We examined how the depth of coverage (the number of mapped reads) changed the number of detectable miRNAs in each sample type. We also found that samples with very low starting amounts of RNA (mouse plasma) made high depth of mature miRNA coverage more difficult to obtain. Each tissue must be assessed independently for the depth of coverage required to adequately power detection of differential expression, weighed against the cost of sequencing that sample to the adequate depth. We explored the changes in total mapped reads and differential expression results generated by three different software packages (miRDeep2, miRNAKey, and miRExpress) and two different analysis packages (DESeq and EdgeR). We also examined the accuracy of using miRDeep2 to predict novel miRNAs and subsequently detect them in the samples using qRT-PCR.

  11. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    Directory of Open Access Journals (Sweden)

    Seokhwi Kim

    Full Text Available In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  12. High-throughput sequence analysis of small RNAs in grapevine (Vitis vinifera L.) affected by grapevine leafroll disease.

    Science.gov (United States)

    Alabi, Olufemi J; Zheng, Yun; Jagadeeswaran, Guru; Sunkar, Ramanjulu; Naidu, Rayapati A

    2012-12-01

    Grapevine leafroll disease (GLRD) is one of the most economically important virus diseases of grapevine (Vitis spp.) worldwide. In this study, we used high-throughput sequencing of cDNA libraries made from small RNAs (sRNAs) to compare profiles of sRNA populations recovered from own-rooted Merlot grapevines with and without GLRD symptoms. The data revealed the presence of sRNAs specific to Grapevine leafroll-associated virus 3, Hop stunt viroid (HpSVd), Grapevine yellow speckle viroid 1 (GYSVd-1) and Grapevine yellow speckle viroid 2 (GYSVd-2) in symptomatic grapevines and sRNAs specific only to HpSVd, GYSVd-1 and GYSVd-2 in nonsymptomatic grapevines. In addition to 135 previously identified conserved microRNAs in grapevine (Vvi-miRs), we identified 10 novel and several candidate Vvi-miRs in both symptomatic and nonsymptomatic grapevine leaves based on the cloning of miRNA star sequences. Quantitative real-time reverse transcriptase-polymerase chain reaction (RT-PCR) of selected conserved Vvi-miRs indicated that individual members of an miRNA family are differentially expressed in symptomatic and nonsymptomatic leaves. The high-resolution mapping of sRNAs specific to an ampelovirus and three viroids in mixed infections, the identification of novel Vvi-miRs and the modulation of certain conserved Vvi-miRs offers resources for the further elucidation of compatible host-pathogen interactions and for the provision of ecologically relevant information to better understand host-pathogen-environment interactions in a perennial fruit crop. © 2012 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2012 BSPP AND BLACKWELL PUBLISHING LTD.

  13. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

    Science.gov (United States)

    Fernandes, Andrew D; Reid, Jennifer Ns; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B

    2014-01-01

    Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible. Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples. Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a
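
    The compositional view described in this abstract is commonly expressed through the centered log-ratio (CLR) transform. The Python sketch below shows only that transform, with a simple pseudocount to handle zeros. It is not ALDEx2 itself, which is an R package with its own statistical machinery; the function name and the pseudocount value are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's per-feature counts.

    Each value becomes log(count) minus the mean of the logs, i.e. the log of
    the count relative to the sample's geometric mean. The pseudocount is a
    simple, assumed way to avoid log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    shallow = [10, 20, 40, 0]        # made-up read counts for four features
    deep = [100, 200, 400, 0]        # same proportions at ten-fold depth
    print([round(v, 3) for v in clr(shallow)])
    print([round(v, 3) for v in clr(deep)])
```

    Without zeros and without a pseudocount, the CLR values of a sample do not change when every count is multiplied by the same factor, which is why the transform suits data where only relative abundance is informative.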

  14. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

    DEFF Research Database (Denmark)

    Rask, Thomas Salhøj; Petersen, Bent; Chen, Donald S.

    2016-01-01

    Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide...... insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data....... This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene...
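
    The expectation that a correct read from a protein-coding amplicon keeps its open reading frame can be sketched as a simple check in Python. This is a hard yes/no filter, not the probabilistic basecalling model the abstract describes; the function name, the assumption that the read starts in frame, and the example sequences are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_reading_frame(read: str, frame: int = 0) -> bool:
    """Return True if the read is a whole number of codons with no stop codon.

    An insertion or deletion that is not a multiple of three shifts the frame,
    which usually changes the length modulo three or exposes a stop codon.
    """
    seq = read.upper()[frame:]
    if len(seq) < 3 or len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return all(codon not in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(has_open_reading_frame("ATGGCCAAAGGT"))   # intact frame
    print(has_open_reading_frame("ATGGCAAAGGT"))    # one base deleted
    print(has_open_reading_frame("ATGGCCTAAGGT"))   # internal stop codon
```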

  15. A new perspective on studying burial environment before archaeological excavation: analyzing bacterial community distribution by high-throughput sequencing.

    Science.gov (United States)

    Xu, Jinjin; Wei, Yanfei; Jia, Hanqing; Xiao, Lin; Gong, Decai

    2017-02-07

    Burial conditions play a crucial role in archaeological heritage preservation. In particular, microorganisms are considered a leading cause of the degradation and disappearance of historic materials. In this article, we analyzed bacterial diversity and community structure from M1 of Wangshanqiao using 16S rRNA gene amplicon sequencing. The results indicated that microbial communities in burial conditions differed among four samples. The samples from the robber hole varied most obviously in community structure in both alpha and beta diversity. In addition, the dominant phyla in the different samples were Proteobacteria, Actinobacteria and Bacteroidetes, respectively. Moreover, the study implied that the preservation conditions of historical materials were connected with bacterial community distribution. At the genus level, Acinetobacter might have a high capacity to degrade organic cultural heritage in burial conditions, while Bacteroides were closely associated with favorable preservation conditions. This method helps to retrieve information that cannot be recovered after excavation, and it will help to explore microbial degradation of precious organic cultural heritage and further our understanding of the archaeological burial environment. The study also indicates that robbery has a serious negative impact on burial remains.

  16. High-throughput sequencing technology to reveal the composition and function of cecal microbiota in Dagu chicken.

    Science.gov (United States)

    Xu, Yunhe; Yang, Huixin; Zhang, Lili; Su, Yuhong; Shi, Donghui; Xiao, Haidi; Tian, Yumin

    2016-11-04

    The chicken gut microbiota is an important and complicated ecosystem for the host. It plays an important role in converting food into nutrients and energy. The coding capacity of the microbiome vastly surpasses that of the host's genome, encoding biochemical pathways that the host has not developed. An optimal gut microbiota can increase agricultural productivity. This study aims to explore the composition and function of cecal microbiota in Dagu chicken under two feeding modes, free-range (outdoor, OD) and cage (indoor, ID) raising. Cecal samples were collected from 24 chickens across 4 groups (12-w OD, 12-w ID, 18-w OD, and 18-w ID). We performed high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene to characterize the cecal microbiota of Dagu chicken and compare the cecal microbiota of free-range and cage-raised chickens. We found 34 unique operational taxonomic units (OTUs) in the OD groups and 4 unique OTUs in the ID groups. 24 phyla were shared by the 24 samples. Bacteroidetes was the most abundant phylum, followed by Firmicutes and Proteobacteria. The OD groups showed a higher proportion of Bacteroidetes (>50 %) in the cecum, but a lower Firmicutes/Bacteroidetes ratio in both 12-w old (0.42, 0.62) and 18-w old groups (0.37, 0.49) compared with the ID groups. Cecal microbiota in the OD groups had a higher abundance of functions involved in amino acid and glycan metabolic pathways. The composition and function of cecal microbiota in Dagu chicken differ between the two feeding modes, free-range and cage raising. The cage raising mode showed a lower proportion of Bacteroidetes in the cecum, but a higher Firmicutes/Bacteroidetes ratio compared with the free-range mode. Cecal microbiota in the free-range mode had a higher abundance of functions involved in amino acid and glycan metabolic pathways.

  17. Predicting the origin of soil evidence: High throughput eukaryote sequencing and MIR spectroscopy applied to a crime scene scenario.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Breen, James; Macdonald, Lynne M; Cooper, Alan

    2015-06-01

    Soil can serve as powerful trace evidence in forensic casework, because it is highly individualistic and can be characterised using a number of techniques. Complex soil matrixes can support a vast number of organisms that can provide a site-specific signal for use in forensic soil discrimination. Previous DNA fingerprinting techniques rely on variations in fragment length to distinguish between soil profiles and focus solely on microbial communities. However, the recent development of high throughput sequencing (HTS) has the potential to provide a more detailed picture of the soil community by accessing non-culturable microorganisms and by identifying specific bacteria, fungi, and plants within soil. To demonstrate the application of HTS to forensic soil analysis, 18S ribosomal RNA profiles of six forensic mock crime scene samples were compared to those collected from seven reference locations across South Australia. Our results demonstrate the utility of non-bacterial DNA to discriminate between different sites, and were able to link a soil to a particular location. In addition, HTS complemented traditional Mid Infrared (MIR) spectroscopy soil profiling, but was able to provide statistically stronger discriminatory power at a finer scale. Through the design of an experimental case scenario, we highlight the considerations and potential limitations of this method in forensic casework. We show that HTS analysis of soil eukaryotes was robust to environmental variation, e.g. rainfall and temperature, transfer effects, storage effects and spatial variation. In addition, this study utilises novel analytical methodologies to interpret results for investigative purposes and provides prediction statistics to support soil DNA analysis for evidential stages of a case. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  18. QTL Mapping for Rice RVA Properties Using High-Throughput Re-sequenced Chromosome Segment Substitution Lines

    Directory of Open Access Journals (Sweden)

    Chang-quan ZHANG

    2013-11-01

    Full Text Available The rapid visco analyser (RVA) profile is an important factor for evaluation of the cooking and eating quality of rice. To improve rice quality, the identification of new quantitative trait loci (QTLs) for RVA profiling is of great significance. We used a japonica rice cultivar Nipponbare as the recipient and indica rice 9311 as the donor to develop a population containing 38 chromosome segment substitution lines (CSSLs) genotyped by a high-throughput re-sequencing strategy. In this study, the population and the parent lines, which contained similar apparent amylose contents, were used to map the QTLs of RVA properties including peak paste viscosity (PKV), hot paste viscosity (HPV), cool paste viscosity (CPV), breakdown viscosity (BKV), setback viscosity (SBV), consistency viscosity (CSV), peak time (PeT) and pasting temperature (PaT). QTL analysis was carried out using one-way analysis of variance and Dunnett's test, and stable QTLs were identified over two years and under two environments. We identified 10 stable QTLs: qPKV2-1, qSBV2-1; qPKV5-1, qHPV5-1, qCPV5-1; qPKV7-1, qHPV7-1, qCPV7-1, qSBV7-1; and qPKV8-1 on chromosomes 2, 5, 7 and 8, respectively, with contributions ranging from −95.6% to 47.1%. Besides, there was pleiotropy in the QTLs on chromosomes 2, 5 and 7.

  19. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS) Metabarcoding during Flax Dew-Retting

    Directory of Open Access Journals (Sweden)

    Christophe Djemiel

    2017-10-01

    Full Text Available Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS) DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq) on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria) and Ascomycota, Basidiomycota, and Zygomycota (fungi) all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria), and Chytridiomycota (fungi). No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group) using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming

  20. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS) Metabarcoding during Flax Dew-Retting.

    Science.gov (United States)

    Djemiel, Christophe; Grec, Sébastien; Hawkins, Simon

    2017-01-01

    Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS) DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq) on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria) and Ascomycota, Basidiomycota, and Zygomycota (fungi) all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria), and Chytridiomycota (fungi). No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group) using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming to evaluate

  1. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    Science.gov (United States)

    Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, psamples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
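
    The difference between the unweighted and the abundance-weighted OTU overlap reported in this abstract can be illustrated with a short Python sketch. The definitions used here are assumptions, since the abstract does not spell them out: unweighted overlap is taken as shared OTUs divided by all OTUs observed in any replicate, and weighted overlap as the fraction of all sequences that belong to shared OTUs. Function names and example counts are hypothetical.

```python
from typing import Dict, Sequence, Set


def _shared_and_all(replicates: Sequence[Dict[str, int]]):
    """Return (OTUs present in every replicate, OTUs present in any replicate)."""
    present: Sequence[Set[str]] = [
        {otu for otu, n in rep.items() if n > 0} for rep in replicates
    ]
    return set.intersection(*present), set.union(*present)


def otu_overlap(replicates: Sequence[Dict[str, int]]) -> float:
    """Unweighted overlap: shared OTUs / OTUs seen in any replicate (assumed)."""
    shared, union = _shared_and_all(replicates)
    return len(shared) / len(union) if union else 0.0


def weighted_otu_overlap(replicates: Sequence[Dict[str, int]]) -> float:
    """Weighted overlap: sequences in shared OTUs / all sequences (assumed)."""
    shared, _ = _shared_and_all(replicates)
    shared_reads = sum(rep.get(otu, 0) for rep in replicates for otu in shared)
    total_reads = sum(sum(rep.values()) for rep in replicates)
    return shared_reads / total_reads if total_reads else 0.0


if __name__ == "__main__":
    rep1 = {"otu_a": 90, "otu_b": 9, "otu_c": 1}   # made-up technical replicates
    rep2 = {"otu_a": 85, "otu_b": 10, "otu_d": 5}
    print(otu_overlap([rep1, rep2]))
    print(weighted_otu_overlap([rep1, rep2]))
```

    In this toy case the OTUs that differ between replicates are rare, so the unweighted overlap is low while the weighted overlap is high, the same pattern the abstract reports.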

  2. Identification of Brassinosteroid Target Genes by Chromatin Immunoprecipitation Followed by High-Throughput Sequencing (ChIP-seq) and RNA-Sequencing.

    Science.gov (United States)

    Nolan, Trevor; Liu, Sanzhen; Guo, Hongqing; Li, Lei; Schnable, Patrick; Yin, Yanhai

    2017-01-01

    Brassinosteroids (BRs) play important roles in many growth and developmental processes. BRs signal to regulate BR-INSENSITIVE1-ETHYL METHANESULFONATE-SUPPRESSOR1 (BES1) and BRASSINAZOLE-RESISTANT1 (BZR1) transcription factors (TFs), which, in turn, regulate several hundreds of transcription factors (termed BES1/BZR1-targeted TFs or BTFs) and thousands of genes to mediate various BR responses. Chromatin Immunoprecipitation followed by high-throughput sequencing (ChIP-seq) with BES1/BZR1 and BTFs is an important approach to identify BR target genes. In combination with RNA-sequencing experiments, these genomic methods have become powerful tools to detect BR target genes and reveal transcriptional networks underlying BR-regulated processes.

  3. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    Science.gov (United States)

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  4. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    to the correct source once sequencing anomalies are accounted for (miss-assignment ratebias in the distribution of the differently tagged...... primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution...

  5. Identification of microRNAs in the Toxigenic Dinoflagellate Alexandrium catenella by High-Throughput Illumina Sequencing and Bioinformatic Analysis.

    Directory of Open Access Journals (Sweden)

    Huili Geng

    Micro-ribonucleic acids (miRNAs) are a large group of endogenous, tiny, non-coding RNAs consisting of 19-25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs). To date, identification and function analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR) was performed to confirm the expression of two differentially expressed miRNAs: one up-regulated novel miRNA (aca-miR-3p-456915) and one down-regulated conserved miRNA (tae-miR159a). The expression trend of the qPCR assay was generally consistent with the deep sequencing result. Target predictions of the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG) annotations revealed that some miRNAs were associated with growth and developmental processes of the alga. These results provide insights into the roles that miRNAs play in

  6. Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics

    Science.gov (United States)

    Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.

    2016-12-01

    Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margin, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall, samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.

  7. Analysis of soil microbial communities based on amplicon sequencing of marker genes

    DEFF Research Database (Denmark)

    Schöler, Anne; Jacquiod, Samuel; Vestergaard, Gisle

    2017-01-01

    The use of cultivation independent methods has revolutionized soil biology in the last decades. Most popular approaches are based on directly extracted DNA from soil and subsequent analysis of PCR-amplified marker genes by next-generation sequencing. While these high-throughput methods offer novel...... possibilities over cultivation-based approaches, several key points need to be considered to minimize potential biases during library preparation and downstream bioinformatic analysis. This opinion paper highlights crucial steps that should be considered for accurate analysis and data interpretation....

  8. Assessment of Bifidobacterium Species Using groEL Gene on the Basis of Illumina MiSeq High-Throughput Sequencing

    Science.gov (United States)

    Hu, Lujun; Lu, Wenwei; Wang, Linlin; Pan, Mingluo; Zhang, Hao; Zhao, Jianxin; Chen, Wei

    2017-01-01

    The next-generation high-throughput sequencing techniques have introduced a new way to assess the gut’s microbial diversity on the basis of 16S rRNA gene-based microbiota analysis. However, the precise appraisal of the biodiversity of Bifidobacterium species within the gut remains a challenging task because of the limited resolving power of the 16S rRNA gene in different species. The groEL gene, a protein-coding gene, evolves quickly and thus is useful for differentiating bifidobacteria. Here, we designed a Bifidobacterium-specific primer pair which targets a hypervariable sequence region within the groEL gene that is suitable for precise taxonomic identification and detection of all recognized species of the genus Bifidobacterium so far. The results showed that the novel designed primer set can specifically differentiate Bifidobacterium species from non-bifidobacteria, and as low as 10^4 cells of Bifidobacterium species can be detected using the novel designed primer set on the basis of Illumina MiSeq high-throughput sequencing. We also developed a novel protocol to assess the diversity of Bifidobacterium species in both human and rat feces through high-throughput sequencing technologies using groEL gene as a discriminative marker. PMID:29160815

  9. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
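
    Step (3) above, finding sequences that contain microsatellites matching user-specified parameters, can be sketched with a regular expression. This is a minimal illustration only, not SSR_pipeline's actual implementation; the function name `find_ssrs` and its parameters (`min_motif`, `max_motif`, `min_repeats`) are hypothetical, and only perfect tandem repeats are detected.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return a list of (start, motif, repeat_count) for perfect tandem repeats.

    The motif group is lazy, so the shortest repeating unit is preferred
    (e.g. "AC" rather than "ACAC"). Homopolymer motifs are skipped.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" repeated is a homopolymer run, not an SSR
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# A dinucleotide repeat (AC)6 with short flanks, and a sequence with too few repeats.
hits = find_ssrs("TTG" + "AC" * 6 + "GGT")
no_hits = find_ssrs("TTG" + "AC" * 3 + "GGT")
```

    A real pipeline would run a search like this only on reads that have already passed quality filtering and paired-end merging, as in modules (1) and (2).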

  10. Genetic Bases of Bicuspid Aortic Valve: The Contribution of Traditional and High-Throughput Sequencing Approaches on Research and Diagnosis

    Directory of Open Access Journals (Sweden)

    Betti Giusti

    2017-08-01

    Bicuspid aortic valve (BAV) is a common (0.5–2.0% of general population) congenital heart defect with increased prevalence of aortic dilatation and dissection. BAV has an autosomal dominant inheritance with reduced penetrance and variable expressivity. BAV has been described as an isolated trait or associated with syndromic conditions [e.g., Marfan syndrome or Loeys-Dietz syndrome (MFS, LDS)]. Identification of a syndromic condition in a BAV patient is clinically relevant to personalize aortic surgery indication. A 4-fold increase in BAV prevalence in a large cohort of unrelated MFS patients with respect to general population was reported, as well as in LDS patients (8-fold). It is also known that BAV is more frequent in patients with thoracic aortic aneurysm (TAA) related to mutations in ACTA2, FBN1, and TGFBR2 genes. Moreover, in 8 patients with BAV and thoracic aortic dilation, not fulfilling the clinical criteria for MFS, FBN1 mutations in 2/8 patients were identified suggesting that FBN1 or other genes involved in syndromic conditions correlated to aortopathy could be involved in BAV. Beyond loci associated to syndromic disorders, studies in humans and animal models evidenced/suggested the role of further genes in non-syndromic BAV. The transcriptional regulator NOTCH1 has been associated with the development and acceleration of calcium deposition. Genome wide marker-based linkage analysis demonstrated a linkage of BAV to loci on chromosomes 18, 5, and 13q. Recently, a role for GATA4/5 in aortic valve morphogenesis and endocardial cell differentiation has been reported. BAV has also been associated with a reduced UFD1L gene expression or involvement of a locus containing AXIN1/PDIA2. Much remains to be understood about the genetics of BAV. In the last years, high-throughput sequencing technologies, allowing the analysis of large number of genes or entire exomes or genomes, progressively became available. The latter issue together with

  11. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis

    DEFF Research Database (Denmark)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta

    2014-01-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease...... digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72...

  12. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis.

    Science.gov (United States)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta; Mikkelsen, Martin; Johansen, Peter; Børsting, Claus; Morling, Niels

    2014-08-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72 individuals using only 24 barcoded libraries.

  13. High-Throughput Sequencing of the Expressed Torafugu (Takifugu rubripes) Antibody Sequences Distinguishes IgM and IgT Repertoires and Reveals Evidence of Convergent Evolution

    Directory of Open Access Journals (Sweden)

    Xi Fu

    2018-02-01

    B-cell antigen receptor (BCR) or antibody diversity arises from somatic recombination of immunoglobulin (Ig) gene segments and is concentrated within the Ig heavy (H) chain complementarity-determining region 3 (CDR-H3). We performed high-throughput sequencing of the expressed antibody heavy-chain repertoire from adult torafugu. We found that torafugu use between 70 and 82% of all possible V (variable), D (diversity), and J (joining) gene segment combinations and that they share a similar frequency distribution of these VDJ combinations. The CDR-H3 sequence repertoire observed in individuals is biased with the preferential use of a small number of VDJ, dominated by sequences containing inserted nucleotides. We uncovered the common CDR-H3 amino-acid (aa) sequences shared by individuals. Common CDR-H3 sequences feature highly convergent nucleic-acid recombination compared with private ones. Finally, we observed differences in repertoires between IgM and IgT, including the unequal usage frequencies of V gene segment and the biased number of nucleotide insertion/deletion at VDJ junction regions that leads to distinct distributions of CDR-H3 lengths.

  14. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  15. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  16. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data

    Directory of Open Access Journals (Sweden)

    Junttila Sini

    2012-10-01

    Background: Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Results: Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Conclusions: Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first

  17. Multiplex amplicon sequencing for microbe identification in community-based culture collections.

    Science.gov (United States)

    Armanhi, Jaderson Silveira Leite; de Souza, Rafael Soares Correa; de Araújo, Laura Migliorini; Okura, Vagner Katsumi; Mieczkowski, Piotr; Imperial, Juan; Arruda, Paulo

    2016-07-12

    Microbiome analysis using metagenomic sequencing has revealed a vast microbial diversity associated with plants. Identifying the molecular functions associated with microbiome-plant interaction is a significant challenge concerning the development of microbiome-derived technologies applied to agriculture. An alternative to accelerate the discovery of the microbiome benefits to plants is to construct microbial culture collections concomitant with accessing microbial community structure and abundance. However, traditional methods of isolation, cultivation, and identification of microbes are time-consuming and expensive. Here we describe a method for identification of microbes in culture collections constructed by picking colonies from primary platings that may contain single or multiple microorganisms, which we named community-based culture collections (CBC). A multiplexing 16S rRNA gene amplicon sequencing based on two-step PCR amplifications with tagged primers for plates, rows, and columns allowed the identification of the microbial composition regardless of whether the well contains single or multiple microorganisms. The multiplexing system enables pooling amplicons into a single tube. The sequencing performed on the PacBio platform led to the recovery of near-full-length 16S rRNA gene sequences, allowing accurate identification of microorganism composition in each plate well. Cross-referencing with plant microbiome structure and abundance allowed the estimation of diversity and abundance representation of microorganisms in the CBC.
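
    The plate/row/column tagging idea can be sketched as a simple demultiplexing step that maps each pooled read back to its well. Everything specific here is an assumption made for illustration: the tag sequences, the 4-nt tag length, and the read layout (plate tag, row tag, insert, column tag) are hypothetical and are not taken from the published primer design.

```python
from collections import defaultdict

# Hypothetical tag tables; real tags and their positions come from the primer design.
PLATE_TAGS = {"AACC": "P1", "GGTT": "P2"}
ROW_TAGS = {"ACGT": "A", "TGCA": "B"}
COLUMN_TAGS = {"CATG": 1, "GTAC": 2}
TAG_LEN = 4

def assign_well(read):
    """Map a read laid out as plate tag + row tag + insert + column tag
    to a (plate, row, column) tuple, or None if any tag is unrecognised."""
    plate = PLATE_TAGS.get(read[:TAG_LEN])
    row = ROW_TAGS.get(read[TAG_LEN:2 * TAG_LEN])
    column = COLUMN_TAGS.get(read[-TAG_LEN:])
    if plate is None or row is None or column is None:
        return None
    return plate, row, column

def demultiplex(reads):
    """Group inserts (with tags stripped) by the well they were assigned to."""
    wells = defaultdict(list)
    for read in reads:
        well = assign_well(read)
        if well is not None:
            wells[well].append(read[2 * TAG_LEN:-TAG_LEN])
    return dict(wells)

reads = [
    "AACC" + "ACGT" + "TTTTGGGG" + "CATG",  # plate P1, row A, column 1
    "AACC" + "ACGT" + "TTTTGGGA" + "CATG",  # same well, a second sequence
    "GGTT" + "TGCA" + "CCCCAAAA" + "GTAC",  # plate P2, row B, column 2
    "NNNN" + "ACGT" + "CCCCAAAA" + "GTAC",  # unreadable plate tag, discarded
]
wells = demultiplex(reads)
```

    A well that collects more than one distinct insert, as well P1-A1 does here, corresponds to the case described above in which a picked colony contains multiple microorganisms.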

  18. High Throughput Facility

    Data.gov (United States)

    Federal Laboratory Consortium — Argonne's high throughput facility provides highly automated and parallel approaches to material and materials chemistry development. The facility allows scientists...

  19. Genetic characterisation of Malawian pneumococci prior to the roll-out of the PCV13 vaccine using a high-throughput whole genome sequencing approach.

    Directory of Open Access Journals (Sweden)

    Dean B Everett

    Malawi commenced the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) into the routine infant immunisation schedule in November 2011. Here we have tested the utility of high throughput whole genome sequencing to provide a high-resolution view of pre-vaccine pneumococcal epidemiology and population evolutionary trends to predict potential future change in population structure post introduction. One hundred and twenty seven (127) archived pneumococcal isolates from randomly selected adults and children presenting to the Queen Elizabeth Central Hospital, Blantyre, Malawi underwent whole genome sequencing. The pneumococcal population was dominated by serotype 1 (20.5% of invasive isolates) prior to vaccine introduction. PCV13 is likely to protect against 62.9% of all circulating invasive pneumococci (78.3% in under-5-year-olds). Several Pneumococcal Molecular Epidemiology Network (PMEN) clones are now in circulation in Malawi which were previously undetected but the pandemic multidrug resistant PMEN1 lineage was not identified. Genome analysis identified a number of novel sequence types and serotype switching. High throughput genome sequencing is now feasible and has the capacity to simultaneously elucidate serotype and sequence type, as well as detailed genetic information. It enables population level characterization, providing a detailed picture of population structure and genome evolution relevant to disease control. Post-vaccine introduction surveillance supported by genome sequencing is essential to providing a comprehensive picture of the impact of PCV13 on pneumococcal population structure and informing future public health interventions.

  20. Global Perspectives on Activated Sludge Community Composition analyzed using 16S rRNA amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Saunders, Aaron Marc; Albertsen, Mads

    Activated sludge is the most commonly applied bioprocess throughout the world for wastewater treatment. Microorganisms are key to the process, yet our knowledge of their identity and function is still limited. High-throughput 16S rRNA amplicon sequencing can reliably characterize microbial...... communities, and in this study activated sludge sampled from 32 Wastewater Treatment Plants (WWTPs) around the world was described and compared. The top abundant bacteria in the global activated sludge ecosystem were found and the core population shared by multiple samples was investigated. The results...

  1. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    Science.gov (United States)

    2014-01-01

    Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  2. SNP discovery by amplicon sequencing and multiplex SNP genotyping in the allopolyploid species Brassica napus.

    Science.gov (United States)

    Durstewitz, G; Polley, A; Plieske, J; Luerssen, H; Graner, E M; Wieseke, R; Ganal, M W

    2010-11-01

    Oilseed rape (Brassica napus) is an allotetraploid species consisting of two genomes, derived from B. rapa (A genome) and B. oleracea (C genome). The presence of these two genomes makes single nucleotide polymorphism (SNP) marker identification and SNP analysis more challenging than in diploid species, as for a given locus usually two versions of a DNA sequence (based on the two ancestral genomes) have to be analyzed simultaneously during SNP identification and analysis. One hundred amplicons derived from expressed sequence tags (ESTs) were analyzed to identify SNPs in a panel of oilseed rape varieties and within two sister species representing the ancestral genomes. A total of 604 SNPs were identified, averaging one SNP in every 42 bp. It was possible to clearly discriminate SNPs that are polymorphic between different plant varieties from SNPs differentiating the two ancestral genomes. To validate the identified SNPs for their use in genetic analysis, we have developed Illumina GoldenGate assays for some of the identified SNPs. Through the analysis of a number of oilseed rape varieties and mapping populations with GoldenGate assays, we were able to identify a number of different segregation patterns in allotetraploid oilseed rape. The majority of the identified SNP markers can be readily used for genetic mapping, showing that amplicon sequencing and Illumina GoldenGate assays can be used to reliably identify SNP markers in tetraploid oilseed rape and to convert them into successful SNP assays that can be used for genetic analysis.

  3. Next-generation sequencing in veterinary medicine: how can the massive amount of information arising from high-throughput technologies improve diagnosis, control, and management of infectious diseases?

    Science.gov (United States)

    Van Borm, Steven; Belák, Sándor; Freimanis, Graham; Fusaro, Alice; Granberg, Fredrik; Höper, Dirk; King, Donald P; Monne, Isabella; Orton, Richard; Rosseel, Toon

    2015-01-01

    The development of high-throughput molecular technologies and associated bioinformatics has dramatically changed the capacities of scientists to produce, handle, and analyze large amounts of genomic, transcriptomic, and proteomic data. A clear example of this step-change is represented by the amount of DNA sequence data that can be now produced using next-generation sequencing (NGS) platforms. Similarly, recent improvements in protein and peptide separation efficiencies and highly accurate mass spectrometry have promoted the identification and quantification of proteins in a given sample. These advancements in biotechnology have increasingly been applied to the study of animal infectious diseases and are beginning to revolutionize the way that biological and evolutionary processes can be studied at the molecular level. Studies have demonstrated the value of NGS technologies for molecular characterization, ranging from metagenomic characterization of unknown pathogens or microbial communities to molecular epidemiology and evolution of viral quasispecies. Moreover, high-throughput technologies now allow detailed studies of host-pathogen interactions at the level of their genomes (genomics), transcriptomes (transcriptomics), or proteomes (proteomics). Ultimately, the interaction between pathogen and host biological networks can be questioned by analytically integrating these levels (integrative OMICS and systems biology). The application of high-throughput biotechnology platforms in these fields and their typical low-cost per information content has revolutionized the resolution with which these processes can now be studied. The aim of this chapter is to provide a current and prospective view on the opportunities and challenges associated with the application of massive parallel sequencing technologies to veterinary medicine, with particular focus on applications that have a potential impact on disease control and management.

  4. Profiling of the metabolically active community from a production-scale biogas plant by means of high-throughput metatranscriptome sequencing

    DEFF Research Database (Denmark)

    Zakrzewski, Martha; Goesmann, Alexander; Jaenicke, Sebastian

    2012-01-01

    Structural composition and gene content of a biogas-producing microbial community from a production-scale biogas plant fed with renewable primary products was recently analyzed by means of a metagenome sequencing approach. To determine the transcriptionally active part of the same biogas community...... and to identify key transcripts for the biogas production process, the metatranscriptome of the microorganisms was sequenced for the first time. The metatranscriptome sequence dataset generated on the Genome Sequencer FLX platform is represented by 484,920 sequence reads. Taxonomic profiling of the active part...... reads resulted in 18,598 high-quality 16S rDNA sequences covering the V3-V4 hypervariable region of the 16S rRNA gene. Comparison of the taxonomic profiles deduced from 16S rDNA amplicon sequences and the metatranscriptome dataset indicates a high transcriptional activity of archaeal species. Overall...

  5. PARTIE: a partition engine to separate metagenomic and amplicon projects in the Sequence Read Archive.

    Science.gov (United States)

    Torres, Pedro J; Edwards, Robert A; McNair, Katelyn A

    2017-08-01

    The Sequence Read Archive (SRA) contains raw data from many different types of sequence projects. As of 2017, the SRA contained approximately ten petabases of DNA sequence (10^16 bp). Annotations of the data are provided by the submitter, and mining the data in the SRA is complicated by both the amount of data and the detail within those annotations. Here, we introduce PARTIE, a partition engine optimized to differentiate sequence read data into metagenomic (random) and amplicon (targeted) sequence data sets. PARTIE subsamples reads from the sequencing file and calculates four different statistics: k-mer frequency, 16S abundance, prokaryotic- and viral-read abundance. These metrics are used to create a RandomForest decision tree to classify the sequencing data, and PARTIE provides mechanisms for both supervised and unsupervised classification. We demonstrate the accuracy of PARTIE for classifying SRA data, discuss the probable error rates in the SRA annotations and introduce a resource assessing SRA data. PARTIE and reclassified metagenome SRA entries are available from https://github.com/linsalrob/partie. Contact: redwards@mail.sdsu.edu. Supplementary data are available at Bioinformatics online.
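
    The k-mer frequency statistic named in the abstract above can be illustrated with a minimal sketch. This is not PARTIE's code: the function name `kmer_frequencies`, the default `k`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter


def kmer_frequencies(reads, k=3):
    """Relative frequency of each k-mer across a set of reads.

    Hypothetical helper for illustration only; PARTIE's own
    implementation and parameters may differ.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    freqs = kmer_frequencies(["ACGTAC", "ACGNAC"], k=2)
    for kmer, freq in sorted(freqs.items()):
        print(kmer, round(freq, 3))
```

    Amplicon data sets tend to contain many near-identical reads, so a few k-mers would be expected to dominate such a profile, whereas random (metagenomic) reads would spread more evenly across k-mers.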

  6. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  7. Application of high-throughput sequencing to whole rabies viral genome characterisation and its use for phylogenetic re-evaluation of a raccoon strain incursion into the province of Ontario.

    Science.gov (United States)

    Nadin-Davis, Susan A; Colville, Adam; Trewby, Hannah; Biek, Roman; Real, Leslie

    2017-03-15

    Raccoon rabies remains a serious public health problem throughout much of the eastern seaboard of North America due to the urban nature of the reservoir host and the many challenges inherent in multi-jurisdictional efforts to administer co-ordinated and comprehensive wildlife rabies control programmes. Better understanding of the mechanisms of spread of rabies virus can play a significant role in guiding such control efforts. To facilitate a detailed molecular epidemiological study of raccoon rabies virus movements across eastern North America, we developed a methodology to efficiently determine whole genome sequences of hundreds of viral samples. The workflow combines the generation of a limited number of overlapping amplicons covering the complete viral genome and use of high throughput sequencing technology. The value of this approach is demonstrated through a retrospective phylogenetic analysis of an outbreak of raccoon rabies which occurred in the province of Ontario between 1999 and 2005. As demonstrated by the number of single nucleotide polymorphisms detected, whole genome sequence data were far more effective than single gene sequences in discriminating between samples and this facilitated the generation of more robust and informative phylogenies that yielded insights into the spatio-temporal pattern of viral spread. With minor modification this approach could be applied to other rabies virus variants thereby facilitating greatly improved phylogenetic inference and thus better understanding of the spread of this serious zoonotic disease. Such information will inform the most appropriate strategies for rabies control in wildlife reservoirs. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  8. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low...... biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material....

  9. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Directory of Open Access Journals (Sweden)

    Amit Kawalia

    Full Text Available Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  10. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    Science.gov (United States)

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  11. Combination of amplified rDNA restriction analysis and high-throughput sequencing revealed the negative effect of colistin sulfate on the diversity of soil microorganisms.

    Science.gov (United States)

    Fan, Tingli; Sun, Yongxue; Peng, Jinju; Wu, Qun; Ma, Yi; Zhou, Xiaohui

    2018-01-01

    Colistin sulfate is widely used in both human and veterinary medicine. However, its effect on the microbial ecology is unknown. In this study, we determined the effect of colistin sulfate on the diversity of soil microorganisms by amplified rDNA restriction analysis (ARDRA) and high-throughput sequencing. ARDRA showed that the diversity of DNA from soil microorganisms was reduced after soil was treated with colistin sulfate, with the most dramatic reduction observed after 35 days of treatment. High-throughput sequencing showed that the Chao1 and abundance-based coverage estimators (ACE) were reduced in the soils treated with colistin sulfate for 35 days compared to those treated with colistin sulfate for 7 days. Furthermore, Chao1 and ACE tended to be lower when a higher concentration of colistin sulfate was used, suggesting that the microbial abundance is reduced by colistin sulfate in a dose-dependent manner. The Shannon index showed that the diversity of soil microorganisms was reduced upon treatment with colistin sulfate compared to the untreated control group. Following 7 days of treatment, Bacillus, Clostridium and Sphingomonas were sensitive to all concentrations of colistin sulfate used in this study. Following 35 days of treatment, the abundance of Chloroplast, Haliangium, Pseudomonas, Lactococcus, and Clostridium was significantly decreased. Our results demonstrated that colistin sulfate, especially at high concentration (≥5 mg/kg), could alter the population structure of microorganisms and consequently the microbial community function in soil. Copyright © 2017 Elsevier GmbH. All rights reserved.
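
    For readers unfamiliar with the Chao1 and Shannon indices reported above, the following is a minimal sketch of how they can be computed from a vector of per-taxon read counts. The abstract does not say which Chao1 variant or logarithm base the authors used; the bias-corrected Chao1 form and the natural logarithm are assumptions here, and the function names are hypothetical.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of taxa seen exactly once and twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i), natural log (assumed base)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Made-up counts for illustration; not data from the study.
    sample = [10, 5, 2, 1, 1, 1, 0]
    print("Chao1:", chao1(sample))
    print("Shannon:", round(shannon(sample), 3))
```

    Chao1 estimates richness (how many taxa are present, including unseen ones), while the Shannon index also reflects evenness, which is why the two can respond differently to the same treatment.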

  12. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Science.gov (United States)

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  13. Characterization of the fecal microbiota using high-throughput sequencing reveals a stable microbial community during storage.

    Directory of Open Access Journals (Sweden)

    Ian M Carroll

    Full Text Available The handling and treatment of biological samples is critical when characterizing the composition of the intestinal microbiota between different ecological niches or diseases. Specifically, exposure of fecal samples to room temperature or long term storage in deep freezing conditions may alter the composition of the microbiota. Thus, we stored fecal samples at room temperature and monitored the stability of the microbiota over twenty four hours. We also investigated the stability of the microbiota in fecal samples during a six month storage period at -80°C. As the stability of the fecal microbiota may be affected by intestinal disease, we analyzed two healthy controls and two patients with irritable bowel syndrome (IBS). We used high-throughput pyrosequencing of the 16S rRNA gene to characterize the microbiota in fecal samples stored at room temperature or -80°C at six and seven time points, respectively. The composition of microbial communities in IBS patients and healthy controls was determined and compared using the Quantitative Insights Into Microbial Ecology (QIIME) pipeline. The composition of the microbiota in fecal samples stored for different lengths of time at room temperature or -80°C clustered strongly based on the host each sample originated from. Our data demonstrate that fecal samples exposed to room or deep freezing temperatures for up to twenty four hours and six months, respectively, exhibit a microbial composition and diversity that shares more identity with its host of origin than any other sample.

  14. jMHC: software assistant for multilocus genotyping of gene families using next-generation amplicon sequencing.

    Science.gov (United States)

    Stuglik, Michał T; Radwan, Jacek; Babik, Wiesław

    2011-07-01

    Genotyping of multilocus gene families, such as the major histocompatibility complex (MHC), may be challenging because of problems with assigning alleles to loci and copy number variation among individuals. Simultaneous amplification and genotyping of multiple loci may be necessary, and in such cases, next-generation deep amplicon sequencing offers a great promise as a genotyping method of choice. Here, we describe jMHC, a computer program developed for analysing and assisting in the visualization of deep amplicon sequencing data. Software operates on FASTA files; therefore, output from any sequencing technology may be used. jMHC was designed specifically for MHC studies but it may be useful for analysing amplicons derived from other multigene families or for genotyping other polymorphic systems. The program is written in Java with user-friendly graphical interface (GUI) and can be run on Microsoft Windows, Linux OS and Mac OS. © 2011 Blackwell Publishing Ltd.

  15. A comprehensive analysis of in vitro and in vivo genetic fitness of Pseudomonas aeruginosa using high-throughput sequencing of transposon libraries.

    Directory of Open Access Journals (Sweden)

    David Skurnik

    Full Text Available High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200-1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian

  16. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1), after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  17. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Science.gov (United States)

    Tamaki, Hideyuki; Wright, Chris L; Li, Xiangzhen; Lin, Qiaoyan; Hwang, Chiachi; Wang, Shiping; Thimmapuram, Jyothi; Kamagata, Yoichi; Liu, Wen-Tso

    2011-01-01

    16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1), after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  18. The challenges of using high-throughput sequencing to track multiple bipartite mycoviruses of wild orchid-fungus partnerships over consecutive years.

    Science.gov (United States)

    Ong, Jamie W L; Li, Hua; Sivasithamparam, Krishnapillai; Dixon, Kingsley W; Jones, Michael G K; Wylie, Stephen J

    2017-10-01

    The bipartite alpha- and betapartitiviruses are recorded from a wide range of fungi and plants. Using a combination of dsRNA-enrichment, high-throughput shotgun sequencing and informatics, we report the occurrence of multiple new partitiviruses associated with mycorrhizal Ceratobasidium fungi, themselves symbiotically associated with a small wild population of Pterostylis sanguinea orchids in Australia, over two consecutive years. Twenty-one partial or near-complete sequences representing 16 definitive alpha- and betapartitivirus species, and further possible species, were detected from two fungal isolates. The majority of partitiviruses occurred in fungal isolates from both years. Two of the partitiviruses represent phylogenetically divergent forms of Alphapartitivirus, suggesting that they may have evolved under long geographical isolation there. We address the challenge of pairing the two genomic segments of partitiviruses to identify species when multiple partitiviruses co-infect a single host. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Characterization of bacteria in biopsies of colon and stools by high throughput sequencing of the V2 region of bacterial 16S rRNA gene in human.

    Directory of Open Access Journals (Sweden)

    Yukihide Momozawa

    Full Text Available BACKGROUND: The characterization of the human intestinal microflora and their interactions with the host have been identified as key components in the study of intestinal disorders such as inflammatory bowel diseases. High-throughput sequencing has enabled culture-independent studies to deeply analyze bacteria in the gut. It is possible with this technology to systematically analyze links between microbes and the genetic constitution of the host, such as DNA polymorphisms and methylation, and gene expression. METHODS AND FINDINGS: In this study the V2 region of the bacterial 16S ribosomal RNA (rRNA) gene using 454 pyrosequencing from seven anatomic regions of human colon and two types of stool specimens were analyzed. The study examined the number of reads needed to ascertain differences between samples, the effect of DNA extraction procedures and PCR reproducibility, and differences between biopsies and stools in order to design a large scale systematic analysis of gut microbes. It was shown (1) that sequence coverage lower than 1,000 reads influenced quantitative and qualitative differences between samples measured by UniFrac distances. Distances between samples became stable after 1,000 reads. (2) Difference of extracted bacteria was observed between the two DNA extraction methods. In particular, Firmicutes Bacilli were not extracted well by one method. (3) Quantitative and qualitative difference in bacteria from ileum to rectum colon were not observed, but there was a significant positive trend between distances within colon and quantitative differences. Between sample type, biopsies or stools, quantitative and qualitative differences were observed. CONCLUSIONS: Results of human colonic bacteria analyzed using high-throughput sequencing were highly dependent on the experimental design, especially the number of sequence reads, DNA extraction method, and sample type.
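
    The finding that distances stabilize at about 1,000 reads per sample is the kind of result usually obtained by subsampling every sample to a common depth before comparing them. The sketch below shows one simple way to do that, by drawing reads without replacement from per-taxon counts. It is an illustration under stated assumptions, not the authors' procedure: the function name `rarefy`, the fixed seed, and the example counts are all hypothetical.

```python
import random


def rarefy(taxon_counts, depth, seed=0):
    """Subsample `depth` reads without replacement.

    `taxon_counts` maps a taxon label to its read count. Returns a new
    dict of subsampled counts. Hypothetical helper for illustration.
    """
    # Expand the counts into one label per read.
    pool = [taxon for taxon, n in taxon_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subsampled = {}
    for taxon in rng.sample(pool, depth):
        subsampled[taxon] = subsampled.get(taxon, 0) + 1
    return subsampled


if __name__ == "__main__":
    # Made-up counts for illustration; not data from the study.
    sample = {"Firmicutes": 1500, "Bacteroidetes": 400, "Proteobacteria": 100}
    print(rarefy(sample, depth=1000))
```

    Samples with fewer reads than the chosen depth cannot be subsampled this way and are typically excluded, which is one reason the minimum useful depth matters when designing a large study.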

  20. Microdissection of lampbrush chromosomes as an approach for generation of locus-specific FISH-probes and samples for high-throughput sequencing.

    Science.gov (United States)

    Zlotina, Anna; Kulikova, Tatiana; Kosyakova, Nadezda; Liehr, Thomas; Krasikova, Alla

    2016-02-20

    Over the past two decades, chromosome microdissection has been widely used in diagnostics and research enabling analysis of chromosomes and their regions through probe generation and establishment of chromosome- and chromosome region-specific DNA libraries. However, the relatively small physical size of mitotic chromosomes limited the use of the conventional chromosome microdissection for investigation of tiny chromosomal regions. In the present study, we developed a workflow for mechanical microdissection of giant transcriptionally active lampbrush chromosomes followed by the preparation of whole-chromosome and locus-specific fluorescent in situ hybridization (FISH)-probes and high-throughput sequencing. In particular, chicken (Gallus g. domesticus) lampbrush chromosome regions as small as single chromomeres, individual lateral loops and marker structures were successfully microdissected. The dissected fragments were mapped with high resolution to target regions of the corresponding lampbrush chromosomes. For investigation of RNA-content of lampbrush chromosome structures, samples retrieved by microdissection were subjected to reverse transcription. Using high-throughput sequencing, the isolated regions were successfully assigned to chicken genome coordinates. As a result, we defined precisely the loci for marker structures formation on chicken lampbrush chromosomes 2 and 3. Additionally, our data suggest that large DAPI-positive chromomeres of chicken lampbrush chromosome arms are characterized by low gene density and high repeat content. The developed technical approach makes it possible to obtain DNA and RNA samples from particular lampbrush chromosome loci and to define precisely the genomic position, extent and sequence content of the dissected regions. The data obtained demonstrate that lampbrush chromosome microdissection provides a unique opportunity to correlate a particular transcriptional domain or a cytological structure with a known DNA sequence. This approach offers

  1. Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey.

    Science.gov (United States)

    Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R

    2015-01-01

    Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.
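
    The abstract above selects parentage SNPs by high minor allele frequency (MAF). As a minimal sketch of that quantity, assuming biallelic genotypes coded as the number of copies of the alternate allele (0, 1 or 2) with `None` for a missing call, MAF can be computed as follows. The coding scheme and the function name are assumptions for illustration, not details from the paper.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one biallelic SNP.

    `genotypes` holds 0, 1 or 2 (copies of the alternate allele) per
    individual, or None for a missing call. Hypothetical helper.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


if __name__ == "__main__":
    # Made-up genotypes for illustration; not data from the study.
    print(minor_allele_frequency([0, 0, 1, 0]))
    print(minor_allele_frequency([2, 2, 2, 1, None]))
```

    A SNP with MAF near 0.5 carries the most information per locus for distinguishing parents from unrelated individuals, which is consistent with the panel favouring high-MAF, putatively neutral markers for parentage analysis.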

  2. One step forwards for the routine use of high-throughput DNA sequencing in environmental monitoring. An efficient and standardizable method to maximize the detection of environmental bacteria.

    Science.gov (United States)

    Bruno, Antonia; Sandionigi, Anna; Galimberti, Andrea; Siani, Eleonora; Labra, Massimo; Cocuzza, Clementina; Ferri, Emanuele; Casiraghi, Maurizio

    2017-02-01

    We propose an innovative, repeatable, and reliable experimental workflow to concentrate and detect environmental bacteria in drinking water using molecular techniques. We first concentrated bacteria in water samples using tangential flow filtration and then we evaluated two methods of environmental DNA extraction. We performed tests on both artificially contaminated water samples and real drinking water samples. The efficiency of the experimental workflow was measured through qPCR. The successful applicability of the high-throughput DNA sequencing (HTS) approach was demonstrated on drinking water samples. Our results demonstrate the feasibility of our approach in high-throughput-based studies, and we suggest incorporating it in monitoring strategies to have a better representation of the microbial community. In recent years, HTS techniques have become key tools in the study of microbial communities. To make the leap from academic laboratories to routine monitoring (e.g., water treatment plant laboratories), we here propose an experimental workflow suitable for the introduction of HTS as a standard method for detecting environmental bacteria. © 2016 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  3. Species identification and profiling of complex microbial communities using shotgun Illumina sequencing of 16S rRNA amplicon sequences.

    Directory of Open Access Journals (Sweden)

    Swee Hoe Ong

    Full Text Available The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification, thereby opening up potential application of this approach for clinical microbial characterization.

  4. Using high-throughput sequencing to leverage surveillance of genetic diversity and oseltamivir resistance: a pilot study during the 2009 influenza A(H1N1) pandemic.

    Directory of Open Access Journals (Sweden)

    Juan Téllez-Sosa

    Full Text Available BACKGROUND: Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The "deep sequencing" approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. METHODOLOGY AND PRINCIPAL FINDINGS: We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November, 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. CONCLUSIONS: NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights to viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave. As sequencing throughput rises and library multiplexing and automation improves, we foresee that

  5. Uncovering leaf rust responsive miRNAs in wheat (Triticum aestivum L.) using high-throughput sequencing and prediction of their targets through degradome analysis.

    Science.gov (United States)

    Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal

    2017-01-01

    Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 target genes. qRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30% of the world population. A major threat to wheat production is rust epidemics. This study was targeted towards identification and functional characterization of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in response to leaf rust using mock and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs compared with the pathogen-infected susceptible plants. The miRNA counts increased in the susceptible isoline due to leaf rust; conversely, the counts decreased in the resistant isoline in response to pathogenesis, illustrating precise spatial tuning of miRNAs during compatible and incompatible interaction. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in response to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interaction. Our findings contribute to increase knowledge on roles of microRNA in wheat-leaf rust interactions and could help in rust

  6. SLAF-seq: an efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaowen Sun

    Full Text Available Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: (i) deep sequencing to ensure genotyping accuracy; (ii) reduced representation strategy to reduce sequencing costs; (iii) pre-designed reduced representation scheme to optimize marker efficiency; and (iv) double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.

  7. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate

    NARCIS (Netherlands)

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E.; Bystrykh, Leonid V.

    2014-01-01

    Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and

  8. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data.

    Science.gov (United States)

    Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-11-02

    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Using High-Throughput Sequencing to Leverage Surveillance of Genetic Diversity and Oseltamivir Resistance: A Pilot Study during the 2009 Influenza A(H1N1) Pandemic

    Science.gov (United States)

    Téllez-Sosa, Juan; Rodríguez, Mario Henry; Gómez-Barreto, Rosa E.; Valdovinos-Torres, Humberto; Hidalgo, Ana Cecilia; Cruz-Hervert, Pablo; Luna, René Santos; Carrillo-Valenzo, Erik; Ramos, Celso; García-García, Lourdes; Martínez-Barnetche, Jesús

    2013-01-01

    Background Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The “deep sequencing” approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. Methodology and Principal Findings We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n  =  299) taken from May to November, 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. Conclusions NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights to viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave. As sequencing throughput rises and library multiplexing and automation improves, we foresee that the approach

  10. Comparison of two high-throughput semiconductor chip sequencing platforms in noninvasive prenatal testing for Down syndrome in early pregnancy.

    Science.gov (United States)

    Kim, Sunshin; Jung, HeeJung; Han, Sung Hee; Lee, SeungJae; Kwon, JeongSub; Kim, Min Gyun; Chu, Hyungsik; Chen, Hongliang; Han, Kyudong; Kwak, Hwanjong; Park, Sunghoon; Joo, Hee Jae; Kim, Byung Chul; Bhak, Jong

    2016-04-30

    Noninvasive prenatal testing (NIPT) to detect fetal aneuploidy using next-generation sequencing on ion semiconductor platforms has become common. There are several sequencers that can generate sufficient DNA reads for NIPT. However, the approval criteria vary among platforms and countries. This can delay the introduction of such devices and systems to clinics. A comparison of the sensitivity and specificity of two different platforms using the same sequencing chemistry could be useful in NIPT for fetal chromosomal aneuploidies. This would improve healthcare authorities' confidence in decision-making on sequencing-based tests. One hundred and one pregnant women who were predicted at high risk of fetal defects using conventional prenatal screening tests, and who underwent definitive diagnosis by full karyotyping, were enrolled from three hospitals in Korea. Most of the pregnant women (69.79 %) received NIPT during weeks 11-13 of gestation and 30.21 % during weeks 14-18. We used Ion Torrent PGM and Proton semi-conductor-based sequencers with 0.3× sequencing coverage depth. The average total reads of 101 samples were approximately 4.5 and 7.6 M for PGM and Proton, respectively. A Burrows-Wheeler Aligner (BWA) algorithm was used for the alignment, and a z-score was used to decide fetal trisomy 21. Interactive dot diagrams from the sequencing data showed minimal z-score values of 2.07 and 2.10 to discriminate negative versus positive cases of fetal trisomy 21 for the two different sequencing systems. Our z-score-based discrimination method resulted in 100 % positive and negative prediction values for both ion semiconductor PGM and Proton sequencers, regardless of their sequencing chip and chemistry differences. Both platforms performed well at an early stage (11-13 weeks of gestation) compared with previous studies. These results suggested that, using two different sequencers, NIPT to detect fetal trisomy 21 in early pregnancy is accurate and platform
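
    The z-score decision described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the score is computed from the fraction of aligned reads mapping to chromosome 21 against a set of euploid reference samples, and the function names and input values are hypothetical. The cutoff is passed in explicitly; the abstract reports minimal discriminating values of 2.07 and 2.10 for the two platforms.

```python
from statistics import mean, stdev


def chr21_z_score(sample_fraction, reference_fractions):
    """Z-score of a sample's chr21 read fraction against euploid references.

    sample_fraction: chr21 reads / total aligned reads for the test sample.
    reference_fractions: the same ratio for known euploid samples (assumed input).
    """
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference fractions have zero variance")
    return (sample_fraction - mu) / sigma


def call_trisomy21(z_score, cutoff):
    """Flag a sample as positive when its z-score reaches the cutoff."""
    return z_score >= cutoff
```

    With made-up reference fractions close to 0.0130, a sample at 0.0135 lies several standard deviations above the reference mean and would be flagged at either reported cutoff, while a sample at 0.0130 would not.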

  11. Identification and characterization of cold-responsive microRNAs in tea plant (Camellia sinensis) and their targets using high-throughput sequencing and degradome analysis.

    Science.gov (United States)

    Zhang, Yue; Zhu, Xujun; Chen, Xuan; Song, Changnian; Zou, Zhongwei; Wang, Yuhua; Wang, Mingle; Fang, Wanping; Li, Xinghui

    2014-10-21

    MicroRNAs (miRNAs) are approximately 19-21 nucleotide noncoding RNAs produced by Dicer-catalyzed excision from stem-loop precursors. Many plant miRNAs have critical functions in development, nutrient homeostasis, abiotic stress responses, and pathogen responses via interaction with specific target mRNAs. Camellia sinensis is one of the most important commercial beverage crops in the world. However, miRNAs associated with cold stress tolerance in C. sinensis remain unexplored. The use of high-throughput sequencing can provide a much deeper understanding of miRNAs. To obtain more insight into the function of miRNAs in cold stress tolerance, Illumina sequencing of C. sinensis sRNA was conducted. Solexa sequencing technology was used for high-throughput sequencing of the small RNA library from the cold treatment of tea leaves. By aligning the sequencing data with known plant miRNAs, we characterized 106 conserved C. sinensis miRNAs. In addition, 215 potential candidate miRNAs were found, among which 98 candidates with star sequences were chosen as novel miRNAs. Both congruously and differentially regulated miRNAs were obtained, and cultivar-specific miRNAs were identified by microarray-based hybridization in response to cold stress. The results were also confirmed by quantitative real-time polymerase chain reaction. To confirm the targets of miRNAs, two degradome libraries from two treatments were constructed. According to degradome sequencing, 455 and 591 genes were identified as cleavage targets of miRNAs from cold treatments and control libraries, respectively, and 283 targets were present in both libraries. Functional analysis of these miRNA targets indicated their involvement in important activities, such as development, regulation of transcription, and stress response. We discovered 31 up-regulated miRNAs and 43 down-regulated miRNAs in 'Yingshuang', and 46 up-regulated miRNAs and 45 down-regulated miRNAs in 'Baiye 1' in response to cold stress, respectively.
A

  12. Identification of Treponema pedis as the predominant Treponema species in porcine skin ulcers by fluorescence in situ hybridization and high-throughput sequencing

    DEFF Research Database (Denmark)

    Karlsson, Frida; Schou, Kirstine Klitgaard; Jensen, Tim Kåre

    2014-01-01

    Skin lesions often seen in pig production are of great animal welfare concern. To study the potential role of Treponema bacteria in porcine skin ulcers, we investigated the presence and distribution of these organisms in decubital shoulder ulcers (n=51) and ear necroses (n=54) by fluorescence in situ hybridization (FISH) and high-throughput sequencing. In addition, two cases of facial ulcers and five cases of other skin ulcers were included in the study. Samples from all 112 skin lesions and intact skin from pigs without skin ulcers (n=14) were screened by FISH. Three different oligonucleotide... The results from this study point toward an important role of T. pedis as a secondary bacterial infection in porcine skin ulcers, especially in severe and chronic lesions.

  13. An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah Auburn

    Full Text Available Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P

  14. Diagnostic single gene analyses beyond Sanger. Economic high-throughput sequencing of small genes involved in congenital coagulation and platelet disorders.

    Science.gov (United States)

    Najm, Juliane; Rath, Matthias; Schröder, Winnie; Felbor, Ute

    2017-07-17

    Molecular testing of congenital coagulation and platelet disorders offers confirmation of clinical diagnoses, supports genetic counselling, and enables predictive and prenatal diagnosis. In some cases, genotype-phenotype correlations are important for predicting the clinical course of the disease and adaptation of individualized therapy. Until recently, genotyping has been mainly performed by Sanger sequencing. While next generation sequencing (NGS) enables the parallel analysis of multiple genes, the cost-value ratio of custom-made panels can be unfavorable for analyses of specific small genes. The aim of this study was to transfer genotyping of small genes involved in congenital coagulation and platelet disorders from Sanger sequencing to an NGS-based method. A LR-PCR approach for target enrichment of the entire genomic regions of the genes F7, F10, F11, F12, GATA1, MYH9, TUBB1 and WAS was combined with high-throughput sequencing on a MiSeq platform. NGS detected all variants that had previously been identified by Sanger sequencing. Our results demonstrate that this approach is an accurate and flexible tool for molecular genetic diagnostics of single small genes.

  15. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Directory of Open Access Journals (Sweden)

    Gilbert Greub

    Full Text Available BACKGROUND: With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. METHODS/PRINCIPAL FINDINGS: We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. CONCLUSIONS/SIGNIFICANCE: This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  16. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

    Science.gov (United States)

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V

    2014-08-07

    DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our
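
    The role of a maximum acceptable distance between read and barcode can be illustrated with a small demultiplexing sketch. This is not the FDR procedure of the paper: it simply applies a fixed Hamming-distance threshold, whereas the authors derive the acceptable distance by controlling the false discovery rate. It also assumes the barcode sits at the very start of the read, which the abstract notes is not always the case. All names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_distance):
    """Return the name of the closest barcode, or None if the read is rejected.

    read: read sequence, barcode assumed to start at position 0.
    barcodes: dict mapping sample name -> barcode sequence.
    max_distance: largest Hamming distance still accepted (chosen by the user
    here; the paper selects it through FDR control).
    """
    best_name, best_dist, tie = None, None, False
    for name, seq in barcodes.items():
        if len(read) < len(seq):
            continue  # read too short to contain this barcode
        dist = hamming(read[: len(seq)], seq)
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True  # ambiguous: two barcodes are equally close
    if best_dist is None or tie or best_dist > max_distance:
        return None
    return best_name
```

    Raising max_distance recovers more reads with sequencing errors in the barcode but also admits more coincidental matches, which is the trade-off the FDR-based method is designed to balance.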

  17. High throughput protein production screening

    Science.gov (United States)

    Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA

    2009-09-08

    Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and IVT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.

  18. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.)

    Directory of Open Access Journals (Sweden)

    Gu Minghong

    2010-11-01

    Full Text Available Abstract Background Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs), for the analysis of complex traits in plant molecular genetics. Results In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, which was 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. Conclusions The present results demonstrated that high

  19. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.).

    Science.gov (United States)

    Xu, Jianjun; Zhao, Qiang; Du, Peina; Xu, Chenwu; Wang, Baohe; Feng, Qi; Liu, Qiaoquan; Tang, Shuzhu; Gu, Minghong; Han, Bin; Liang, Guohua

    2010-11-24

    Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs), for the analysis of complex traits in plant molecular genetics. In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, which was 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. The present results demonstrated that high throughput genotyped CSSLs combine the advantages of an ultrahigh

  20. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Full Text Available Abstract Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  1. Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing.

    Science.gov (United States)

    Gao, Hongyu; Hawkins, Troy; Jasti, Aparna; Chen, Yu-Hsiang; Mockaitis, Keithanne; Dinauer, Mary; Cornetta, Kenneth

    2014-05-06

    Integration of viral vectors into a host genome is associated with insertional mutagenesis and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR based methods such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR and non-restrictive (nr) LAM PCR were developed to identify sites of vector integration. Coupling the power of next-generation sequencing technologies with various PCR approaches will provide a comprehensive and genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop and apply quality metrics to viral insertion data obtained using next-generation sequencing. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies, understand the quality of sequencing data, and provide a starting point for developing standardization of vector insertion site data analysis.

  2. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer...

  3. High-throughput sequencing of partially edited trypanosome mRNAs reveals barriers to editing progression and evidence for alternative editing

    Science.gov (United States)

    Simpson, Rachel M.; Bruno, Andrew E.; Bard, Jonathan E.; Buck, Michael J.

    2016-01-01

    Uridine insertion/deletion RNA editing in kinetoplastids entails the addition and deletion of uridine residues throughout the length of mitochondrial transcripts to generate translatable mRNAs. This complex process requires the coordinated use of several multiprotein complexes as well as the sequential use of noncoding template RNAs called guide RNAs. The majority of steady-state mitochondrial mRNAs are partially edited and often contain regions of mis-editing, termed junctions, whose role is unclear. Here, we report a novel method for sequencing entire populations of pre-edited, partially edited, and fully edited RNAs and analyzing editing characteristics across populations using a new bioinformatics tool, the Trypanosome RNA Editing Alignment Tool (TREAT). Using TREAT, we examined populations of two transcripts, RPS12 and ND7-5′, in wild-type Trypanosoma brucei. We provide evidence that the majority of partially edited sequences contain junctions, that intrinsic pause sites arise during the progression of editing, and that the mechanisms that mediate pausing in the generation of canonical fully edited sequences are distinct from those that mediate the ends of junction regions. Furthermore, we identify alternatively edited sequences that constitute plausible alternative open reading frames and identify substantial variability in the 5′ UTRs of both canonical and alternatively edited sequences. This work is the first to use high-throughput sequencing to examine full-length sequences of whole populations of partially edited transcripts. Our method is highly applicable to current questions in the RNA editing field, including defining mechanisms of action for editing factors and identifying potential alternatively edited sequences. PMID:26908922

  4. High-throughput sequencing of partially edited trypanosome mRNAs reveals barriers to editing progression and evidence for alternative editing.

    Science.gov (United States)

    Simpson, Rachel M; Bruno, Andrew E; Bard, Jonathan E; Buck, Michael J; Read, Laurie K

    2016-05-01

    Uridine insertion/deletion RNA editing in kinetoplastids entails the addition and deletion of uridine residues throughout the length of mitochondrial transcripts to generate translatable mRNAs. This complex process requires the coordinated use of several multiprotein complexes as well as the sequential use of noncoding template RNAs called guide RNAs. The majority of steady-state mitochondrial mRNAs are partially edited and often contain regions of mis-editing, termed junctions, whose role is unclear. Here, we report a novel method for sequencing entire populations of pre-edited, partially edited, and fully edited RNAs and analyzing editing characteristics across populations using a new bioinformatics tool, the Trypanosome RNA Editing Alignment Tool (TREAT). Using TREAT, we examined populations of two transcripts, RPS12 and ND7-5', in wild-type Trypanosoma brucei. We provide evidence that the majority of partially edited sequences contain junctions, that intrinsic pause sites arise during the progression of editing, and that the mechanisms that mediate pausing in the generation of canonical fully edited sequences are distinct from those that mediate the ends of junction regions. Furthermore, we identify alternatively edited sequences that constitute plausible alternative open reading frames and identify substantial variability in the 5' UTRs of both canonical and alternatively edited sequences. This work is the first to use high-throughput sequencing to examine full-length sequences of whole populations of partially edited transcripts. Our method is highly applicable to current questions in the RNA editing field, including defining mechanisms of action for editing factors and identifying potential alternatively edited sequences. © 2016 Simpson et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  5. High-throughput Transcriptome Sequencing Reveals the Role of Anthocyanin Metabolism in Begonia semperflorens Under High Light Stress.

    Science.gov (United States)

    Wang, Jiawan; Guo, Meili; Li, Yonghua; Wu, Ronghua; Zhang, Kaiming

    2017-07-26

    Begonia semperflorens is an ornamental perennial herb. The leaves of B. semperflorens turn red under increased light, which increases the ornamental value of the plant. The color of the leaves is determined by anthocyanin metabolism. In B. semperflorens leaves, anthocyanin metabolism is sensitive to external environmental conditions such as temperature, light and hormone levels. To explore this process in detail and to assess gene expression under high light stress, transcriptome analysis was performed by RNA sequencing using the sequencing-by-synthesis method. A total of 83 699 unigenes were isolated, and 51 754 unigenes were annotated using the NR, Swiss-Prot, KEGG, COG, KOG, GO and Pfam databases. Furthermore, many of the differentially expressed genes were related to factors associated with anthocyanin metabolism, which influences the expression of leaf color. © 2017 The American Society of Photobiology.

  6. Effective Optimization of Antibody Affinity by Phage Display Integrated with High-Throughput DNA Synthesis and Sequencing Technologies.

    Directory of Open Access Journals (Sweden)

    Dongmei Hu

    Full Text Available Phage display technology has been widely used for antibody affinity maturation for decades. The limited library sequence diversity together with excessive redundancy and labour-consuming procedure for candidate identification are two major obstacles to widespread adoption of this technology. We hereby describe a novel library generation and screening approach to address the problems. The approach started with the targeted diversification of multiple complementarity determining regions (CDRs) of a humanized anti-ErbB2 antibody, HuA21, with a small perturbation mutagenesis strategy. A combination of three degenerate codons, NWG, NWC, and NSG, were chosen for amino acid saturation mutagenesis without introducing cysteine and stop residues. In total, 7,749 degenerate oligonucleotides were synthesized on two microchips and released to construct five single-chain antibody fragment (scFv) gene libraries with 4 × 10^6 DNA sequences. Deep sequencing of the unselected and selected phage libraries using the Illumina platform allowed for an in-depth evaluation of the enrichment landscapes in CDR sequences and amino acid substitutions. Potent candidates were identified according to their high frequencies using NGS analysis, by-passing the need for the primary screening of target-binding clones. Furthermore, a subsequent library by recombination of the 10 most abundant variants from four CDRs was constructed and screened, and a mutant with 158-fold increased affinity (Kd = 25.5 pM) was obtained. These results suggest the potential application of the developed methodology for optimizing the binding properties of other antibodies and biomolecules.

  7. Evaluation of MC1R high-throughput nucleotide sequencing data generated by the 1000 Genomes Project.

    Science.gov (United States)

    Marano, Leonardo Arduino; Marcorin, Letícia; Castelli, Erick da Cruz; Mendes-Junior, Celso Teixeira

    2017-01-01

    The advent of next-generation sequencing allows simultaneous processing of several genomic regions/individuals, increasing the availability and accuracy of whole-genome data. However, these new approaches may present some errors and bias due to alignment, genotype calling, and imputation methods. Despite these flaws, data obtained by next-generation sequencing can be valuable for population and evolutionary studies of specific genes, such as genes related to how pigmentation evolved among populations, one of the main topics in human evolutionary biology. Melanocortin-1 receptor (MC1R) is one of the most studied genes involved in pigmentation variation. As MC1R has already been suggested to affect melanogenesis and increase risk of developing melanoma, it constitutes one of the best models to understand how natural selection acts on pigmentation. Here we employed a locally developed pipeline to obtain genotype and haplotype data for MC1R from the raw sequencing data provided by the 1000 Genomes FTP site. We also compared such genotype data to Phase 3 VCF to evaluate its quality and discover any polymorphic sites that may have been overlooked. In conclusion, either the VCF file or one of the presently described pipelines could be used to obtain reliable and accurate genotype calling from the 1000 Genomes Phase 3 data.

  8. Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis.

    Science.gov (United States)

    Haile, Simon; Pandoh, Pawan; McDonald, Helen; Corbett, Richard D; Tsao, Philip; Kirk, Heather; MacLeod, Tina; Jones, Martin; Bilobram, Steve; Brooks, Denise; Smailus, Duane; Steidl, Christian; Scott, David W; Bala, Miruna; Hirst, Martin; Miller, Diane; Moore, Richard A; Mungall, Andrew J; Coope, Robin J; Ma, Yussanne; Zhao, Yongjun; Holt, Rob A; Jones, Steven J; Marra, Marco A

    2017-01-01

    Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95-100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

  9. Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

    KAUST Repository

    Kobayashi, Masaaki

    2017-04-20

    Recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models, but also on the quality and quantity of variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at both ends of reads or containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
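    The two exclusion rules named in this abstract can be illustrated with a short Python sketch. It is one possible reading of the description, not Heap's actual code: the `Observation` record, the `filter_site` function, and the `end_margin` parameter are hypothetical names introduced here, and the choice to skip a whole site when a minor allele has a single supporting read is an assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Observation(NamedTuple):
    """One read's evidence at a site (hypothetical record for illustration)."""
    base: str         # base called by the read at this site
    offset: int       # 0-based position of the site within the read
    read_length: int  # total length of the read


def filter_site(observations: Iterable[Observation],
                end_margin: int = 1) -> Optional[Counter]:
    """Return allele counts for a site, or None if the site is skipped.

    Rule 1: ignore evidence from the first/last `end_margin` bases of a read.
    Rule 2: skip the site if any minor allele is supported by only one read.
    """
    counts = Counter(
        obs.base
        for obs in observations
        if end_margin <= obs.offset < obs.read_length - end_margin
    )
    if not counts:
        return None
    major = counts.most_common(1)[0][0]
    for allele, n in counts.items():
        if allele != major and n == 1:
            return None  # a singleton minor allele is treated as unreliable
    return counts


site = ([Observation("A", 5, 50)] * 4
        + [Observation("G", 10, 50)] * 2
        + [Observation("T", 0, 50)])   # the T sits at a read end and is ignored
kept = filter_site(site)
```

    With seven reads at a site, the read-end observation is dropped and the remaining counts (four A, two G) are retained for genotyping; a site with four A and a single interior G would be skipped.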

  10. High-Throughput Analysis With 96-Capillary Array Electrophoresis and Integrated Sample Preparation for DNA Sequencing Based on Laser Induced Fluorescence Detection

    Energy Technology Data Exchange (ETDEWEB)

    Xue, Gang [Iowa State Univ., Ames, IA (United States)

    2001-01-01

    The purpose of this research was to improve the fluorescence detection for the multiplexed capillary array electrophoresis, extend its use beyond the genomic analysis, and to develop an integrated micro-sample preparation system for high-throughput DNA sequencing. The authors first demonstrated multiplexed capillary zone electrophoresis (CZE) and micellar electrokinetic chromatography (MEKC) separations in a 96-capillary array system with laser-induced fluorescence detection. Migration times of four kinds of fluoresceins and six polyaromatic hydrocarbons (PAHs) are normalized to one of the capillaries using two internal standards. The relative standard deviations (RSD) after normalization are 0.6-1.4% for the fluoresceins and 0.1-1.5% for the PAHs. Quantitative calibration of the separations based on peak areas is also performed, again with substantial improvement over the raw data. This opens up the possibility of performing massively parallel separations for high-throughput chemical analysis for process monitoring, combinatorial synthesis, and clinical diagnosis. The authors further improved the fluorescence detection by step laser scanning. A computer-controlled galvanometer scanner is adapted for scanning a focused laser beam across a 96-capillary array for laser-induced fluorescence detection. The signal at a single photomultiplier tube is temporally sorted to distinguish among the capillaries. The limit of detection for fluorescein is 3 × 10^-11 M (S/N = 3) for 5 mW of total laser power scanned at 4 Hz. The observed cross-talk among capillaries is 0.2%. Advantages include the efficient utilization of light due to the high duty-cycle of step scan, good detection performance due to the reduction of stray light, ruggedness due to the small mass of the galvanometer mirror, low cost due to the simplicity of components, and flexibility due to the independent paths for excitation and emission.
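    The normalization step described above can be sketched in Python. The abstract does not give the formula, so this assumes a two-point linear rescaling in which the two internal standards in each capillary are mapped onto their migration times in the reference capillary; `normalize_times` and `rsd_percent` are illustrative names, and the numbers are made up for the example.

```python
from statistics import mean, stdev
from typing import List, Sequence


def normalize_times(times: Sequence[float],
                    std_a: float, std_b: float,
                    ref_a: float, ref_b: float) -> List[float]:
    """Linearly rescale migration times so that the two internal standards
    observed at std_a and std_b land on the reference times ref_a and ref_b."""
    slope = (ref_b - ref_a) / (std_b - std_a)
    return [ref_a + slope * (t - std_a) for t in times]


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation (sample standard deviation / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


# Reference capillary: standards at 100 s and 200 s.
# Another capillary runs 10% slower: standards at 110 s and 220 s, analyte at 165 s.
normalized = normalize_times([165.0], std_a=110.0, std_b=220.0,
                             ref_a=100.0, ref_b=200.0)
spread = rsd_percent([99.0, 100.0, 101.0])
```

    In this made-up case the analyte is mapped back to about 150 s, the position it would have had in the reference capillary, and the RSD helper reports about 1% for the three example values.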

  11. Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA Sequencing.

    Science.gov (United States)

    Pope, Bernard J; Hammet, Fleur; Nguyen-Dumont, Tu; Park, Daniel J

    2018-01-01

    Hi-Plex is a suite of methods to enable simple, accurate, and cost-effective highly multiplex PCR-based targeted sequencing (Nguyen-Dumont et al., Biotechniques 58:33-36, 2015). At its core is the principle of using gene-specific primers (GSPs) to "seed" (or target) the reaction and universal primers to "drive" the majority of the reaction. In this manner, effects on amplification efficiencies across the target amplicons can, to a large extent, be restricted to early seeding cycles. Product sizes are defined within a relatively narrow range to enable high-specificity size selection, replication uniformity across target sites (including in the context of fragmented input DNA such as that derived from fixed tumor specimens (Nguyen-Dumont et al., Biotechniques 55:69-74, 2013; Nguyen-Dumont et al., Anal Biochem 470:48-51, 2015)), and application of high-specificity genetic variant calling algorithms (Pope et al., Source Code Biol Med 9:3, 2014; Park et al., BMC Bioinformatics 17:165, 2016). Hi-Plex offers a streamlined workflow that is suitable for testing large numbers of specimens without the need for automation.

  12. High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae).

    Directory of Open Access Journals (Sweden)

    Yun-Jie Zhang

    Full Text Available BACKGROUND: Bambusoideae is the only subfamily that contains woody members in the grass family, Poaceae. In phylogenetic analyses, Bambusoideae, Pooideae and Ehrhartoideae formed the BEP clade, yet the internal relationships of this clade are controversial. The distinctive life history (infrequent flowering and predominance of asexual reproduction) of woody bamboos makes them an interesting but taxonomically difficult group. Phylogenetic analyses based on large DNA fragments could only provide a moderate resolution of woody bamboo relationships, although a robust phylogenetic tree is needed to elucidate their evolutionary history. Phylogenomics is an alternative choice for resolving difficult phylogenies. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the complete nucleotide sequences of six woody bamboo chloroplast (cp) genomes using Illumina sequencing. These genomes are similar to those of other grasses and rather conservative in evolution. We constructed a phylogeny of Poaceae from 24 complete cp genomes including 21 grass species. Within the BEP clade, we found strong support for a sister relationship between Bambusoideae and Pooideae. In a substantial improvement over prior studies, all six nodes within Bambusoideae were supported with ≥0.95 posterior probability from Bayesian inference and 5/6 nodes resolved with 100% bootstrap support in maximum parsimony and maximum likelihood analyses. We found that repeats in the cp genome could provide phylogenetic information, while caution is needed when using indels in phylogenetic analyses based on few selected genes. We also identified relatively rapidly evolving cp genome regions that have the potential to be used for further phylogenetic study in Bambusoideae. CONCLUSIONS/SIGNIFICANCE: The cp genome of Bambusoideae evolved slowly, and phylogenomics based on whole cp genome could be used to resolve major relationships within the subfamily. The difficulty in resolving the diversification among

  13. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Full Text Available Abstract Background Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small scale (SNPs) or large scale (CGH array, FISH) methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. Halobacterium species NRC-1 DNA and Campylobacter jejuni were amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias, and produced significantly higher yields of amplified DNA.

  14. Identification and characterization of Wilt and salt stress-responsive microRNAs in chickpea through high-throughput sequencing.

    Science.gov (United States)

    Kohli, Deshika; Joshi, Gopal; Deokar, Amit Atmaram; Bhardwaj, Ankur R; Agarwal, Manu; Katiyar-Agarwal, Surekha; Srinivasan, Ramamurthy; Jain, Pradeep Kumar

    2014-01-01

    Chickpea (Cicer arietinum) is the second most widely grown legume worldwide and is the most important pulse crop in the Indian subcontinent. Chickpea productivity is adversely affected by a large number of biotic and abiotic stresses. MicroRNAs (miRNAs) have been implicated in the regulation of plant responses to several biotic and abiotic stresses. This study is the first attempt to identify chickpea miRNAs that are associated with biotic and abiotic stresses. The wilt infection that is caused by the fungus Fusarium oxysporum f.sp. ciceris is one of the major diseases severely affecting chickpea yields. Of late, increasing soil salinization has become a major problem in realizing these potential yields. Three chickpea libraries using fungal-infected, salt-treated and untreated seedlings were constructed and sequenced using next-generation sequencing technology. A total of 12,135,571 unique reads were obtained. In addition to 122 conserved miRNAs belonging to 25 different families, 59 novel miRNAs along with their star sequences were identified. Four legume-specific miRNAs, including miR5213, miR5232, miR2111 and miR2118, were found in all of the libraries. Poly(A)-based qRT-PCR (Quantitative real-time PCR) was used to validate eleven conserved and five novel miRNAs. miR530 was highly up regulated in response to fungal infection, which targets genes encoding zinc knuckle- and microtubule-associated proteins. Many miRNAs responded in a similar fashion under both biotic and abiotic stresses, indicating the existence of cross talk between the pathways that are involved in regulating these stresses. The potential target genes for the conserved and novel miRNAs were predicted based on sequence homologies. miR166 targets a HD-ZIPIII transcription factor and was validated by 5' RLM-RACE. This study has identified several conserved and novel miRNAs in the chickpea that are associated with gene regulation following exposure to wilt and salt stress.

  15. Identification and characterization of Wilt and salt stress-responsive microRNAs in chickpea through high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Deshika Kohli

    Full Text Available Chickpea (Cicer arietinum) is the second most widely grown legume worldwide and is the most important pulse crop in the Indian subcontinent. Chickpea productivity is adversely affected by a large number of biotic and abiotic stresses. MicroRNAs (miRNAs) have been implicated in the regulation of plant responses to several biotic and abiotic stresses. This study is the first attempt to identify chickpea miRNAs that are associated with biotic and abiotic stresses. The wilt infection that is caused by the fungus Fusarium oxysporum f.sp. ciceris is one of the major diseases severely affecting chickpea yields. Of late, increasing soil salinization has become a major problem in realizing these potential yields. Three chickpea libraries using fungal-infected, salt-treated and untreated seedlings were constructed and sequenced using next-generation sequencing technology. A total of 12,135,571 unique reads were obtained. In addition to 122 conserved miRNAs belonging to 25 different families, 59 novel miRNAs along with their star sequences were identified. Four legume-specific miRNAs, including miR5213, miR5232, miR2111 and miR2118, were found in all of the libraries. Poly(A)-based qRT-PCR (Quantitative real-time PCR) was used to validate eleven conserved and five novel miRNAs. miR530 was highly up regulated in response to fungal infection, which targets genes encoding zinc knuckle- and microtubule-associated proteins. Many miRNAs responded in a similar fashion under both biotic and abiotic stresses, indicating the existence of cross talk between the pathways that are involved in regulating these stresses. The potential target genes for the conserved and novel miRNAs were predicted based on sequence homologies. miR166 targets a HD-ZIPIII transcription factor and was validated by 5' RLM-RACE. This study has identified several conserved and novel miRNAs in the chickpea that are associated with gene regulation following exposure to wilt and salt stress.

  16. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    Directory of Open Access Journals (Sweden)

    Martínez-Zapater José M

    2007-11-01

    decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provides a powerful tool for grapevine genetic analysis.

  17. High-throughput sequencing identification of genes involved with Varroa destructor resistance in the eastern honeybee, Apis cerana.

    Science.gov (United States)

    Ji, T; Yin, L; Liu, Z; Shen, F; Shen, J

    2014-10-31

    Varroa destructor is the greatest threat to the honeybee Apis mellifera worldwide, while it rarely causes serious harm to its native host, the Eastern honeybee Apis cerana. The genetic mechanisms underlying the resistance of A. cerana to Varroa remain unclear. Thus, understanding the molecular mechanism of resistance to Varroa may provide useful insights for reducing this disease in other organisms. In this study, the transcriptomes of two A. cerana colonies were sequenced using the Illumina Solexa sequencing method. One colony was highly affected by mites, whereas the other colony displayed strong resistance to V. destructor. We determined differences in gene expression in the two colonies after challenging the colonies with V. destructor. After de novo transcriptome assembly, we obtained 91,172 unigenes for A. cerana and found that 288 differentially expressed genes varied by more than 15-fold. A total of 277 unigenes were present at higher levels in the non-affected colony. Genes involved in resistance to Varroa included unigenes related to skeletal muscle movement, olfactory sensitivity, and transcription factors. This suggests that hygienic behavior and grooming behavior may play important roles in the resistance to Varroa.

  18. CRISPR-Cas9-Edited Site Sequencing (CRES-Seq): An Efficient and High-Throughput Method for the Selection of CRISPR-Cas9-Edited Clones.

    Science.gov (United States)

    Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel

    2018-01-16

    The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plant cells). Here we describe a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping of up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc.

  19. High-throughput sequencing as an effective approach in profiling small RNAs derived from a hairpin RNA expression vector in woody plants.

    Science.gov (United States)

    Zhao, Dongyan; Song, Guo-Qing

    2014-11-01

    Hairpin RNA (hpRNA)-mediated gene silencing has proved to be an efficient approach to develop virus-resistant transgenic plants. To characterize small RNA molecules (sRNAs) derived from an hpRNA expression vector in transgenic cherry rootstock plants, we conducted small RNA sequencing of (1) a transgenic rootstock containing an inverted repeat of the partial coat protein of Prunus necrotic ring spot virus (PNRSV-hpRNA); (2) a nontransgenic rootstock; and (3) a PNRSV-infected sweet cherry plant. Analysis of the PNRSV sRNA pools indicated that 24-nt (nucleotide) small interfering RNAs (siRNAs) were the most prevalent sRNAs in the transgenic rootstock whereas the most abundant sRNAs in the PNRSV-infected nontransgenic rootstock were 21-nt siRNAs. In addition, the 24-nt siRNAs of the PNRSV-hpRNA were more abundant on the sense strand than those on the antisense strand in the transgenic rootstock. In contrast, preference in generating PNRSV sRNAs, ranging from 19-nt to 30-nt for sense and antisense strands, was not distinct in the PNRSV-infected nontransgenic sweet cherry. Taken together, this is the first report on profiling hpRNA-derived sRNAs in woody plants using high-throughput sequencing technology, which is an efficient way to verify the presence/absence, the abundance, and the sequence features of certain sRNAs. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  20. A High-Throughput and Low-Complexity H.264/AVC Intra 16×16 Prediction Architecture for HD Video Sequences

    Directory of Open Access Journals (Sweden)

    M. Orlandić

    2014-11-01

    Full Text Available The H.264/AVC compression standard provides tools and solutions for an efficient coding of video sequences of various resolutions. Spatial redundancy in a video frame is removed by use of an intra prediction algorithm. There are three block-wise types of intra prediction: 4×4, 8×8 and 16×16. This paper proposes an efficient, low-complexity architecture for intra 16×16 prediction that provides real-time processing of HD video sequences. All four prediction modes (V, H, DC, Plane) are supported in the implementation. The high-complexity plane mode computes a number of intermediate parameters required for creating prediction pixels. The local memory buffers are used for storing intermediate reconstructed data used as reference pixels in the intra prediction process. The high throughput is achieved by 16-pixel parallelism and the proposed prediction process takes 48 cycles for processing one macroblock. The proposed architecture is synthesized and implemented on a Kintex 705-XC7K325T board and requires 94 MHz to encode a video sequence of HD 4k×2k (3840×2160) resolution at 60 fps in real time. This represents a significant improvement compared to the state of the art.
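    The clock figure quoted in this abstract can be sanity-checked with simple arithmetic. The sketch below is not taken from the paper: it assumes that macroblocks are processed strictly one after another at 48 cycles each with no other overhead, and the function name `required_clock_hz` is hypothetical.

```python
def required_clock_hz(width, height, fps, cycles_per_macroblock, mb_size=16):
    """Minimum clock rate to process every macroblock of every frame in real time.

    Assumption (not from the abstract): macroblocks are handled sequentially
    and nothing else consumes cycles.
    """
    # Ceiling division so partially covered macroblocks at the edges count.
    mbs_per_row = -(-width // mb_size)
    mbs_per_col = -(-height // mb_size)
    macroblocks_per_second = mbs_per_row * mbs_per_col * fps
    return macroblocks_per_second * cycles_per_macroblock


freq = required_clock_hz(3840, 2160, 60, 48)
print(f"{freq / 1e6:.2f} MHz")  # 93.31 MHz
```

    Under these assumptions a 3840×2160 frame holds 240 × 135 = 32,400 macroblocks, so 60 fps at 48 cycles per macroblock needs about 93.3 MHz, which is consistent with the 94 MHz stated above.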

  1. Culture-Independent Metagenomic Surveillance of Commercially Available Probiotics with High-Throughput Next-Generation Sequencing.

    Science.gov (United States)

    Patro, Jennifer N; Ramachandran, Padmini; Barnaba, Tammy; Mammel, Mark K; Lewis, Jada L; Elkins, Christopher A

    2016-01-01

    Millions of people consume dietary supplements either following a doctor's recommendation or at their own discretion to improve their overall health and well-being. This is a rapidly growing trend, with an associated and expanding manufacturing industry to meet the demand for new health-related products. In this study, we examined the contents and microbial viability of several popular probiotic products on the United States market. Culture-independent methods are proving ideal for fast and efficient analysis of foodborne pathogens and their associated microbial communities but may also be relevant for analyzing probiotics containing mixed microbial constituents. These products were subjected to next-generation whole-genome sequencing and analyzed by a custom in-house-developed k-mer counting method to validate manufacturer label information. In addition, the batch variability of respective products was examined to determine if any changes in their formulations and/or the manufacturing process occurred. Overall, the products we tested adhered to the ingredient claims and lot-to-lot differences were minimal. However, there were a few discrepancies in the naming of closely related Lactobacillus and Bifidobacterium species, whereas one product contained an apparent Enterococcus contaminant in two of its three lots. With the microbial contents of the products identified, we used traditional PCR and colony counting methods to comparatively assess our results and verify the viability of the microbes in these products with regard to the labeling claims. Of all the supplements examined, only one was found to be inaccurate in viability. Our use of next-generation sequencing as an analytical tool clearly demonstrated its utility for quickly analyzing commercially available products containing multiple microbes to ensure consumer safety. IMPORTANCE The rapidly growing supplement industry operates without a formal premarket approval process. Consumers rely on product labels to

  2. Mining genes involved in the stratification of Paris Polyphylla seeds using high-throughput embryo Transcriptome sequencing

    Science.gov (United States)

    2013-01-01

    Background: Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant. Results: In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process. Conclusions: Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signal, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these related genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphylla var. yunnanensis. PMID:23718911

  3. Development of novel microsatellite markers for the BBCC Oryza genome (Poaceae) using high-throughput sequencing technology.

    Science.gov (United States)

    Wang, Caihong; Liu, Xiaojiao; Peng, Suotang; Xu, Qun; Yuan, Xiaoping; Feng, Yue; Yu, Hanyong; Wang, Yiping; Wei, Xinghua

    2014-01-01

    Wild species of Oryza are extremely valuable sources of genetic material that can be used to broaden the genetic background of cultivated rice, and to increase its resistance to abiotic and biotic stresses. Until recently, there was no sequence information for the BBCC Oryza genome; therefore, no special markers had been developed for this genome type. The lack of suitable markers made it difficult to search for valuable genes in the BBCC genome. The aim of this study was to develop microsatellite markers for the BBCC genome. We obtained 13,991 SSR-containing sequences and designed 14,508 primer pairs. The most abundant repeat type was hexanucleotide (31.39%), followed by trinucleotide (27.67%) and dinucleotide (19.04%). 600 markers were selected for validation in 23 accessions of Oryza species with the BBCC genome. A set of 495 markers produced clear amplified fragments of the expected sizes. The average number of alleles per locus (Na) was 2.5, ranging from 1 to 9. The genetic diversity per locus (He) ranged from 0 to 0.844 with a mean of 0.333. The mean polymorphism information content (PIC) was 0.290, and ranged from 0 to 0.825. Of the 495 markers, 12 were only found in the BB genome, 173 were unique to the CC genome, and 198 were also present in the AA genome. These microsatellite markers could be used to evaluate the phylogenetic relationships among different Oryza genomes, and to construct a genetic linkage map for locating and identifying valuable genes in the BBCC genome, and would also be useful for marker-assisted breeding programs that include accessions with the AA genome, especially Oryza sativa.
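    The per-locus statistics reported here (He and PIC) are computed from allele frequencies. The abstract does not state which estimators the authors used, so the sketch below assumes the textbook definitions: He = 1 − Σp_i² and PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j). The function names and the example frequencies are illustrative only.

```python
def expected_heterozygosity(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC at one locus: He minus a correction term over all allele pairs."""
    n = len(freqs)
    sum_sq = sum(p * p for p in freqs)
    # Pairwise term: 2 * p_i^2 * p_j^2 for every pair of distinct alleles (i < j).
    pair_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - sum_sq - pair_term


# Hypothetical locus with two equally frequent alleles.
print(expected_heterozygosity([0.5, 0.5]))           # 0.5
print(polymorphism_information_content([0.5, 0.5]))  # 0.375
```

    A monomorphic locus (a single allele with frequency 1) gives 0 for both statistics, which matches the lower bounds of the ranges quoted in the abstract.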

  4. Assessing the impact of water treatment on bacterial biofilms in drinking water distribution systems using high-throughput DNA sequencing.

    Science.gov (United States)

    Shaw, Jennifer L A; Monis, Paul; Fabris, Rolando; Ho, Lionel; Braun, Kalan; Drikas, Mary; Cooper, Alan

    2014-12-01

    Biofilm control in drinking water distribution systems (DWDSs) is crucial, as biofilms are known to reduce flow efficiency, impair taste and quality of drinking water and have been implicated in the transmission of harmful pathogens. Microorganisms within biofilm communities are more resistant to disinfection compared to planktonic microorganisms, making them difficult to manage in DWDSs. This study evaluates the impact of four unique drinking water treatments on biofilm community structure using metagenomic DNA sequencing. Four experimental DWDSs were subjected to the following treatments: (1) conventional coagulation, (2) magnetic ion exchange contact (MIEX) plus conventional coagulation, (3) MIEX plus conventional coagulation plus granular activated carbon, and (4) membrane filtration (MF). Bacterial biofilms located inside the pipes of each system were sampled under sterile conditions both (a) immediately after treatment application ('inlet') and (b) at a 1 km distance from the treatment application ('outlet'). Bacterial 16S rRNA gene sequencing revealed that the outlet biofilms were more diverse than those sampled at the inlet for all treatments. The lowest number of unique operational taxonomic units (OTUs) and lowest diversity was observed in the MF inlet. However, the MF system revealed the greatest increase in diversity and OTU count from inlet to outlet. Further, the biofilm communities at the outlet of each system were more similar to one another than to their respective inlet, suggesting that biofilm communities converge towards a common established equilibrium as distance from treatment application increases. Based on the results, MF treatment is most effective at inhibiting biofilm growth, but a highly efficient post-treatment disinfection regime is also critical in order to prevent the high rates of post-treatment regrowth. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum) Using High Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Muhammad Farooq

    2017-06-01

    Full Text Available MicroRNAs (miRNAs) are small 20–24 nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms; hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods, but due to complex regulatory mechanisms or false positive in silico predictions, not all of them are expressed in reality, and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets have been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC BED format.

  6. Identification of miRNAs Responsive to Botrytis cinerea in Herbaceous Peony (Paeonia lactiflora Pall.) by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Daqiu Zhao

    2015-09-01

    Full Text Available Herbaceous peony (Paeonia lactiflora Pall.), one of the world’s most important ornamental plants, is highly susceptible to Botrytis cinerea, and improving resistance to this pathogenic fungus is a problem yet to be solved. MicroRNAs (miRNAs) play an essential role in resistance to B. cinerea, but until now, no studies have been reported concerning miRNA induction in P. lactiflora. Here, we constructed and sequenced two small RNA (sRNA) libraries from two B. cinerea-infected P. lactiflora cultivars (“Zifengyu” and “Dafugui”) with significantly different levels of resistance to B. cinerea, using the Illumina HiSeq 2000 platform. From the raw reads generated, 4,592,881 and 5,809,796 sRNAs were obtained, and 280 and 306 miRNAs were identified from “Zifengyu” and “Dafugui”, respectively. A total of 237 conserved and 7 novel sequences of miRNAs were differentially expressed between the two cultivars, and we predicted and annotated their potential target genes. Subsequently, 7 differentially expressed candidate miRNAs were screened according to their target genes annotated in KEGG pathways, and the expression patterns of miRNAs and corresponding target genes were elucidated. We found that miR5254, miR165a-3p, miR3897-3p and miR6450a might be involved in the P. lactiflora response to B. cinerea infection. These results provide insight into the molecular mechanisms responsible for resistance to B. cinerea in P. lactiflora.

  7. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  8. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Science.gov (United States)

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  9. Exploring the transcriptome of non-model oleaginous microalga Dunaliella tertiolecta through high-throughput sequencing and high performance computing.

    Science.gov (United States)

    Yao, Lina; Tan, Kenneth Wei Min; Tan, Tin Wee; Lee, Yuan Kun

    2017-02-22

    RNA-Seq technology has received a lot of attention in recent years for microalgal global transcriptomic profiling. It is widely used in transcriptome-wide analysis of gene expression, particularly for microalgal strains with potential as biofuel sources. However, insufficient genomic or transcriptomic information of non-model microalgae has limited the understanding of their regulatory mechanisms and hampered genetic manipulation to enhance biofuel production. As such, an optimal microalgal transcriptomic database construction is a subject of urgent investigation. Dunaliella tertiolecta, a non-model oleaginous microalgal species, was sequenced via Illumina MiSeq and HiSeq 4000 in RNA-Seq studies. The high-quality high-throughput sequencing data were explored using high performance computing (HPC) in a petascale data center and subjected to de novo assembly and parallelized mpiBLASTX search with multiple species. As a result, a transcriptome database of 17,845 was constructed (~95% completeness). This enlarged database fueled the RNA-Seq data analysis, which was validated by a nitrogen deprivation (ND) study that induces triacylglycerol (TAG) production. The new parallelized assembly and annotation method under HPC presented here allows the solution of large-scale data processing problems in acceptable computation time. There is a significant increase in the number of transcriptomic data achieved and observable heterogeneity in the performance to identify differentially expressed genes in the ND treatment paradigm. The results provide new insights as to how response to ND treatment in microalgae is regulated. ND analyses highlight the advantages of the database generated in this study, which could also serve as a useful resource for future gene manipulation and transcriptome-wide analysis. We thus demonstrate the usefulness of exploring the transcriptome as an informative platform for functional studies and genetic manipulations in similar species.

  10. High-throughput detection of induced mutations and natural variation using KeyPoint technology.

    Directory of Open Access Journals (Sweden)

    Diana Rigola

    Full Text Available Reverse genetics approaches rely on the detection of sequence alterations in target genes to identify allelic variants among mutant or natural populations. Current (pre-)screening methods such as TILLING and EcoTILLING are based on the detection of single base mismatches in heteroduplexes using endonucleases such as CEL 1. However, there are drawbacks in the use of endonucleases due to their relatively poor cleavage efficiency and exonuclease activity. Moreover, pre-screening methods do not reveal information about the nature of sequence changes and their possible impact on gene function. We present KeyPoint technology, a high-throughput mutation/polymorphism discovery technique based on massive parallel sequencing of target genes amplified from mutant or natural populations. KeyPoint combines multi-dimensional pooling of large numbers of individual DNA samples and the use of sample identification tags ("sample barcoding") with next-generation sequencing technology. We show the power of KeyPoint by identifying two mutants in the tomato eIF4E gene based on screening more than 3000 M2 families in a single GS FLX sequencing run, and discovery of six haplotypes of tomato eIF4E gene by re-sequencing three amplicons in a subset of 92 tomato lines from the EU-SOL core collection. We propose KeyPoint technology as a broadly applicable amplicon sequencing approach to screen mutant populations or germplasm collections for identification of (novel) allelic variation in a high-throughput fashion.
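    The idea behind multi-dimensional pooling can be illustrated with a generic grid scheme. This is a sketch under stated assumptions, not the KeyPoint layout: the abstract does not give the pool dimensions, so the 15 × 15 × 15 grid, the function names, and the positive-pool inputs below are all hypothetical.

```python
from itertools import product


def assign_pools(dims):
    """Give every sample one pool index per dimension (a grid coordinate)."""
    grid = product(*(range(d) for d in dims))
    return {sample: coord for sample, coord in enumerate(grid)}


def candidate_samples(positive_pools, assignment):
    """Return samples whose pool on every axis tested positive for a variant."""
    return [
        sample
        for sample, coord in assignment.items()
        if all(c in positive_pools[axis] for axis, c in enumerate(coord))
    ]


# Hypothetical layout: 3375 samples covered by only 15 + 15 + 15 = 45 pools.
assignment = assign_pools((15, 15, 15))

# A variant seen in exactly one pool per axis pinpoints a single sample.
single = candidate_samples([{3}, {7}, {11}], assignment)

# Two carriers of the same variant leave 2 * 2 * 2 = 8 candidates to recheck.
ambiguous = candidate_samples([{3, 4}, {7, 8}, {11, 12}], assignment)

print(len(assignment), single, len(ambiguous))  # 3375 [791] 8
```

    The design trade-off in such a scheme is that far fewer pools than samples need to be barcoded and sequenced, at the cost of follow-up genotyping when several samples carry the same variant.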

  11. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    Energy Technology Data Exchange (ETDEWEB)

    Harding, Louisa B. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Schultz, Irvin R. [Battelle, Marine Sciences Laboratory – Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382 (United States); Goetz, Giles W. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Luckenbach, J. Adam [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Young, Graham [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Goetz, Frederick W. [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Manchester Research Station, P.O. Box 130, Manchester, WA 98353 (United States); Swanson, Penny, E-mail: penny.swanson@noaa.gov [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States)

    2013-10-15

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland is still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina® sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs), respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  12. A new sieving matrix for DNA sequencing, genotyping and mutation detection and high-throughput genotyping with a 96-capillary array system

    Energy Technology Data Exchange (ETDEWEB)

    Gao, David [Iowa State Univ., Ames, IA (United States)

    1999-11-08

    Capillary electrophoresis has been widely accepted as a fast separation technique in DNA analysis. In this dissertation, a new sieving matrix is described for DNA analysis, especially DNA sequencing, genetic typing and mutation detection. A high-throughput 96 capillary array electrophoresis system was also demonstrated for simultaneous multiple genotyping. The authors first evaluated the influence of different capillary coatings on the performance of DNA sequencing. A bare capillary was compared with a DB-wax, an FC-coated and a polyvinylpyrrolidone dynamically coated capillary with PEO as sieving matrix. It was found that covalently-coated capillaries had no better performance than bare capillaries while PVP coating provided excellent and reproducible results. The authors also developed a new sieving matrix for DNA separation based on commercially available poly(vinylpyrrolidone) (PVP). This sieving matrix has a very low viscosity and an excellent self-coating effect. Successful separations were achieved in uncoated capillaries. Sequencing of M13mp18 showed good resolution up to 500 bases in treated PVP solution. Temperature gradient capillary electrophoresis with PVP solution was applied to mutation detection. A heteroduplex sample and a homoduplex reference were injected during a pair of continuous runs. A temperature gradient of 10 °C with a ramp of 0.7 °C/min was swept throughout the capillary. Detection was accomplished by laser-induced fluorescence detection. Mutation detection was performed by comparing the pattern changes between the homoduplex and the heteroduplex samples. High throughput, high detection rate and easy operation were achieved in this system. They further demonstrated fast and reliable genotyping based on CTTv STR system by multiple-capillary array electrophoresis. The PCR products from individuals were mixed with pooled allelic ladder as an absolute standard and coinjected with a 96-vial tray. Simultaneous one-color laser-induced fluorescence

  13. Identification and characterization of microRNAs related to salt stress in broccoli, using high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Tian, Yunhong; Tian, Yunming; Luo, Xiaojun; Zhou, Tao; Huang, Zuoping; Liu, Ying; Qiu, Yihan; Hou, Bing; Sun, Dan; Deng, Hongyu; Qian, Shen; Yao, Kaitai

    2014-09-03

    MicroRNAs (miRNAs) are a new class of endogenous regulators of a broad range of physiological processes, which act by regulating gene expression post-transcriptionally. The brassica vegetable, broccoli (Brassica oleracea var. italica), is very popular with a wide range of consumers, but environmental stresses such as salinity are a problem worldwide in restricting its growth and yield. Little is known about the role of miRNAs in the response of broccoli to salt stress. In this study, broccoli subjected to salt stress and broccoli grown under control conditions were analyzed by high-throughput sequencing. Differential miRNA expression was confirmed by real-time reverse transcription polymerase chain reaction (RT-PCR). The prediction of miRNA targets was undertaken using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) database and Gene Ontology (GO)-enrichment analyses. Two libraries of small (or short) RNAs (sRNAs) were constructed and sequenced by high-throughput Solexa sequencing. A total of 24,511,963 and 21,034,728 clean reads, representing 9,861,236 (40.23%) and 8,574,665 (40.76%) unique reads, were obtained for control and salt-stressed broccoli, respectively. Furthermore, 42 putative known and 39 putative candidate miRNAs that were differentially expressed between control and salt-stressed broccoli were revealed by their read counts and confirmed by the use of stem-loop real-time RT-PCR. Amongst these, the putative conserved miRNAs, miR393 and miR855, and two putative candidate miRNAs, miR3 and miR34, were the most strongly down-regulated when broccoli was salt-stressed, whereas the putative conserved miRNA, miR396a, and the putative candidate miRNA, miR37, were the most up-regulated. Finally, analysis of the predicted gene targets of miRNAs using the GO and KO databases indicated that a range of metabolic and other cellular functions known to be associated with salt stress were up-regulated in broccoli treated with salt. A comprehensive

  14. High-Throughput Sequencing of MicroRNAs in Adenovirus Type 3 Infected Human Laryngeal Epithelial Cells

    Directory of Open Access Journals (Sweden)

    Yuhua Qi

    2010-01-01

    Full Text Available Adenovirus infection can cause various illnesses depending on the infecting serotype, such as gastroenteritis, conjunctivitis, cystitis, and rash illness, but the infection mechanism is still unknown. MicroRNAs (miRNAs) have been reported to play essential roles in cell proliferation, cell differentiation, and pathogenesis of human diseases including viral infections. We analyzed the miRNA expression profiles from adenovirus type 3 (AD3)-infected human laryngeal epithelial (Hep2) cells using SOLiD deep sequencing. 492 precursor miRNAs were identified in the AD3-infected Hep2 cells, and 540 precursor miRNAs were identified in the control. A total of 44 miRNAs showed higher expression and 36 miRNAs showed lower expression in the AD3-infected cells than in the control. The biogenesis of miRNAs has been analyzed, and some of the SOLiD results were confirmed by quantitative PCR analysis. The present studies may provide a useful clue for research into the biological function of AD3 infection.

  15. Characterization of P5CS gene in Calotropis procera plant from the de novo assembled transcriptome contigs of the high-throughput sequencing dataset.

    Science.gov (United States)

    Ramadan, Ahmed M; Hassanein, Sameh E

    2014-12-01

    The wild plant known as Calotropis procera is important in medicine, industry and ornamental fields. Due to its spread in areas that suffer from environmental stress, it has a large number of genes for tolerance to environmental stresses such as drought and salinity. Proline is one of the most compatible solutes that accumulate widely in plants to tolerate unfavorable environmental conditions. Plant proline synthesis depends on the Δ-pyrroline-5-carboxylate synthase (P5CS) gene, but information about this gene in C. procera is unavailable. In this study, we uncovered and characterized the P5CS gene (NCBI accession no. KJ020750) in this medicinal plant from the de novo assembled transcriptome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for P5CS sequences were blasted with the recovered de novo assembled contigs. Homology modeling of the deduced amino acids (NCBI accession No. AHM25913) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera P5CS-like full sequence model on Homo sapiens (P5CS_HUMAN, UniProt protein accession no. P54886) was constructed using RasMol and Deep-View programs. The functional domains of the novel P5CS amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). Copyright © 2014 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  16. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset.

    Science.gov (United States)

    Shokry, Ahmed M; Al-Karim, Saleh; Ramadan, Ahmed; Gadallah, Nour; Al Attas, Sanaa G; Sabir, Jamal S M; Hassan, Sabah M; Madkour, Magdy A; Bressan, Ray; Mahfouz, Magdy; Bahieldin, Ahmed

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). Copyright © 2014 Académie des sciences. All rights reserved.

  17. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset

    KAUST Repository

    Shokry, Ahmed M.

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). © 2014 Académie des sciences.

  18. Design and evaluation of universal 16S rRNA gene primers for high-throughput sequencing to simultaneously detect DAMO microbes and anammox bacteria.

    Science.gov (United States)

    Lu, Yong-Ze; Ding, Zhao-Wei; Ding, Jing; Fu, Liang; Zeng, Raymond J

    2015-12-15

    To develop universal 16S rRNA gene primers for high-throughput sequencing for the simultaneous detection of denitrifying anaerobic methane oxidation (DAMO) archaea, DAMO bacteria, and anaerobic ammonium oxidation (anammox) bacteria, four published primer sets (PS2-PS5) were modified. The overall coverage of the four primer pairs was evaluated in silico with the Silva SSU r119 dataset. Based on the virtual evaluation, the two best primer pairs (PS4 and PS5) were selected for further verification. Illumina MiSeq sequencing of a freshwater sediment and a culture from a DAMO-anammox reactor using these two primer pairs revealed that PS5 (341b4F-806R) was the most promising universal primer pair. This pair of primers detected both archaea and bacteria with less bias than PS4. Furthermore, an anaerobic fermentation culture and a wastewater treatment plant culture were used to verify the accuracy of PS5. More importantly, it detected DAMO archaea, DAMO bacteria, and anammox bacteria simultaneously with no false positives. This universal 16S rRNA gene primer pair extends the existing molecular tools for studying the community structures and distributions of DAMO microbes and their potential interactions with anammox bacteria in different environments. Copyright © 2015 Elsevier Ltd. All rights reserved.
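
    The in silico coverage evaluation mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible definition of coverage (the fraction of reference sequences containing an exact match to a degenerate primer, forward strand only, no mismatches allowed), which may differ from the procedure the authors used with the Silva dataset. The primer and sequences below are hypothetical toy data, not the PS2-PS5 primers or Silva records.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Translate a (possibly degenerate) primer into a compiled regex."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))


def coverage(primer: str, sequences: list[str]) -> float:
    """Fraction of sequences containing at least one exact primer match."""
    if not sequences:
        return 0.0
    pattern = primer_to_regex(primer)
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)


# Hypothetical toy data (assumption): Y matches C or T, so the first two
# sequences match and the third (with A at that position) does not.
TOY_PRIMER = "CCTAYGGG"
TOY_SEQUENCES = ["AACCTACGGGTT", "AACCTATGGGTT", "AACCTAAGGGTT"]
toy_coverage = coverage(TOY_PRIMER, TOY_SEQUENCES)
```

    A real evaluation would also check the reverse primer on the complementary strand and usually tolerate a small number of mismatches; both are omitted here to keep the sketch short.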

  19. [Biological ingredient analysis of traditional Chinese medicines utilizing metagenomic approach based on high-throughput-sequencing and big-data-mining].

    Science.gov (United States)

    Bai, Hong; Ning, Kang; Wang, Chang-yun

    2015-03-01

    The quality of traditional Chinese medicines (TCMs) has mainly been evaluated based on chemical ingredients, yet recently more attention has been paid to biological ingredients, especially for pill-based preparations. Establishing a fast, accurate and systematic method of biological ingredient analysis is a key step toward the modernization, industrialization and internationalization of TCMs. The biological ingredient analysis of TCM preparations can be abstracted as the identification of multiple species from a biological mixture. The metagenomic approach based on high-throughput sequencing (HTS) and big-data mining has been considered one of the most effective methods for multiple-species analysis of a biological mixture, and would also be helpful for the analysis of biological ingredients in TCMs. Simultaneous identification of diverse species, including the prescribed species, adulterants, toxic species, protected species and even the biological impurities introduced through the production process, could be achieved by selecting appropriate DNA biomarkers and applying large-scale sequence comparison and data mining. This approach is expected to offer a basis for evaluating the effectiveness, safety and legality of TCM preparations.

  20. Identification and characterization of miRNAs in ripening fruit of Lycium barbarum L. using high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Shaohua Zeng

    2015-09-01

    Full Text Available MicroRNAs (miRNAs) are master regulators of gene activity documented to play central roles in fruit ripening in model plant species, yet little is known of their roles in Lycium barbarum L. fruits. In this study, miRNA levels in L. barbarum fruit samples at four developmental stages were assayed using Illumina HiSeq™ 2000. This revealed the presence of 50 novel miRNAs and 38 known miRNAs in L. barbarum fruits. Of the novel miRNAs, 36 were specific to L. barbarum fruits compared with L. chinense. A number of stage-specific miRNAs were identified and GO terms were assigned to 194 unigenes targeted by miRNAs. The majority of GO terms of unigenes targeted by differentially expressed miRNAs are ‘intracellular organelle’, ‘binding’, ‘metabolic process’, ‘pigmentation’, and ‘biological regulation’. Enriched KEGG analysis indicated that nucleotide excision repair and ubiquitin mediated proteolysis were over-represented during the initial stage of ripening, with ABC transporters and sulfur metabolism pathways active during the middle stages and ABC transporters and spliceosome enriched in the final stages of ripening. Several miRNAs and their targets serving as potential regulators in L. barbarum fruit ripening were identified using quantitative reverse transcription polymerase chain reaction. The miRNA-target interactions were predicted for L. barbarum ripening regulators including miR156/157 with LbCNR and LbWRKY8, and miR171 with LbGRAS. Additionally, regulatory interactions potentially controlling fruit quality and nutritional value via sugar and secondary metabolite accumulation were identified. These include miR156 targeting of fructokinase and 1-deoxy-D-xylulose-5-phosphate synthase and miR164 targeting of beta-fructofuranosidase. In sum, valuable information revealed by small RNA sequencing in this study will provide a solid foundation for uncovering the miRNA-mediated mechanism of fruit ripening and quality in this

  1. Identification and characterization of miRNAs in ripening fruit of Lycium barbarum L. using high-throughput sequencing.

    Science.gov (United States)

    Zeng, Shaohua; Liu, Yongliang; Pan, Lizhu; Hayward, Alice; Wang, Ying

    2015-01-01

    MicroRNAs (miRNAs) are master regulators of gene activity documented to play central roles in fruit ripening in model plant species, yet little is known of their roles in Lycium barbarum L. fruits. In this study, miRNA levels in L. barbarum fruit samples at four developmental stages were assayed using Illumina HiSeq™ 2000. This revealed the presence of 50 novel miRNAs and 38 known miRNAs in L. barbarum fruits. Of the novel miRNAs, 36 were specific to L. barbarum fruits compared with L. chinense. A number of stage-specific miRNAs were identified and GO terms were assigned to 194 unigenes targeted by miRNAs. The majority of GO terms of unigenes targeted by differentially expressed miRNAs are "intracellular organelle," "binding," "metabolic process," "pigmentation," and "biological regulation." Enriched KEGG analysis indicated that nucleotide excision repair and ubiquitin mediated proteolysis were over-represented during the initial stage of ripening, with ABC transporters and sulfur metabolism pathways active during the middle stages and ABC transporters and spliceosome enriched in the final stages of ripening. Several miRNAs and their targets serving as potential regulators in L. barbarum fruit ripening were identified using quantitative reverse transcription polymerase chain reaction. The miRNA-target interactions were predicted for L. barbarum ripening regulators including miR156/157 with LbCNR and LbWRKY8, and miR171 with LbGRAS. Additionally, regulatory interactions potentially controlling fruit quality and nutritional value via sugar and secondary metabolite accumulation were identified. These include miR156 targeting of fructokinase and 1-deoxy-D-xylulose-5-phosphate synthase and miR164 targeting of beta-fructofuranosidase. In sum, valuable information revealed by small RNA sequencing in this study will provide a solid foundation for uncovering the miRNA-mediated mechanism of fruit ripening and quality in this nutritional food.

  2. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 3; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-12-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  3. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 2; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-08-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  4. Genome-wide identification of cold-responsive and new microRNAs in Populus tomentosa by high-throughput sequencing.

    Science.gov (United States)

    Chen, Lei; Zhang, Yiyun; Ren, Yuanyuan; Xu, Jichen; Zhang, Zhiyi; Wang, Yanwei

    2012-01-13

    MicroRNAs (miRNAs) are small, non-coding RNAs that regulate the expression of target mRNAs in plant growth, development, abiotic stress responses, and pathogen responses. Cold stress is one of the most common abiotic factors affecting plants, and it adversely affects plant growth, development, and spatial distribution. To understand the roles of miRNAs under cold stress in Populus tomentosa, we constructed two small RNA libraries from plantlets treated or not with cold conditions (4°C for 8 h). High-throughput sequencing of the two libraries identified 144 conserved miRNAs belonging to 33 miRNA families and 29 new miRNAs (as well as their corresponding miRNA*s) belonging to 23 miRNA families. Differential expression analysis showed that 21 miRNAs were down-regulated and nine miRNAs were up-regulated in response to cold stress. Among them, 19 cold-responsive miRNAs, two new miRNAs and their corresponding miRNA*s were validated by qRT-PCR. A total of 101 target genes of the new miRNAs were predicted using a bioinformatics approach. These target genes are involved in growth and resistance to various stresses. The results demonstrated that Populus miRNAs play critical roles in the cold stress response. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Identification and characterization of miRNAs in two closely related C4 and C3 species of Cleome by high-throughput sequencing.

    Science.gov (United States)

    Gao, Shuangcheng; Zhao, Wei; Li, Xiang; You, Qingbo; Shen, Xinjie; Guo, Wei; Wang, Shihua; Shi, Guoan; Liu, Zheng; Jiao, Yongqing

    2017-04-19

    Cleome gynandra and Cleome hassleriana, which are C4 and C3 plants, respectively, are two species of Cleome. The close genetic relationship between C. gynandra and C. hassleriana provides advantages for discovering the differences in leaf development and physiological processes between C3 and C4 plants. MicroRNAs (miRNAs) are a class of important regulators of various biological processes. In this study, we investigate the differences in the characteristics of miRNAs between C. gynandra and C. hassleriana using high-throughput sequencing technology. In total, 94 and 102 known miRNAs were identified in C. gynandra and C. hassleriana, respectively, of which 3 were specific for C. gynandra and 10 were specific for C. hassleriana. Ninety-one common miRNAs were identified in both species. In addition, 4 novel miRNAs were detected, including three in C. gynandra and three in C. hassleriana. Of these miRNAs, 67 were significantly differentially expressed between these two species and were involved in extensive biological processes, such as glycol-metabolism and photosynthesis. Our study not only provided resources for C. gynandra and C. hassleriana research but also provided useful clues for the understanding of the roles of miRNAs in the alterations of biological processes in leaf tissues during the evolution of the C4 pathway.

  6. A complementary role of multiparameter flow cytometry and high-throughput sequencing for minimal residual disease detection in chronic lymphocytic leukemia: an European Research Initiative on CLL study.

    LENUS (Irish Health Repository)

    Rawstron, A C

    2016-04-01

    In chronic lymphocytic leukemia (CLL) the level of minimal residual disease (MRD) after therapy is an independent predictor of outcome. Given the increasing number of new agents being explored for CLL therapy, using MRD as a surrogate could greatly reduce the time necessary to assess their efficacy. In this European Research Initiative on CLL (ERIC) project we have identified and validated a flow-cytometric approach to reliably quantitate CLL cells to the level of 0.0010% (10^-5). The assay comprises a core panel of six markers (i.e. CD19, CD20, CD5, CD43, CD79b and CD81) with a component specification independent of instrument and reagents, which can be locally re-validated using normal peripheral blood. This method is directly comparable to previous ERIC-designed assays and also provides a backbone for investigation of new markers. A parallel analysis of high-throughput sequencing using the ClonoSEQ assay showed good concordance with flow cytometry results at the 0.010% (10^-4) level, the MRD threshold defined in the 2008 International Workshop on CLL guidelines, but it also provides good linearity to a detection limit of 1 in a million (10^-6). The combination of both technologies would permit a highly sensitive approach to MRD detection while providing a reproducible and broadly accessible method to quantify residual disease and optimize treatment in CLL.
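
    The three sensitivity levels quoted in this abstract are stated in mixed units (percentages, powers of ten, "1 in a million"). The following minimal sketch, using only the figures given above, converts them to exact fractions so they can be compared directly; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction


def percent_to_fraction(percent: str) -> Fraction:
    """Convert a percentage written as a decimal string to an exact fraction."""
    return Fraction(percent) / 100


# Figures taken from the abstract above.
flow_limit = percent_to_fraction("0.0010")   # flow cytometry: 0.0010%
iwcll_threshold = percent_to_fraction("0.010")  # 2008 IWCLL MRD threshold: 0.010%
hts_limit = Fraction(1, 10**6)               # sequencing: 1 in a million

for label, value in [
    ("flow cytometry limit", flow_limit),
    ("IWCLL threshold", iwcll_threshold),
    ("sequencing limit", hts_limit),
]:
    # 1 / value gives the number of cells among which one CLL cell is detectable.
    print(f"{label}: 1 CLL cell in {1 / value} cells")
```

    Expressed this way, the quoted sequencing detection limit is ten times lower than the flow-cytometric level and one hundred times lower than the guideline threshold.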

  7. A complementary role of multiparameter flow cytometry and high-throughput sequencing for minimal residual disease detection in chronic lymphocytic leukemia: an European Research Initiative on CLL study

    Science.gov (United States)

    Rawstron, A C; Fazi, C; Agathangelidis, A; Villamor, N; Letestu, R; Nomdedeu, J; Palacio, C; Stehlikova, O; Kreuzer, K-A; Liptrot, S; O'Brien, D; de Tute, R M; Marinov, I; Hauwel, M; Spacek, M; Dobber, J; Kater, A P; Gambell, P; Soosapilla, A; Lozanski, G; Brachtl, G; Lin, K; Boysen, J; Hanson, C; Jorgensen, J L; Stetler-Stevenson, M; Yuan, C; Broome, H E; Rassenti, L; Craig, F; Delgado, J; Moreno, C; Bosch, F; Egle, A; Doubek, M; Pospisilova, S; Mulligan, S; Westerman, D; Sanders, C M; Emerson, R; Robins, H S; Kirsch, I; Shanafelt, T; Pettitt, A; Kipps, T J; Wierda, W G; Cymbalista, F; Hallek, M; Hillmen, P; Montserrat, E; Ghia, P

    2016-01-01

    In chronic lymphocytic leukemia (CLL) the level of minimal residual disease (MRD) after therapy is an independent predictor of outcome. Given the increasing number of new agents being explored for CLL therapy, using MRD as a surrogate could greatly reduce the time necessary to assess their efficacy. In this European Research Initiative on CLL (ERIC) project we have identified and validated a flow-cytometric approach to reliably quantitate CLL cells to the level of 0.0010% (10^-5). The assay comprises a core panel of six markers (i.e. CD19, CD20, CD5, CD43, CD79b and CD81) with a component specification independent of instrument and reagents, which can be locally re-validated using normal peripheral blood. This method is directly comparable to previous ERIC-designed assays and also provides a backbone for investigation of new markers. A parallel analysis of high-throughput sequencing using the ClonoSEQ assay showed good concordance with flow cytometry results at the 0.010% (10^-4) level, the MRD threshold defined in the 2008 International Workshop on CLL guidelines, but it also provides good linearity to a detection limit of 1 in a million (10^-6). The combination of both technologies would permit a highly sensitive approach to MRD detection while providing a reproducible and broadly accessible method to quantify residual disease and optimize treatment in CLL. PMID:26639181

  8. Identification of Treponema pedis as the predominant Treponema species in porcine skin ulcers by fluorescence in situ hybridization and high-throughput sequencing.

    Science.gov (United States)

    Karlsson, Frida; Klitgaard, Kirstine; Jensen, Tim Kåre

    2014-06-25

    Skin lesions often seen in pig production are of great animal welfare concern. To study the potential role of Treponema bacteria in porcine skin ulcers, we investigated the presence and distribution of these organisms in decubital shoulder ulcers (n=51) and ear necroses (n=54) by fluorescence in situ hybridization (FISH) and high-throughput sequencing. In addition, two cases of facial ulcers and five cases of other skin ulcers were included in the study. Samples from all 112 skin lesions and intact skin from pigs without skin ulcers (n=14) were screened by FISH. Three different oligonucleotide probes targeting 16S rRNA were used, specific for the domain Bacteria, Treponema spp. and the species T. pedis. Screening showed that two cases each of facial and other ulcers, 35 (69%) of shoulder ulcers and 32 (59%) of ear necroses were positive for Treponema spp. T. pedis was unequivocally the predominant species, typically constituting more than 90% of the treponemes in a lesion, as assessed visually by microscopy. Altogether, T. pedis was demonstrated in 69 of the 71 Treponema spp. positive lesions. We conclude that Treponema spp. are frequently present and abundant in various skin ulcers of pigs. The results from this study point toward an important role of T. pedis as a secondary bacterial infection in porcine skin ulcers, especially in severe and chronic lesions. Copyright © 2014 Elsevier B.V. All rights reserved.
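
    The counts and percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the dictionary layout and names are illustrative assumptions.

```python
# (Treponema-positive lesions, lesions examined) per lesion type,
# as reported in the abstract above.
lesions = {
    "shoulder ulcers": (35, 51),
    "ear necroses": (32, 54),
    "facial ulcers": (2, 2),
    "other skin ulcers": (2, 5),
}

total_examined = sum(examined for _, examined in lesions.values())
total_positive = sum(positive for positive, _ in lesions.values())

# Percentage of positive lesions per type, rounded to whole percent.
percent_positive = {
    name: round(100 * positive / examined)
    for name, (positive, examined) in lesions.items()
}

# Share of Treponema-positive lesions in which T. pedis was demonstrated.
t_pedis_share = round(100 * 69 / total_positive)

print(total_examined, total_positive, percent_positive, t_pedis_share)
```

    The totals reproduce the 112 lesions screened and the 71 Treponema spp. positive lesions, and the per-type percentages round to the 69% and 59% given in the text.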

  9. Identification and Analysis of Red Sea Mangrove (Avicennia marina) microRNAs by High-Throughput Sequencing and Their Association with Stress Responses

    KAUST Repository

    Khraiwesh, Basel

    2013-04-08

    Although RNA silencing has been studied primarily in model plants, advances in high-throughput sequencing technologies have enabled profiling of the small RNA components of many more plant species, providing insights into the ubiquity and conservatism of some miRNA-based regulatory mechanisms. Small RNAs of 20 to 24 nucleotides (nt) are important regulators of gene transcript levels by either transcriptional or by posttranscriptional gene silencing, contributing to genome maintenance and controlling a variety of developmental and physiological processes. Here, we used deep sequencing and molecular methods to create an inventory of the small RNAs in the mangrove species, Avicennia marina. We identified 26 novel mangrove miRNAs and 193 conserved miRNAs belonging to 36 families. We determined that 2 of the novel miRNAs were produced from known miRNA precursors and 4 were likely to be species-specific by the criterion that we found no homologs in other plant species. We used qRT-PCR to analyze the expression of miRNAs and their target genes in different tissue sets and some demonstrated tissue-specific expression. Furthermore, we predicted potential targets of these putative miRNAs based on a sequence homology and experimentally validated through endonucleolytic cleavage assays. Our results suggested that expression profiles of miRNAs and their predicted targets could be useful in exploring the significance of the conservation patterns of plants, particularly in response to abiotic stress. Because of their well-developed abilities in this regard, mangroves and other extremophiles are excellent models for such exploration. © 2013 Khraiwesh et al.

  10. Comprehensive processing of high-throughput small RNA sequencing data including quality checking, normalization, and differential expression analysis using the UEA sRNA Workbench.

    Science.gov (United States)

    Beckers, Matthew; Mohorianu, Irina; Stocks, Matthew; Applegate, Christopher; Dalmay, Tamas; Moulton, Vincent

    2017-06-01

    Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on a H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information. A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this. © 2017 Beckers et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
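
    To make the normalization and differential-expression steps concrete, here is a minimal sketch of one generic approach: reads-per-million scaling followed by a log2 fold change with a pseudocount. This is an assumption chosen for illustration and is not taken from the UEA sRNA Workbench, whose own methods are not described in the abstract. The sample names and counts are hypothetical.

```python
import math


def reads_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def log2_fold_change(
    control: dict[str, float],
    treated: dict[str, float],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """log2(treated / control) per sRNA; the pseudocount avoids division by zero."""
    names = sorted(control.keys() | treated.keys())
    return {
        name: math.log2(
            (treated.get(name, 0.0) + pseudocount)
            / (control.get(name, 0.0) + pseudocount)
        )
        for name in names
    }


# Hypothetical raw counts (assumption), with libraries of different depth.
control_counts = {"sRNA_a": 300, "sRNA_b": 700}
treated_counts = {"sRNA_a": 1200, "sRNA_b": 800}

control_rpm = reads_per_million(control_counts)
treated_rpm = reads_per_million(treated_counts)
fold_changes = log2_fold_change(control_rpm, treated_rpm)
```

    In this toy case sRNA_a doubles in relative abundance after normalization even though the treated library is twice as deep, which is the kind of depth effect normalization is meant to remove. A real analysis would also need replicates and a statistical test.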

  11. Identification and analysis of red sea mangrove (Avicennia marina) microRNAs by high-throughput sequencing and their association with stress responses.

    Directory of Open Access Journals (Sweden)

    Basel Khraiwesh

    Full Text Available Although RNA silencing has been studied primarily in model plants, advances in high-throughput sequencing technologies have enabled profiling of the small RNA components of many more plant species, providing insights into the ubiquity and conservatism of some miRNA-based regulatory mechanisms. Small RNAs of 20 to 24 nucleotides (nt) are important regulators of gene transcript levels by either transcriptional or by posttranscriptional gene silencing, contributing to genome maintenance and controlling a variety of developmental and physiological processes. Here, we used deep sequencing and molecular methods to create an inventory of the small RNAs in the mangrove species, Avicennia marina. We identified 26 novel mangrove miRNAs and 193 conserved miRNAs belonging to 36 families. We determined that 2 of the novel miRNAs were produced from known miRNA precursors and 4 were likely to be species-specific by the criterion that we found no homologs in other plant species. We used qRT-PCR to analyze the expression of miRNAs and their target genes in different tissue sets and some demonstrated tissue-specific expression. Furthermore, we predicted potential targets of these putative miRNAs based on a sequence homology and experimentally validated through endonucleolytic cleavage assays. Our results suggested that expression profiles of miRNAs and their predicted targets could be useful in exploring the significance of the conservation patterns of plants, particularly in response to abiotic stress. Because of their well-developed abilities in this regard, mangroves and other extremophiles are excellent models for such exploration.

  12. The use of high-throughput small RNA sequencing reveals differentially expressed microRNAs in response to aster yellows phytoplasma-infection in Vitis vinifera cv. 'Chardonnay'.

    Science.gov (United States)

    Snyman, Marius C; Solofoharivelo, Marie-Chrystine; Souza-Richards, Rose; Stephan, Dirk; Murray, Shane; Burger, Johan T

    2017-01-01

    Phytoplasmas are cell wall-less plant pathogenic bacteria responsible for major crop losses throughout the world. In grapevine they cause grapevine yellows, a detrimental disease associated with a variety of symptoms. The high economic impact of this disease has sparked considerable interest among researchers to understand molecular mechanisms related to pathogenesis. Increasing evidence exists that a class of small non-coding endogenous RNAs, known as microRNAs (miRNAs), play an important role in post-transcriptional gene regulation during plant development and responses to biotic and abiotic stresses. Thus, we aimed to dissect complex high-throughput small RNA sequencing data for the genome-wide identification of known and novel differentially expressed miRNAs, using read libraries constructed from healthy and phytoplasma-infected Chardonnay leaf material. Furthermore, we utilised computational resources to predict putative miRNA targets to explore the involvement of possible pathogen response pathways. We identified multiple known miRNA sequence variants (isomiRs), likely generated through post-transcriptional modifications. Sequences of 13 known, canonical miRNAs were shown to be differentially expressed. A total of 175 novel miRNA precursor sequences, each derived from a unique genomic location, were predicted, of which 23 were differentially expressed. A homology search revealed that some of these novel miRNAs shared high sequence similarity with conserved miRNAs from other plant species, as well as known grapevine miRNAs. The relative expression of randomly selected known and novel miRNAs was determined with real-time RT-qPCR analysis, thereby validating the trend of expression seen in the normalised small RNA sequencing read count data. Among the putative miRNA targets, we identified genes involved in plant morphology, hormone signalling, nutrient homeostasis, as well as plant stress. Our results may assist in understanding the role that miRNA pathways play

  13. High Throughput Sequencing to Detect Differences in Methanotrophic Methylococcaceae and Methylocystaceae in Surface Peat, Forest Soil, and Sphagnum Moss in Cranesville Swamp Preserve, West Virginia, USA

    Science.gov (United States)

    Lau, Evan; Nolan, Edward J.; Dillard, Zachary W.; Dague, Ryan D.; Semple, Amanda L.; Wentzell, Wendi L.

    2015-01-01

    Northern temperate forest soils and Sphagnum-dominated peatlands are a major source and sink of methane. In these ecosystems, methane is mainly oxidized by aerobic methanotrophic bacteria, which are typically found in aerated forest soils, surface peat, and Sphagnum moss. We contrasted methanotrophic bacterial diversity and abundances from the (i) organic horizon of forest soil; (ii) surface peat; and (iii) submerged Sphagnum moss from Cranesville Swamp Preserve, West Virginia, using multiplex sequencing of bacterial 16S rRNA (V3 region) gene amplicons. From ~1 million reads, >50,000 unique OTUs (Operational Taxonomic Units), 29 and 34 unique sequences were detected in the Methylococcaceae and Methylocystaceae, respectively, and 24 potential methanotrophs in the Beijerinckiaceae were also identified. Methylacidiphilum-like methanotrophs were not detected. Proteobacterial methanotrophic bacteria constitute <2% of microbiota in these environments, with the Methylocystaceae one to two orders of magnitude more abundant than the Methylococcaceae in all environments sampled. The Methylococcaceae are also less diverse in forest soil compared to the other two habitats. Nonmetric multidimensional scaling analyses indicated that the majority of methanotrophs from the Methylococcaceae and Methylocystaceae tend to occur in one habitat only (peat or Sphagnum moss) or co-occurred in both Sphagnum moss and peat. This study provides insights into the structure of methanotrophic communities in relationship to habitat type, and suggests that peat and Sphagnum moss can influence methanotroph community structure and biogeography. PMID:27682082

  14. High Throughput Sequencing to Detect Differences in Methanotrophic Methylococcaceae and Methylocystaceae in Surface Peat, Forest Soil, and Sphagnum Moss in Cranesville Swamp Preserve, West Virginia, USA

    Directory of Open Access Journals (Sweden)

    Evan Lau

    2015-04-01

    Full Text Available Northern temperate forest soils and Sphagnum-dominated peatlands are a major source and sink of methane. In these ecosystems, methane is mainly oxidized by aerobic methanotrophic bacteria, which are typically found in aerated forest soils, surface peat, and Sphagnum moss. We contrasted methanotrophic bacterial diversity and abundances from the (i) organic horizon of forest soil; (ii) surface peat; and (iii) submerged Sphagnum moss from Cranesville Swamp Preserve, West Virginia, using multiplex sequencing of bacterial 16S rRNA (V3 region) gene amplicons. From ~1 million reads, >50,000 unique OTUs (Operational Taxonomic Units), 29 and 34 unique sequences were detected in the Methylococcaceae and Methylocystaceae, respectively, and 24 potential methanotrophs in the Beijerinckiaceae were also identified. Methylacidiphilum-like methanotrophs were not detected. Proteobacterial methanotrophic bacteria constitute <2% of microbiota in these environments, with the Methylocystaceae one to two orders of magnitude more abundant than the Methylococcaceae in all environments sampled. The Methylococcaceae are also less diverse in forest soil compared to the other two habitats. Nonmetric multidimensional scaling analyses indicated that the majority of methanotrophs from the Methylococcaceae and Methylocystaceae tend to occur in one habitat only (peat or Sphagnum moss) or co-occurred in both Sphagnum moss and peat. This study provides insights into the structure of methanotrophic communities in relationship to habitat type, and suggests that peat and Sphagnum moss can influence methanotroph community structure and biogeography.

  15. High-throughput sequencing and degradome analysis reveal altered expression of miRNAs and their targets in a male-sterile cybrid pummelo (Citrus grandis).

    Science.gov (United States)

    Fang, Yan-Ni; Zheng, Bei-Bei; Wang, Lun; Yang, Wei; Wu, Xiao-Meng; Xu, Qiang; Guo, Wen-Wu

    2016-08-09

    G1 + HBP is a male sterile cybrid line with the nuclear genome from Hirado Buntan pummelo (C. grandis Osbeck) (HBP) and the mitochondrial genome from "Guoqing No.1" (G1, Satsuma mandarin), which provides a good opportunity to study male sterility and nuclear-cytoplasmic cross talk in citrus. High-throughput sRNA and degradome sequencing were applied to identify miRNAs and their targets in G1 + HBP and its fertile type HBP during reproductive development. A total of 184 known miRNAs, 22 novel miRNAs and 86 target genes were identified. Some of the targets are transcription factors involved in floral development, such as auxin response factors (ARFs), SQUAMOSA promoter binding protein box (SBP-box), MYB, basic region-leucine zipper (bZIP), APETALA2 (AP2) and transport inhibitor response 1 (TIR1). Eight target genes were confirmed to be sliced by corresponding miRNAs using 5' RACE technology. Based on the sequencing abundance, 42 differentially expressed miRNAs between the sterile line G1 + HBP and the fertile line HBP were identified. Differential expression of miRNAs and their target genes between the two lines was validated by quantitative RT-PCR, and reciprocal expression patterns between some miRNAs and their targets were demonstrated. The regulatory mechanism of miR167a was investigated by yeast one-hybrid and dual-luciferase assays, which showed that one dehydration responsive element binding (DREB) transcription factor binds to the miR167a promoter and transcriptionally represses miR167 expression. Our study reveals the altered expression of miRNAs and their target genes in a male sterile line of pummelo and highlights that the miRNA regulatory network may be involved in floral bud development and cytoplasmic male sterility in citrus.

  16. Transcriptome-Wide Analysis of Botrytis elliptica Responsive microRNAs and Their Targets in Lilium Regale Wilson by High-Throughput Sequencing and Degradome Analysis

    Directory of Open Access Journals (Sweden)

    Xue Gao

    2017-05-01

    Full Text Available MicroRNAs, as master regulators of gene expression, have been widely identified and play crucial roles in plant-pathogen interactions. A fatal pathogen, Botrytis elliptica, causes a serious foliar disease of lily, which reduces production because of the high susceptibility of most cultivated species. However, the miRNAs related to Botrytis infection of lily, and the miRNA-mediated gene regulatory networks providing resistance to B. elliptica in lily remain largely unexplored. To systematically dissect B. elliptica-responsive miRNAs and their target genes, three small RNA libraries were constructed from the leaves of Lilium regale, a promising Chinese wild Lilium species, which had been subjected to mock B. elliptica treatment or B. elliptica infection for 6 and 24 h. By high-throughput sequencing, 71 known miRNAs belonging to 47 conserved families and 24 novel miRNAs were identified, of which 18 miRNAs were downregulated and 13 were upregulated in response to B. elliptica. Moreover, based on the lily mRNA transcriptome, 22 targets for 9 known miRNAs and 1 novel miRNA were identified by the degradome sequencing approach. Most target genes for elliptica-responsive miRNAs were involved in metabolic processes, and a few encoded different transcription factors, including ELONGATION FACTOR 1 ALPHA (EF1a) and TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR 2 (TCP2). Furthermore, the expression patterns of a set of elliptica-responsive miRNAs and their targets were validated by quantitative real-time PCR. This study represents the first transcriptome-based analysis of miRNAs responsive to B. elliptica and their targets in lily. The results reveal the possible regulatory roles of miRNAs and their targets in B. elliptica interaction, which will extend our understanding of the mechanisms of this disease in lily.

  17. Application of High-Throughput Next-Generation Sequencing for HLA Typing on Buccal Extracted DNA: Results from over 10,000 Donor Recruitment Samples.

    Directory of Open Access Journals (Sweden)

    Yuxin Yin

    Full Text Available Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3' UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Using this automated workflow, over 10,063 NMDP registry donors were successfully typed at high resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP.
In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Long

  18. Application of High-Throughput Next-Generation Sequencing for HLA Typing on Buccal Extracted DNA: Results from over 10,000 Donor Recruitment Samples.

    Science.gov (United States)

    Yin, Yuxin; Lan, James H; Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague