WorldWideScience

Sample records for high-throughput sequence analysis

  1. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data.

    NARCIS (Netherlands)

    Zomer, A.L.; Burghout, P.J.; Bootsma, H.J.; Hermans, P.W.M.; Hijum, S.A.F.T. van

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon

  2. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data

    OpenAIRE

    Althammer, Sonja Daniela; González-Vallinas Rostes, Juan, 1983-; Ballaré, Cecilia Julia; Beato, Miguel; Eyras Jiménez, Eduardo

    2011-01-01

    Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or b...

  3. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  4. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data.

    Science.gov (United States)

    Gloor, Gregory B; Reid, Gregor

    2016-08-01

    A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
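
    As an illustration of what a compositional treatment of sequencing counts can look like, the sketch below applies a centered log-ratio (CLR) transform to one sample. The abstract does not name a specific transform, so the choice of CLR, the function name `clr`, and the pseudocount of 0.5 used to handle zero counts are all assumptions made for this example.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount (an assumption of this sketch) keeps zero counts
    from producing log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Each value is expressed relative to the sample's geometric mean.
    return [v - mean_log for v in logs]

# Hypothetical read counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

    CLR values sum to zero within a sample, so they describe each taxon relative to the others rather than as an absolute abundance.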

  5. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data

    Science.gov (United States)

    Zomer, Aldert; Burghout, Peter; Bootsma, Hester J.; Hermans, Peter W. M.; van Hijum, Sacha A. F. T.

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of DNA flanking the transposon insertions site, assigning a measure of essentiality based on the number of reads per insertion site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy to use and automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open source, web-based software tool for researchers in the genomics field utilizing transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets and we show that the various pre- and post-processing steps on the sequence reads and count data with ESSENTIALS considerably improve the sensitivity and specificity of predicted gene essentiality. PMID:22900082

  6. Construction and analysis of high-density linkage map using high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Dongyuan Liu

    Linkage maps enable the study of important biological questions. The construction of high-density linkage maps appears more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.

  7. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    Directory of Open Access Journals (Sweden)

    Tu Jing

    2012-01-01

    Background: Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Results: Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions: By employing the PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demands.
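
    The combinatorics behind the 32 libraries (4 forward barcodes times 8 reverse barcodes) can be sketched as a lookup from barcode pair to sample. The barcode sequences, the names `FORWARD`, `REVERSE`, `PAIR_TO_SAMPLE` and `assign_sample`, and the exact-match rule are hypothetical; the abstract does not describe how the authors assigned reads.

```python
from itertools import product

# Hypothetical barcode sequences, invented for this illustration.
FORWARD = ["ACGT", "CGTA", "GTAC", "TACG"]
REVERSE = ["AACC", "CCGG", "GGTT", "TTAA", "ACAC", "CACA", "GTGT", "TGTG"]

# Every (forward, reverse) combination identifies one library: 4 * 8 = 32.
PAIR_TO_SAMPLE = {
    pair: sample_id
    for sample_id, pair in enumerate(product(FORWARD, REVERSE), start=1)
}

def assign_sample(forward_tag, reverse_tag):
    """Return the sample id for a barcode pair, or None if either tag is unknown."""
    return PAIR_TO_SAMPLE.get((forward_tag, reverse_tag))
```

    A read is assigned only when both of its tags are recognized, which mirrors the abstract's distinction between all reads and the fraction assigned to both barcodes.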

  8. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  9. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data.

    Science.gov (United States)

    Thiel, William H

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs. Copyright © 2016 Official journal of the American Society of Gene & Cell Therapy. Published by Elsevier Inc. All rights reserved.
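
    Outside Galaxy, the idea of characterizing sequences by abundance and persistence and then filtering on both can be sketched in a few lines. This is not the authors' Workflow. Here abundance is taken to mean the total read count across rounds and persistence the number of selection rounds in which a sequence appears; both definitions, and the function names, are assumptions for illustration.

```python
from collections import Counter

def summarize_rounds(rounds):
    """Map each unique sequence to (total read count, number of rounds present).

    `rounds` is a list with one list of read sequences per selection round.
    """
    totals = Counter()
    presence = Counter()
    for reads in rounds:
        counts = Counter(reads)
        totals.update(counts)            # add this round's read counts
        presence.update(counts.keys())   # +1 for each sequence seen this round
    return {seq: (totals[seq], presence[seq]) for seq in totals}

def filter_sequences(summary, min_abundance, min_persistence):
    """Keep sequences that meet both thresholds, sorted for stable output."""
    return sorted(
        seq
        for seq, (abundance, persistence) in summary.items()
        if abundance >= min_abundance and persistence >= min_persistence
    )

# Toy data: three hypothetical selection rounds.
rounds = [["AAA", "AAA", "CCC"], ["AAA", "GGG"], ["AAA", "CCC"]]
summary = summarize_rounds(rounds)
```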

  10. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

    Science.gov (United States)

    Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    The B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

  11. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

    Science.gov (United States)

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-12-15

    High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy. Contact: eduardo.eyras@upf.edu. Supplementary data are available at Bioinformatics online.

  12. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Guzman, Frank; Almerão, Mauricio P; Körbes, Ana P; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  13. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data.

    Science.gov (United States)

    Vainshtein, Yevhen; Rippe, Karsten; Teif, Vladimir B

    2017-02-14

    Biomedical applications of high-throughput sequencing methods generate a vast amount of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence/absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them. Here we introduce the NucTools suite of Perl scripts as well as MATLAB- and R-based visualization programs for a nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of the changes of integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors or architectural proteins, and epigenetic marks like histone modifications or DNA methylation. The applications of NucTools are demonstrated for the comparison of several datasets for nucleosome occupancy in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). The typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features such as for example binding of a transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). As potential limitations and problems we discuss how inter-replicate variability of

  14. Forensic soil DNA analysis using high-throughput sequencing: a comparison of four molecular markers.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Cooper, Alan

    2014-11-01

    Soil analyses, such as mineralogy, geophysics, texture and colour, are commonly used in forensic casework to link a suspect to a crime scene. However, DNA analysis can also be applied to characterise the vast diversity of organisms present in soils. DNA metabarcoding and high-throughput sequencing (HTS) now offer a means to improve discrimination between forensic soil samples by identifying individual taxa and exploring non-culturable microbial species. Here, we compare the small-scale reproducibility and resolution of four molecular markers targeting different taxa (bacterial 16S rRNA, eukaryotic 18S rRNA, plant trnL intron and fungal internal transcribed spacer I (ITS1) rDNA) to distinguish two sample sites. We also assess the background DNA level associated with each marker and examine the effects of filtering Operational Taxonomic Units (OTUs) detected in extraction blank controls. From this study, we show that non-bacterial taxa in soil, particularly fungi, can provide the greatest resolution between the sites, whereas plant markers may be problematic for forensic discrimination. ITS and 18S markers exhibit reliable amplification, and both show high discriminatory power with low background DNA levels. The 16S rRNA marker showed comparable discriminatory power post filtering; however, it presented the highest level of background DNA. The discriminatory power of all markers was increased by applying OTU filtering steps, with the greatest improvement observed by the removal of any sequences detected in extraction blanks. This study demonstrates the potential use of multiple DNA markers for forensic soil analysis using HTS, and identifies some of the standardisation and evaluation steps necessary before this technique can be applied in casework. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
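
    The strictest filtering step mentioned above, removing any OTU detected in an extraction blank, can be sketched as a simple set exclusion. The data layout (dictionaries of OTU id to read count), the OTU names, and the function name are assumptions for illustration and are not taken from the study.

```python
def remove_blank_otus(sample_counts, blank_counts):
    """Drop every OTU that has at least one read in the extraction blank."""
    contaminated = {otu for otu, reads in blank_counts.items() if reads > 0}
    return {
        otu: reads
        for otu, reads in sample_counts.items()
        if otu not in contaminated
    }

# Hypothetical counts: OTU_2 also appears in the blank and is removed.
soil_sample = {"OTU_1": 540, "OTU_2": 35, "OTU_3": 12}
extraction_blank = {"OTU_2": 4, "OTU_9": 0}
filtered = remove_blank_otus(soil_sample, extraction_blank)
```

    Because this rule discards an OTU regardless of how abundant it is in the sample, it is the most conservative choice; a threshold-based variant would be a different design decision.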

  15. Unraveling long non-coding RNAs through analysis of high-throughput RNA-sequencing data

    Directory of Open Access Journals (Sweden)

    Rashmi Tripathi

    2017-06-01

    Extensive genome-wide transcriptome studies enabled by high-throughput sequencing have revolutionized the study of genetics and epigenetics at unprecedented resolution. This research has revealed that, besides protein-coding RNAs, a large proportion of the mammalian transcriptome consists of regulatory non-protein-coding RNAs, whose number encoded within the human genome remains enigmatic. In the past, these non-coding RNAs were dismissed as “dark matter” and “junk”. Breaking that myth, RNA-seq, a recently developed experimental technique, is now widely used for studying non-coding RNAs, which have come into the limelight due to their physiological and pathological significance. The longest members of the ncRNA family, long non-coding RNAs, act as a stable and functional part of a genome, providing important clues about varied biological events such as the cellular and structural processes governing the complexity of an organism. Here, we review the most recent and influential computational approaches developed to identify and quantify long non-coding RNAs, serving as a guide for users to choose appropriate tools for their specific research. Keywords: Transcriptome, High throughput sequencing, Genetic and epigenetic, Long non-coding RNA, RNA-sequencing, RNA-seq

  16. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  17. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    Directory of Open Access Journals (Sweden)

    Seung Hak Yang

    2015-09-01

    The leachate generated by the decomposition of animal carcasses has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.

  18. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  19. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  20. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  1. Perchlorate reduction by hydrogen autotrophic bacteria and microbial community analysis using high-throughput sequencing.

    Science.gov (United States)

    Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong

    2016-02-01

    Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and harmlessness to drinking water. However, reported information about the microbial community structure has so far been comparatively limited, and changes in biodiversity and the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated by hydrogen aeration from activated sludge. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the sequence of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant perchlorate-reducing bacteria (PRB) in the mixed culture.
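
    To make the kinetic parameters concrete, the sketch below evaluates the standard Michaelis-Menten rate expression q = q(max) * S / (K(s) + S). It assumes the study used this standard form; the function name and the example concentrations are hypothetical, and the parameter values are simply the lower bounds of the ranges reported in the abstract.

```python
def specific_reduction_rate(s, q_max, k_s):
    """Michaelis-Menten rate: q = q_max * S / (K_s + S).

    s     -- perchlorate concentration (mg/l)
    q_max -- maximum specific reduction rate (mg ClO4-/gVSS h)
    k_s   -- half-saturation constant (mg/l)
    """
    return q_max * s / (k_s + s)

# Lower bounds of the reported parameter ranges.
Q_MAX = 2.521
K_S = 5.44

# When S equals K_s, the rate is exactly half of q_max.
half_rate = specific_reduction_rate(K_S, Q_MAX, K_S)

# At concentrations far above K_s the rate approaches q_max.
near_saturation = specific_reduction_rate(1000.0, Q_MAX, K_S)
```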

  2. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Full Text Available Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics analyses in the absence of an available Petunia genome, and it is available at the SOL Genomics Network (SGN; http://solgenomics.net). Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  3. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis.

    Science.gov (United States)

    Goldfeder, Rachel L; Wall, Dennis P; Khoury, Muin J; Ioannidis, John P A; Ashley, Euan A

    2017-10-15

    Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Chao Cheng

    2011-11-01

    Full Text Available We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, having more interacting partners, and being more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.
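
    The feed-forward loop (FFL) described in this abstract, in which a miRNA targets both a transcription factor and that factor's target, can be searched for in a directed edge list. The following is a minimal sketch under stated assumptions: the function name, the node names and the toy edges are all hypothetical, and the brute-force search is not the authors' motif-enrichment method.

```python
from collections import defaultdict


def find_mirna_ffls(edges, node_type):
    """Return (miRNA, TF, target) triples with miRNA->TF, TF->target, miRNA->target."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    loops = []
    mirnas = sorted(n for n in out if node_type[n] == "miRNA")
    for mirna in mirnas:
        tfs = sorted(t for t in out[mirna] if node_type[t] == "TF")
        for tf in tfs:
            # Targets shared by the miRNA and the TF close the loop.
            shared = out[mirna] & out.get(tf, set())
            for target in sorted(shared):
                if target not in (mirna, tf):
                    loops.append((mirna, tf, target))
    return loops


# Hypothetical toy network; names are placeholders, not real annotations.
NODE_TYPE = {"mir-X": "miRNA", "TF-A": "TF", "TF-B": "TF",
             "gene-1": "gene", "gene-2": "gene"}
EDGES = [("mir-X", "TF-A"), ("TF-A", "gene-1"), ("mir-X", "gene-1"),
         ("TF-A", "gene-2"), ("TF-B", "mir-X")]

print(find_mirna_ffls(EDGES, NODE_TYPE))
```

    Judging over-representation would additionally require comparing such counts against randomized networks, which this sketch does not attempt.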

  5. High-throughput sequencing and graph-based cluster analysis facilitate microsatellite development from a highly complex genome.

    Science.gov (United States)

    Shah, Abhijeet B; Schielzeth, Holger; Albersmeier, Andreas; Kalinowski, Joern; Hoffman, Joseph I

    2016-08-01

    Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired-end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild-caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to successfully amplify. It also reveals how advances in high-throughput sequencing and graph-based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.

  6. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version does only allow a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  7. DNA Sudoku—harnessing high-throughput sequencing for multiplexed specimen analysis

    Science.gov (United States)

    Erlich, Yaniv; Chang, Kenneth; Gordon, Assaf; Ronen, Roy; Navon, Oron; Rooks, Michelle; Hannon, Gregory J.

    2009-01-01

    Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals. PMID:19447965
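
    The idea of encoding specimen identity in the pooling pattern rather than in a per-specimen barcode can be illustrated with a toy design. The sketch below assumes one simple modular scheme (each specimen goes into one pool per "window", chosen by its index modulo the window size); the abstract does not specify the pooling design, so the scheme, the window sizes and all names here are assumptions for illustration only.

```python
from itertools import combinations
from math import gcd, prod


def pools_for(specimen: int, windows):
    """Pools that receive this specimen: one pool per window, keyed (window, residue)."""
    return [(w, specimen % w) for w in windows]


def decode(positive_pools, n_specimens: int, windows):
    """Specimens whose every pool is positive for the sequence of interest."""
    positive = set(positive_pools)
    return [s for s in range(n_specimens)
            if all(p in positive for p in pools_for(s, windows))]


# ASSUMPTION: pairwise-coprime window sizes, so residue patterns are unique
# for all specimen indices below the product of the windows.
WINDOWS = (5, 7, 9)
N_SPECIMENS = 300
assert all(gcd(a, b) == 1 for a, b in combinations(WINDOWS, 2))
assert N_SPECIMENS <= prod(WINDOWS)

carrier = 123
observed = pools_for(carrier, WINDOWS)  # pools in which the variant is seen
print(f"{sum(WINDOWS)} barcoded pools cover {N_SPECIMENS} specimens")
print("decoded carriers:", decode(observed, N_SPECIMENS, WINDOWS))
```

    With a single carrier the pattern decodes unambiguously; several carriers of the same sequence can produce ambiguous patterns, which is one reason real designs need more care than this sketch.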

  8. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    Science.gov (United States)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes due to relatively small genome sizes, ability to grow in culture, and as a natural model system for measuring expression of calcification-related genes in two life stages. As Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, that data is often insufficient to identify the exact position of the transcription start site (TSS) to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies of producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, the Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. In addition, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although not much data is available for closely related organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  9. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    Full Text Available With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
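
    The universal hitting set definition given in this abstract can be checked directly for very small k and L by enumerating every L-long sequence. The sketch below is only a brute-force verifier of the definition, with a hypothetical function name; it is not the DOCKS heuristic and is exponential in L, so it is useful only for toy parameters.

```python
from itertools import product


def is_universal_hitting_set(kmers, k: int, L: int, alphabet: str = "ACGT") -> bool:
    """True if every L-long sequence over `alphabet` contains a k-mer from `kmers`."""
    kmers = set(kmers)
    if not 0 < k < L:
        raise ValueError("need 0 < k < L")
    if any(len(x) != k for x in kmers):
        raise ValueError("all k-mers must have length k")

    for chars in product(alphabet, repeat=L):
        seq = "".join(chars)
        # A sequence is "hit" if any of its L-k+1 windows is in the set.
        if not any(seq[i:i + k] in kmers for i in range(L - k + 1)):
            return False
    return True


# Toy check on a two-letter alphabet with k=2, L=3.
print(is_universal_hitting_set({"AA", "CC", "AC"}, k=2, L=3, alphabet="AC"))
print(is_universal_hitting_set({"AA", "CC"}, k=2, L=3, alphabet="AC"))
```

    In the second call the sequence ACA contains only the 2-mers AC and CA, so the set {AA, CC} misses it and is not universal.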

  10. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    Science.gov (United States)

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Sample Preservation, DNA or RNA Extraction and Data Analysis for High-Throughput Phytoplankton Community Sequencing

    Directory of Open Access Journals (Sweden)

    Anita Mäki

    2017-09-01

    Full Text Available Phytoplankton is the basis for aquatic food webs and mirrors the water quality. Conventionally, phytoplankton analysis has been done using time consuming and partly subjective microscopic observations, but next generation sequencing (NGS) technologies provide promising potential for rapid automated examination of environmental samples. Because many phytoplankton species have tough cell walls, methods for cell lysis and DNA or RNA isolation need to be efficient to allow unbiased nucleic acid retrieval. Here, we analyzed how two phytoplankton preservation methods, three commercial DNA extraction kits and their improvements, three RNA extraction methods, and two data analysis procedures affected the results of the NGS analysis. A mock community was pooled from phytoplankton species with variation in nucleus size and cell wall hardness. Although the study showed potential for studying Lugol-preserved sample collections, it demonstrated critical challenges in DNA-based phytoplankton analysis overall. The 18S rRNA gene sequencing output was highly affected by the variation in the rRNA gene copy numbers per cell, while sample preservation and nucleic acid extraction methods formed another source of variation. On top of that, sequence-specific variation in the data quality introduced unexpected bioinformatics bias when the sliding-window method was used for the quality trimming of the Ion Torrent data. While DNA-based analyses did not correlate with biomasses or cell numbers of the mock community, rRNA-based analyses were less affected by different RNA extraction procedures and had better match with the biomasses, dry weight and carbon contents, and are therefore recommended for quantitative phytoplankton analyses.
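
    Sliding-window quality trimming, mentioned in this abstract as a source of bias, generally scans a read from the 5' end and cuts it where the mean quality inside a window first drops below a threshold. The sketch below shows that general technique only; the function name, window size and threshold are assumptions, and the abstract does not state which tool or parameters the authors used.

```python
def sliding_window_trim(seq: str, quals, window: int = 4, min_mean_q: float = 20.0):
    """Cut the read at the start of the first window whose mean quality is too low.

    seq:   read bases
    quals: per-base Phred quality scores, same length as seq
    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and qualities must have equal length")

    for start in range(max(len(quals) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            return seq[:start], list(quals[:start])
    return seq, list(quals)


read = "ACGTACGTAC"
phred = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
trimmed_seq, trimmed_q = sliding_window_trim(read, phred)
print(trimmed_seq, trimmed_q)
```

    Because the cut position depends on where low-quality bases fall along the read, sequences with systematically different quality profiles are shortened differently, which is one way such trimming can interact with sequence-specific quality variation.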

  12. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Science.gov (United States)

    Youngblut, Nicholas D; Barnett, Samuel E; Buckley, Daniel H

    2018-01-01

    Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  13. A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST).

    Directory of Open Access Journals (Sweden)

    Daniel J Reiss

    Full Text Available BACKGROUND: In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java) so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. METHODOLOGY: Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. PRINCIPAL FINDINGS: Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5 × 10(12), 2.9 × 10(-46), and 1.2 × 10(-73)) and informational content (11.0, 11.9, and 12.5 bits over 15 bp) of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp) consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp). CONCLUSIONS/SIGNIFICANCE: TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.

  14. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Jared W Wenger

    2010-05-01

    Full Text Available Fermentation of xylose is a fundamental requirement for the efficient production of ethanol from lignocellulosic biomass sources. Although they aggressively ferment hexoses, it has long been thought that native Saccharomyces cerevisiae strains cannot grow fermentatively or non-fermentatively on xylose. Population surveys have uncovered a few naturally occurring strains that are weakly xylose-positive, and some S. cerevisiae have been genetically engineered to ferment xylose, but no strain, either natural or engineered, has yet been reported to ferment xylose as efficiently as glucose. Here, we used a medium-throughput screen to identify Saccharomyces strains that can increase in optical density when xylose is presented as the sole carbon source. We identified 38 strains that have this xylose utilization phenotype, including strains of S. cerevisiae, other sensu stricto members, and hybrids between them. All the S. cerevisiae xylose-utilizing strains we identified are wine yeasts, and for those that could produce meiotic progeny, the xylose phenotype segregates as a single gene trait. We mapped this gene by Bulk Segregant Analysis (BSA) using tiling microarrays and high-throughput sequencing. The gene is a putative xylitol dehydrogenase, which we name XDH1, and is located in the subtelomeric region of the right end of chromosome XV in a region not present in the S288c reference genome. We further characterized the xylose phenotype by performing gene expression microarrays and by genetically dissecting the endogenous Saccharomyces xylose pathway. We have demonstrated that natural S. cerevisiae yeasts are capable of utilizing xylose as the sole carbon source, characterized the genetic basis for this trait as well as the endogenous xylose utilization pathway, and demonstrated the feasibility of BSA using high-throughput sequencing.

  15. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Full Text Available Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions.

  16. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions. 2014 by the authors; licensee MDPI, Basel, Switzerland.

  17. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

    Science.gov (United States)

    Fernandes, Andrew D; Reid, Jennifer Ns; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B

    2014-01-01

    Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible. Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples. Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a
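
    Treating per-feature read counts as compositions usually means working with log-ratios rather than raw counts. The sketch below shows one common compositional transform, the centered log-ratio (CLR), to make that idea concrete. The function name and the fixed pseudocount are assumptions for illustration, and this is not a description of how the ALDEx2 package handles zeros or uncertainty internally.

```python
import math


def clr(counts, pseudocount: float = 0.5):
    """Centered log-ratio: log of each value minus the mean log over the sample.

    ASSUMPTION: a fixed pseudocount is added so that zero counts have a finite
    log; with pseudocount=0 any zero count raises ValueError from math.log.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


shallow = [10, 20, 70]     # hypothetical read counts for three features
deep = [100, 200, 700]     # same proportions at ten times the depth

# Without a pseudocount the transform depends only on the proportions,
# so sequencing depth cancels out.
print([round(v, 4) for v in clr(shallow, pseudocount=0)])
print([round(v, 4) for v in clr(deep, pseudocount=0)])
```

    The transformed values of a sample always sum to zero, which reflects that only the ratios between features, not the total read count, carry information.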

  18. Molecular diet analysis of two African free-tailed bats (Molossidae) using high throughput sequencing

    DEFF Research Database (Denmark)

    Bohmann, Kristine; Monadjem, Ara; Noer, Christina Lehmkuhl

    2011-01-01

    Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep......-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI...

  19. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approaches for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. Protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromosomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixtures were injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  20. Genome-wide analysis of microRNAs in rubber tree (Hevea brasiliensis L.) using high-throughput sequencing.

    Science.gov (United States)

    Lertpanyasampatha, Manassawe; Gao, Lei; Kongsawadworakul, Panida; Viboonjun, Unchera; Chrestin, Hervé; Liu, Renyi; Chen, Xuemei; Narangajavana, Jarunya

    2012-08-01

    MicroRNAs (miRNAs) are short RNAs with essential roles in gene regulation in various organisms including higher plants. In contrast to the vast information on miRNAs from many economically important plants, almost nothing has been reported on the identification or analysis of miRNAs from rubber tree (Hevea brasiliensis L.), the most important natural rubber-producing crop. To identify miRNAs and their target genes in rubber tree, high-throughput sequencing combined with a computational approach was performed. Four small RNA libraries were constructed for deep sequencing from mature and young leaves of two rubber tree clones, PB 260 and PB 217, which provide high and low latex yield, respectively. 115 miRNAs belonging to 56 known miRNA families were identified, and northern hybridization validated miRNA expression and revealed developmental stage-dependent and clone-specific expression for some miRNAs. We took advantage of the newly released rubber tree genome assembly and predicted 20 novel miRNAs. Further, computational analysis uncovered potential targets of the known and novel miRNAs. Predicted target genes included not only transcription factors but also genes involved in various biological processes including stress responses, primary and secondary metabolism, and signal transduction. In particular, genes with roles in rubber biosynthesis are predicted targets of miRNAs. This study provides a basic catalog of miRNAs and their targets in rubber tree to facilitate future improvement and exploitation of rubber tree.

  1. ImmuneDB: a system for the analysis and exploration of high-throughput adaptive immune receptor sequencing data.

    Science.gov (United States)

    Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri

    2017-01-15

    As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry-leading database, MySQL, to provide fast analysis and avoid the complexities of using error-prone flat files. ImmuneDB is freely available at http://immunedb.com. A demo of the ImmuneDB web interface is available at: http://immunedb.com/demo. CONTACT: Uh25@drexel.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...

  3. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    Science.gov (United States)

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface, facilitating management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive

  4. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  5. High-throughput sequence alignment using Graphics Processing Units.

    Science.gov (United States)

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
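
    The core idea in the abstract above, matching many queries against one indexed reference, can be sketched on the CPU in a few lines. This is not MUMmerGPU's code: it swaps the suffix tree for a naively built suffix array, runs serially rather than on a GPU, and reports only full-length exact matches. All function names are hypothetical and chosen for illustration.

```python
def build_suffix_array(reference: str) -> list[int]:
    """Return suffix start positions sorted by suffix (naive build, fine for short strings)."""
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def find_exact_matches(reference: str, suffix_array: list[int], query: str) -> list[int]:
    """Return the sorted start positions at which `query` occurs exactly in `reference`."""
    m = len(query)

    def prefix(rank: int) -> str:
        # First m characters of the suffix at this rank; truncation preserves sort order.
        start = suffix_array[rank]
        return reference[start:start + m]

    # Lower bound: first rank whose m-character prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-character prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


def align_queries(reference: str, queries: list[str]) -> dict[str, list[int]]:
    """Index the reference once, then look up every query against the same index."""
    suffix_array = build_suffix_array(reference)
    return {q: find_exact_matches(reference, suffix_array, q) for q in queries}
```

    Each lookup in `align_queries` is independent of the others, which is the property that makes this kind of workload a natural fit for the parallel hardware the abstract describes.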

  6. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Directory of Open Access Journals (Sweden)

    Nicholas D Youngblut

    Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and GitHub at https://github.com/buckleylab/HTSSIP.

  7. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which...... dissect the bioma involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy...

  8. Metabolomic and high-throughput sequencing analysis – modern approach for the assessment of biodeterioration of materials from historic buildings

    Directory of Open Access Journals (Sweden)

    Beata Gutarowska

    2015-09-01

    Preservation of cultural heritage is of paramount importance worldwide. Microbial colonization of construction materials, such as wood, brick, mortar and stone in historic buildings can lead to severe deterioration. The aim of the present study was to give modern insight into the phylogenetic diversity and activated metabolic pathways of microbial communities colonizing historic objects located in the former Auschwitz II-Birkenau concentration and extermination camp in Oświęcim, Poland. For this purpose we combined molecular, microscopic and chemical methods. Selected specimens were examined using Field Emission Scanning Electron Microscopy (FESEM), metabolomic analysis and high-throughput Illumina sequencing. FESEM imaging revealed the presence of complex microbial communities comprising diatoms, fungi and bacteria, mainly cyanobacteria and actinobacteria, on sample surfaces. Microbial diversity of brick specimens appeared higher than that of the wood and was dominated by algae and cyanobacteria, while wood was mainly colonized by fungi. DNA sequences documented the presence of 15 bacterial phyla representing 99 genera including Halomonas, Halorhodospira, Salinisphaera, Salinibacterium, Rubrobacter, Streptomyces, Arthrobacter and 9 fungal classes represented by 113 genera including Cladosporium, Acremonium, Alternaria, Engyodontium, Penicillium, Rhizopus and Aureobasidium. Most of the identified sequences were characteristic of organisms implicated in deterioration of wood and brick. Metabolomic data indicated the activation of numerous metabolic pathways, including those regulating the production of primary and secondary metabolites, for example, metabolites associated with the production of antibiotics, organic acids and deterioration of organic compounds. The study demonstrated that a combination of electron microscopy imaging with metabolomic and genomic techniques allows linking the phylogenetic information and metabolic profiles of

  9. Comparison of analysis tools for miRNA high throughput sequencing using nerve crush as a model

    Directory of Open Access Journals (Sweden)

    Raghu Prasad Rao Metpally

    2013-03-01

    Recent advances in sample preparation and analysis for next generation sequencing have made it possible to profile and discover new miRNAs in a high throughput manner. In the case of neurological disease and injury, these types of experiments have been more limited, possibly because tissues such as the brain and spinal cord are inaccessible for direct sampling in living patients, and indirect sampling of blood and cerebrospinal fluid is affected by low amounts of RNA. We used a mouse model to examine changes in miRNA expression in response to acute nerve crush. We assayed miRNA from both muscle tissue and blood plasma. We examined how the depth of coverage (the number of mapped reads) changed the number of detectable miRNAs in each sample type. We also found that samples with very low starting amounts of RNA (mouse plasma) made high depth of mature miRNA coverage more difficult to obtain. Each tissue must be assessed independently for the depth of coverage required to adequately power detection of differential expression, weighed against the cost of sequencing that sample to the adequate depth. We explored the changes in total mapped reads and differential expression results generated by three different software packages (miRDeep2, miRNAKey, and miRExpress) and two different analysis packages (DESeq and EdgeR). We also examined the accuracy of using miRDeep2 to predict novel miRNAs and subsequently detect them in the samples using qRT-PCR.

  10. Identification of microRNAs in the Toxigenic Dinoflagellate Alexandrium catenella by High-Throughput Illumina Sequencing and Bioinformatic Analysis.

    Directory of Open Access Journals (Sweden)

    Huili Geng

    Micro-ribonucleic acids (miRNAs) are a large group of endogenous, tiny, non-coding RNAs consisting of 19-25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs). To date, identification and function analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR) was performed to confirm the expression of two differentially expressed miRNAs: one up-regulated novel miRNA (aca-miR-3p-456915) and one down-regulated conserved miRNA (tae-miR159a). The expression trend of the qPCR assay was generally consistent with the deep sequencing result. Target predictions of the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG) annotations revealed that some miRNAs were associated with growth and developmental processes of the alga. These results provide insights into the roles that miRNAs play in

  11. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    Science.gov (United States)

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
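
    Two of the steps named in the "miRNA identification" module, adapter removal and read counting, can be illustrated with a minimal sketch. It is not eRNA's implementation, which the abstract does not describe. The adapter string, the length window of 18 to 30 nucleotides, the minimum overlap and every function name are assumptions made for this example, and only exact adapter matches are handled (no mismatches or quality scores).

```python
from collections import Counter

# Placeholder 3' adapter used only for illustration (an assumption, not taken from the source).
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Remove a 3' adapter: cut at a full occurrence, else at a partial adapter prefix at the read end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try progressively shorter adapter prefixes that the read may end with.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def count_inserts(reads, adapter: str = ADAPTER, min_len: int = 18, max_len: int = 30) -> Counter:
    """Trim every read, keep inserts inside the length window, and count identical sequences."""
    counts = Counter()
    for read in reads:
        insert = trim_adapter(read, adapter)
        if min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts
```

    Collapsing identical inserts before alignment keeps the number of sequences to map far smaller than the number of raw reads, since abundant small RNAs are sequenced many times over.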

  12. Applications of High Throughput Sequencing for Immunology and Clinical Diagnostics

    OpenAIRE

    Kim, Hyunsung John

    2014-01-01

    High throughput sequencing methods have fundamentally shifted the manner in which biological experiments are performed. In this dissertation, conventional and novel high throughput sequencing and bioinformatics methods are applied to immunology and diagnostics. In order to study rare subsets of cells, an RNA sequencing method was first optimized for use with minimal levels of RNA and cellular input. The optimized RNA sequencing method was then applied to study the transcriptional differences ...

  13. High throughput sequencing of microRNAs in chicken somites.

    Science.gov (United States)

    Rathjen, Tina; Pais, Helio; Sweetman, Dylan; Moulton, Vincent; Munsterberg, Andrea; Dalmay, Tamas

    2009-05-06

    High throughput Solexa sequencing technology was applied to identify microRNAs in somites of developing chicken embryos. We obtained 651,273 reads, from which 340,415 were mapped to the chicken genome representing 1701 distinct sequences. Eighty-five of these were known microRNAs and 42 novel miRNA candidates were identified. Accumulation of 18 of 42 sequences was confirmed by Northern blot analysis. Ten of the 18 sequences are new variants of known miRNAs and eight short RNAs are novel miRNAs. Six of these eight have not been reported by other deep sequencing projects. One of the six new miRNAs is highly enriched in somite tissue suggesting that deep sequencing of other specific tissues has the potential to identify novel tissue specific miRNAs.

  14. Taxonomic analysis of the microbial community in stored sugar beets using high-throughput sequencing of different marker genes.

    Science.gov (United States)

    Liebe, Sebastian; Wibberg, Daniel; Winkler, Anika; Pühler, Alfred; Schlüter, Andreas; Varrelmann, Mark

    2016-02-01

    Post-harvest colonization of sugar beets accompanied by rot development is a serious problem due to sugar losses and negative impact on processing quality. Studies on the microbial community associated with rot development and factors shaping their structure are missing. Therefore, high-throughput sequencing was applied to describe the influence of environment, plant genotype and storage temperature (8°C and 20°C) on three different communities in stored sugar beets, namely fungi (internal transcribed spacers 1 and 2), Fusarium spp. (elongation factor-1α gene fragment) and oomycetes (internal transcribed spacers 1). The composition of the fungal community changed during storage mostly influenced by the storage temperature followed by a weak environmental effect. Botrytis cinerea was the prevalent species at 8°C whereas members of the fungal genera Fusarium and Penicillium became dominant at 20°C. This shift was independent of the plant genotype. Species richness within the genus Fusarium also increased during storage at both temperatures whereas the oomycetes community did not change. Moreover, oomycetes species were absent after storage at 20°C. The results of the present study clearly show that rot development during sugar beet storage is associated with pathogens well known as causal agents of post-harvest diseases in many other crops. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Temporal dynamics of soil microbial communities under different moisture regimes: high-throughput sequencing and bioinformatics analysis

    Science.gov (United States)

    Semenov, Mikhail; Zhuravleva, Anna; Semenov, Vyacheslav; Yevdokimov, Ilya; Larionova, Alla

    2017-04-01

    Recent climate scenarios predict not only continued global warming but also an increased frequency and intensity of extreme climatic events such as strong changes in temperature and precipitation regimes. Microorganisms are well known to be more sensitive to changes in environmental conditions than to other soil chemical and physical parameters. In this study, we determined the shifts in soil microbial community structure as well as indicative taxa in soils under three moisture regimes using high-throughput Illumina sequencing and a range of bioinformatics approaches for the assessment of sequence data. Incubation experiments were performed in soil-filled (Greyic Phaeozems Albic) rhizoboxes with maize and without plants. Three contrasting moisture regimes were simulated: 1) optimal wetting (OW), watering 2-3 times per week to maintain soil moisture of 20-25% by weight; 2) periodic wetting (PW), with alternating periods of wetting and drought; and 3) constant insufficient wetting (IW), with soil moisture permanently maintained at 12% by weight. Sampled fresh soils were homogenized, and the total DNA of three replicates was extracted using the FastDNA® SPIN kit for Soil. DNA replicates were combined in a pooled sample and the DNA was used for PCR with specific primers for the 16S V3 and V4 regions. In order to compare variability between different samples and replicates within a single sample, some DNA replicates were treated separately. The products were purified and submitted to Illumina MiSeq sequencing. Sequence data were evaluated by alpha-diversity (Chao1 and Shannon H' diversity indexes), beta-diversity (UniFrac and Bray-Curtis dissimilarity), heatmap, tagcloud, and plot-bar analyses using the MiSeq Reporter Metagenomics Workflow and R packages (phyloseq, vegan, tagcloud). The Shannon index varied in a rather narrow range (4.4-4.9) with the lowest values for microbial communities under PW treatment. The Chao1 index varied from 385 to 480, being a more flexible
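
    Three of the measures named in this record (Shannon H', Chao1 and Bray-Curtis dissimilarity) have compact standard definitions, sketched below in plain Python. The study itself used the MiSeq Reporter workflow and R packages, so this is only an illustration of the arithmetic, not of that pipeline. The function names are hypothetical, and the bias-corrected form of Chao1 is an assumption, since the abstract does not say which variant was used.

```python
import math


def shannon_index(counts) -> float:
    """Shannon H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_index(counts) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)  # taxa seen exactly once (F1)
    doubletons = sum(1 for c in counts if c == 2)  # taxa seen exactly twice (F2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(sample_a, sample_b) -> float:
    """Bray-Curtis dissimilarity between two count vectors that list taxa in the same order."""
    shared = sum(min(a, b) for a, b in zip(sample_a, sample_b))
    return 1 - 2 * shared / (sum(sample_a) + sum(sample_b))
```

    Shannon H' depends on relative abundances, whereas Chao1 is driven by the number of rare taxa (singletons and doubletons), so the two indices can respond differently to the same treatment.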

  16. High-throughput sequence analysis of small RNAs in grapevine (Vitis vinifera L.) affected by grapevine leafroll disease.

    Science.gov (United States)

    Alabi, Olufemi J; Zheng, Yun; Jagadeeswaran, Guru; Sunkar, Ramanjulu; Naidu, Rayapati A

    2012-12-01

    Grapevine leafroll disease (GLRD) is one of the most economically important virus diseases of grapevine (Vitis spp.) worldwide. In this study, we used high-throughput sequencing of cDNA libraries made from small RNAs (sRNAs) to compare profiles of sRNA populations recovered from own-rooted Merlot grapevines with and without GLRD symptoms. The data revealed the presence of sRNAs specific to Grapevine leafroll-associated virus 3, Hop stunt viroid (HpSVd), Grapevine yellow speckle viroid 1 (GYSVd-1) and Grapevine yellow speckle viroid 2 (GYSVd-2) in symptomatic grapevines and sRNAs specific only to HpSVd, GYSVd-1 and GYSVd-2 in nonsymptomatic grapevines. In addition to 135 previously identified conserved microRNAs in grapevine (Vvi-miRs), we identified 10 novel and several candidate Vvi-miRs in both symptomatic and nonsymptomatic grapevine leaves based on the cloning of miRNA star sequences. Quantitative real-time reverse transcriptase-polymerase chain reaction (RT-PCR) of selected conserved Vvi-miRs indicated that individual members of an miRNA family are differentially expressed in symptomatic and nonsymptomatic leaves. The high-resolution mapping of sRNAs specific to an ampelovirus and three viroids in mixed infections, the identification of novel Vvi-miRs and the modulation of certain conserved Vvi-miRs offers resources for the further elucidation of compatible host-pathogen interactions and for the provision of ecologically relevant information to better understand host-pathogen-environment interactions in a perennial fruit crop. © 2012 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2012 BSPP AND BLACKWELL PUBLISHING LTD.

  17. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    Science.gov (United States)

    2014-01-01

    Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  18. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  19. High-throughput sequencing in mitochondrial DNA research.

    Science.gov (United States)

    Ye, Fei; Samuels, David C; Clark, Travis; Guo, Yan

    2014-07-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers' ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control. Copyright © 2014 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  20. Savant: genome browser for high-throughput sequencing data.

    Science.gov (United States)

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  1. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  2. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Directory of Open Access Journals (Sweden)

    Amit Kawalia

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  3. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    Science.gov (United States)

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  4. Combination of amplified rDNA restriction analysis and high-throughput sequencing revealed the negative effect of colistin sulfate on the diversity of soil microorganisms.

    Science.gov (United States)

    Fan, Tingli; Sun, Yongxue; Peng, Jinju; Wu, Qun; Ma, Yi; Zhou, Xiaohui

    2018-01-01

    Colistin sulfate is widely used in both human and veterinary medicine. However, its effect on microbial ecology is unknown. In this study, we determined the effect of colistin sulfate on the diversity of soil microorganisms by amplified rDNA restriction analysis (ARDRA) and high-throughput sequencing. ARDRA showed that the diversity of DNA from soil microorganisms was reduced after soil was treated with colistin sulfate, with the most dramatic reduction observed after 35 days of treatment. High-throughput sequencing showed that the Chao1 and abundance-based coverage estimators (ACE) were reduced in the soils treated with colistin sulfate for 35 days compared to those treated with colistin sulfate for 7 days. Furthermore, Chao1 and ACE tended to be lower when a higher concentration of colistin sulfate was used, suggesting that the microbial abundance is reduced by colistin sulfate in a dose-dependent manner. The Shannon index showed that the diversity of soil microorganisms was reduced upon treatment with colistin sulfate compared to the untreated control group. Following 7 days of treatment, Bacillus, Clostridium and Sphingomonas were sensitive to all concentrations of colistin sulfate used in this study. Following 35 days of treatment, the abundance of Chloroplast, Haliangium, Pseudomonas, Lactococcus, and Clostridium was significantly decreased. Our results demonstrated that colistin sulfate, especially at high concentration (≥5 mg/kg), could alter the population structure of microorganisms and consequently the microbial community function in soil. Copyright © 2017 Elsevier GmbH. All rights reserved.
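
    The Chao1 and Shannon measures named in this abstract can be illustrated with a minimal sketch. It assumes the bias-corrected form of Chao1 and the natural-log Shannon index; the abstract does not say which variants or software the authors used, and the read counts below are hypothetical values chosen only for illustration.

```python
import math

def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Assumed form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2
    are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def shannon(counts):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over observed taxa."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    return -sum((c / total) * math.log(c / total) for c in observed)

# Hypothetical per-taxon read counts (not data from the study).
control = [10, 8, 5, 2, 2, 1, 1, 1]
treated = [20, 9, 1, 0, 0, 0, 0, 0]

for name, sample in (("control", control), ("treated", treated)):
    print(name, "Chao1 =", chao1(sample), "Shannon =", round(shannon(sample), 3))
```

    In this toy input the treated sample has fewer observed taxa and a more uneven distribution, so both estimates come out lower than for the control, which is the direction of change the abstract reports.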

  5. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Science.gov (United States)

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  6. Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing.

    Science.gov (United States)

    Gao, Hongyu; Hawkins, Troy; Jasti, Aparna; Chen, Yu-Hsiang; Mockaitis, Keithanne; Dinauer, Mary; Cornetta, Kenneth

    2014-05-06

    Integration of viral vectors into a host genome is associated with insertional mutagenesis and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR based methods such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR and non-restrictive (nr) LAM PCR were developed to identify sites of vector integration. Coupling the power of next-generation sequencing technologies with various PCR approaches will provide a comprehensive and genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop and apply quality metrics to viral insertion data obtained using next-generation sequencing. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies, understand the quality of sequencing data, and provide a starting point for developing standardization of vector insertion site data analysis.

  7. Uncovering leaf rust responsive miRNAs in wheat (Triticum aestivum L.) using high-throughput sequencing and prediction of their targets through degradome analysis.

    Science.gov (United States)

    Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal

    2017-01-01

    Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 target genes. QRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30% of the world population. A major threat to wheat production is rust epidemics. This study was targeted towards identification and functional characterization of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in response to leaf rust using mock and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs compared with the pathogen-infected susceptible plants. The miRNA counts increased in the susceptible isoline due to leaf rust; conversely, the counts decreased in the resistant isoline in response to pathogenesis, illustrating precise spatial tuning of miRNAs during compatible and incompatible interaction. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in response to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interaction. Our findings contribute to increasing knowledge of the roles of microRNA in wheat-leaf rust interactions and could help in rust

  8. Validation of high throughput sequencing and microbial forensics applications

    OpenAIRE

    Budowle, Bruce; Connell, Nancy D.; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R.; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A.; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results a...

  9. Validation of high throughput sequencing and microbial forensics applications.

    Science.gov (United States)

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  10. Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis.

    Science.gov (United States)

    Haile, Simon; Pandoh, Pawan; McDonald, Helen; Corbett, Richard D; Tsao, Philip; Kirk, Heather; MacLeod, Tina; Jones, Martin; Bilobram, Steve; Brooks, Denise; Smailus, Duane; Steidl, Christian; Scott, David W; Bala, Miruna; Hirst, Martin; Miller, Diane; Moore, Richard A; Mungall, Andrew J; Coope, Robin J; Ma, Yussanne; Zhao, Yongjun; Holt, Rob A; Jones, Steven J; Marra, Marco A

    2017-01-01

    Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95-100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

  11. Identification and characterization of cold-responsive microRNAs in tea plant (Camellia sinensis) and their targets using high-throughput sequencing and degradome analysis.

    Science.gov (United States)

    Zhang, Yue; Zhu, Xujun; Chen, Xuan; Song, Changnian; Zou, Zhongwei; Wang, Yuhua; Wang, Mingle; Fang, Wanping; Li, Xinghui

    2014-10-21

    MicroRNAs (miRNAs) are approximately 19-21 nucleotide noncoding RNAs produced by Dicer-catalyzed excision from stem-loop precursors. Many plant miRNAs have critical functions in development, nutrient homeostasis, abiotic stress responses, and pathogen responses via interaction with specific target mRNAs. Camellia sinensis is one of the most important commercial beverage crops in the world. However, miRNAs associated with cold stress tolerance in C. sinensis remain unexplored. The use of high-throughput sequencing can provide a much deeper understanding of miRNAs. To obtain more insight into the function of miRNAs in cold stress tolerance, Illumina sequencing of C. sinensis sRNA was conducted. Solexa sequencing technology was used for high-throughput sequencing of the small RNA library from the cold treatment of tea leaves. By aligning the sequencing data with known plant miRNAs, we characterized 106 conserved C. sinensis miRNAs. In addition, 215 potential candidate miRNAs were found, among which 98 candidates with star sequences were chosen as novel miRNAs. Both congruously and differentially regulated miRNAs were obtained, and cultivar-specific miRNAs were identified by microarray-based hybridization in response to cold stress. The results were also confirmed by quantitative real-time polymerase chain reaction. To confirm the targets of miRNAs, two degradome libraries from two treatments were constructed. According to degradome sequencing, 455 and 591 genes were identified as cleavage targets of miRNAs from cold treatments and control libraries, respectively, and 283 targets were present in both libraries. Functional analysis of these miRNA targets indicated their involvement in important activities, such as development, regulation of transcription, and stress response. We discovered 31 up-regulated miRNAs and 43 down-regulated miRNAs in 'Yingshuang', and 46 up-regulated miRNAs and 45 down-regulated miRNAs in 'Baiye 1' in response to cold stress.

  12. High-Throughput Analysis of Enzyme Activities

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Guoxin [Iowa State Univ., Ames, IA (United States)

    2007-01-01

    High-throughput screening (HTS) techniques are now applied in many research fields. Robotic microarray printing and automated microtiter handling allow HTS to be performed in both heterogeneous and homogeneous formats, with minimal sample required for each assay element. In this dissertation, new HTS techniques for enzyme activity analysis were developed. First, patterns of immobilized enzyme on nylon screen were detected by a multiplexed capillary system. The imaging resolution is limited by the outer diameter of the capillaries. In order to get finer images, capillaries with smaller outer diameters can be used to form the imaging probe. Application of capillary electrophoresis allows separation of the product from the substrate in the reaction mixture, so that the product does not need to have optical properties different from those of the substrate. UV absorption detection allows almost universal detection for organic molecules. Thus, no modifications of either the substrate or the product molecules are necessary. This technique has the potential to be used in screening of local distribution variations of specific bio-molecules in a tissue or in screening of multiple immobilized catalysts. Another high-throughput screening technique was developed by directly monitoring the light intensity of the immobilized-catalyst surface using a scientific charge-coupled device (CCD). Briefly, the surface of the enzyme microarray is focused onto a scientific CCD using an objective lens. By carefully choosing the detection wavelength, generation of product on an enzyme spot can be seen by the CCD. Analyzing the light intensity change over time on an enzyme spot can give information on the reaction rate. The same microarray can be reused many times. Thus, high-throughput kinetic studies of hundreds of catalytic reactions are made possible. Finally, we studied the fluorescence emission spectra of ADP and obtained the detection limits for ADP under three different

  13. High-Throughput Analysis With 96-Capillary Array Electrophoresis and Integrated Sample Preparation for DNA Sequencing Based on Laser Induced Fluorescence Detection

    Energy Technology Data Exchange (ETDEWEB)

    Xue, Gang [Iowa State Univ., Ames, IA (United States)

    2001-01-01

    The purpose of this research was to improve the fluorescence detection for the multiplexed capillary array electrophoresis, extend its use beyond the genomic analysis, and to develop an integrated micro-sample preparation system for high-throughput DNA sequencing. The authors first demonstrated multiplexed capillary zone electrophoresis (CZE) and micellar electrokinetic chromatography (MEKC) separations in a 96-capillary array system with laser-induced fluorescence detection. Migration times of four kinds of fluoresceins and six polyaromatic hydrocarbons (PAHs) are normalized to one of the capillaries using two internal standards. The relative standard deviations (RSD) after normalization are 0.6-1.4% for the fluoresceins and 0.1-1.5% for the PAHs. Quantitative calibration of the separations based on peak areas is also performed, again with substantial improvement over the raw data. This opens up the possibility of performing massively parallel separations for high-throughput chemical analysis for process monitoring, combinatorial synthesis, and clinical diagnosis. The authors further improved the fluorescence detection by step laser scanning. A computer-controlled galvanometer scanner is adapted for scanning a focused laser beam across a 96-capillary array for laser-induced fluorescence detection. The signal at a single photomultiplier tube is temporally sorted to distinguish among the capillaries. The limit of detection for fluorescein is 3 × 10⁻¹¹ M (S/N = 3) for 5 mW of total laser power scanned at 4 Hz. The observed cross-talk among capillaries is 0.2%. Advantages include the efficient utilization of light due to the high duty-cycle of step scan, good detection performance due to the reduction of stray light, ruggedness due to the small mass of the galvanometer mirror, low cost due to the simplicity of components, and flexibility due to the independent paths for excitation and emission.
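
    The normalization of migration times to a reference capillary using two internal standards can be sketched as follows. The abstract does not give the formula, so this assumes a two-point linear mapping in which each capillary's two standard peaks are aligned to those of the reference capillary; the function names and all times are hypothetical.

```python
import statistics

def normalize_times(times, std_times, ref_std_times):
    """Map migration times from one capillary onto the reference capillary.

    Assumed model: a linear rescaling that sends this capillary's two
    internal-standard times (t1, t2) to the reference times (r1, r2).
    """
    t1, t2 = std_times
    r1, r2 = ref_std_times
    scale = (r2 - r1) / (t2 - t1)
    return [r1 + (t - t1) * scale for t in times]

def rsd_percent(values):
    """Relative standard deviation, in percent."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical migration times in seconds: (standard 1, standard 2) and one analyte.
reference_standards = (100.0, 200.0)
capillaries = {
    "A": {"standards": (100.0, 200.0), "analyte": 150.0},  # reference capillary
    "B": {"standards": (110.0, 230.0), "analyte": 170.0},
    "C": {"standards": (95.0, 185.0), "analyte": 140.0},
}

raw = [c["analyte"] for c in capillaries.values()]
normalized = [
    normalize_times([c["analyte"]], c["standards"], reference_standards)[0]
    for c in capillaries.values()
]

print("raw RSD (%):", round(rsd_percent(raw), 2))
print("normalized RSD (%):", round(rsd_percent(normalized), 2))
```

    With these made-up values the analyte lands at the same normalized time in every capillary, so the RSD collapses toward zero; real data would retain some residual spread, as in the 0.1-1.5% range the abstract reports.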

  14. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  15. [Biological ingredient analysis of traditional Chinese medicines utilizing metagenomic approach based on high-throughput-sequencing and big-data-mining].

    Science.gov (United States)

    Bai, Hong; Ning, Kang; Wang, Chang-yun

    2015-03-01

    The quality of traditional Chinese medicines (TCMs) has been mainly evaluated based on chemical ingredients, yet more attention has recently been paid to biological ingredients, especially for pill-based preparations. Establishing a fast, accurate and systematic method of biological ingredient analysis is key to the modernization, industrialization and internationalization of TCMs. The biological ingredient analysis of TCM preparations can be abstracted as the identification of multiple species from a biological mixture. The metagenomic approach based on high-throughput sequencing (HTS) and big-data mining has been considered one of the most effective methods for multiple-species analysis of a biological mixture, and would also be helpful for the analysis of biological ingredients in TCMs. Simultaneous identification of diverse species, including the prescribed species, adulterants, toxic species, protected species and even the biological impurities introduced through the production process, could be achieved by selecting appropriate DNA biomarkers, as well as applying large-scale sequence comparison and data mining. This approach is expected to offer a basis for evaluating the effectiveness, safety and legality of TCM preparations.

  16. Transcriptome-Wide Analysis of Botrytis elliptica Responsive microRNAs and Their Targets in Lilium Regale Wilson by High-Throughput Sequencing and Degradome Analysis

    Directory of Open Access Journals (Sweden)

    Xue Gao

    2017-05-01

    MicroRNAs, as master regulators of gene expression, have been widely identified and play crucial roles in plant-pathogen interactions. A fatal pathogen, Botrytis elliptica, causes a serious foliar disease of lily, which reduces production because of the high susceptibility of most cultivated species. However, the miRNAs related to Botrytis infection of lily, and the miRNA-mediated gene regulatory networks providing resistance to B. elliptica in lily remain largely unexplored. To systematically dissect B. elliptica-responsive miRNAs and their target genes, three small RNA libraries were constructed from the leaves of Lilium regale, a promising Chinese wild Lilium species, which had been subjected to mock B. elliptica treatment or B. elliptica infection for 6 and 24 h. By high-throughput sequencing, 71 known miRNAs belonging to 47 conserved families and 24 novel miRNAs were identified, of which 18 miRNAs were downregulated and 13 were upregulated in response to B. elliptica. Moreover, based on the lily mRNA transcriptome, 22 targets for 9 known and 1 novel miRNAs were identified by the degradome sequencing approach. Most target genes for elliptica-responsive miRNAs were involved in metabolic processes, with a few encoding different transcription factors, including ELONGATION FACTOR 1 ALPHA (EF1a) and TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR 2 (TCP2). Furthermore, the expression patterns of a set of elliptica-responsive miRNAs and their targets were validated by quantitative real-time PCR. This study represents the first transcriptome-based analysis of miRNAs responsive to B. elliptica and their targets in lily. The results reveal the possible regulatory roles of miRNAs and their targets in B. elliptica interaction, which will extend our understanding of the mechanisms of this disease in lily.

  17. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    Energy Technology Data Exchange (ETDEWEB)

    Harding, Louisa B. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Schultz, Irvin R. [Battelle, Marine Sciences Laboratory – Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382 (United States); Goetz, Giles W. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Luckenbach, J. Adam [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Young, Graham [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Goetz, Frederick W. [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Manchester Research Station, P.O. Box 130, Manchester, WA 98353 (United States); Swanson, Penny, E-mail: penny.swanson@noaa.gov [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States)

    2013-10-15

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland is still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina® sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs), respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  18. Web-based visual analysis for high-throughput genomics.

    Science.gov (United States)

    Goecks, Jeremy; Eberhard, Carl; Too, Tomithy; Nekrutenko, Anton; Taylor, James

    2013-06-13

    Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput

  19. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  20. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Science.gov (United States)

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  1. Identification and characterization of microRNAs related to salt stress in broccoli, using high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Tian, Yunhong; Tian, Yunming; Luo, Xiaojun; Zhou, Tao; Huang, Zuoping; Liu, Ying; Qiu, Yihan; Hou, Bing; Sun, Dan; Deng, Hongyu; Qian, Shen; Yao, Kaitai

    2014-09-03

    MicroRNAs (miRNAs) are a new class of endogenous regulators of a broad range of physiological processes, which act by regulating gene expression post-transcriptionally. The brassica vegetable, broccoli (Brassica oleracea var. italica), is very popular with a wide range of consumers, but environmental stresses such as salinity are a problem worldwide in restricting its growth and yield. Little is known about the role of miRNAs in the response of broccoli to salt stress. In this study, broccoli subjected to salt stress and broccoli grown under control conditions were analyzed by high-throughput sequencing. Differential miRNA expression was confirmed by real-time reverse transcription polymerase chain reaction (RT-PCR). The prediction of miRNA targets was undertaken using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) database and Gene Ontology (GO)-enrichment analyses. Two libraries of small (or short) RNAs (sRNAs) were constructed and sequenced by high-throughput Solexa sequencing. A total of 24,511,963 and 21,034,728 clean reads, representing 9,861,236 (40.23%) and 8,574,665 (40.76%) unique reads, were obtained for control and salt-stressed broccoli, respectively. Furthermore, 42 putative known and 39 putative candidate miRNAs that were differentially expressed between control and salt-stressed broccoli were revealed by their read counts and confirmed by the use of stem-loop real-time RT-PCR. Amongst these, the putative conserved miRNAs, miR393 and miR855, and two putative candidate miRNAs, miR3 and miR34, were the most strongly down-regulated when broccoli was salt-stressed, whereas the putative conserved miRNA, miR396a, and the putative candidate miRNA, miR37, were the most up-regulated. Finally, analysis of the predicted gene targets of miRNAs using the GO and KO databases indicated that a range of metabolic and other cellular functions known to be associated with salt stress were up-regulated in broccoli treated with salt. A comprehensive
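    The unique-read percentages quoted in this abstract follow directly from the reported clean and unique read counts. A minimal Python sketch of that arithmetic is given here; the dictionary layout and the function name are illustrative assumptions, and only the four read counts come from the abstract.

```python
# Clean and unique read counts as reported in the abstract above.
libraries = {
    "control": {"clean": 24_511_963, "unique": 9_861_236},
    "salt_stressed": {"clean": 21_034_728, "unique": 8_574_665},
}


def unique_fraction_percent(clean: int, unique: int) -> float:
    """Return unique reads as a percentage of clean reads (2 decimals)."""
    return round(100.0 * unique / clean, 2)


# Hypothetical helper structure: library name -> percentage of unique reads.
percentages = {
    name: unique_fraction_percent(**counts) for name, counts in libraries.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}% unique reads")
```

    The computed values agree with the 40.23% and 40.76% stated for the control and salt-stressed libraries.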

  2. The simple fool's guide to population genomics via RNA-Seq: An introduction to high-throughput sequencing data analysis

    DEFF Research Database (Denmark)

    De Wit, P.; Pespeni, M.H.; Ladner, J.T.

    2012-01-01

    ://sfg.stanford.edu, that includes detailed protocols for data processing and analysis, along with a repository of custom-made scripts and sample files. Steps included in the SFG range from tissue collection to de novo assembly, blast annotation, alignment, gene expression, functional enrichment, SNP detection, principal components...

  3. Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches

    Directory of Open Access Journals (Sweden)

    Chih-Min Chiu

    2014-01-01

    Full Text Available Eighty-one stool samples from Taiwanese were collected for analysis of the association between the gut flora and obesity. The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) ≤ 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI ≥ 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher's P value = 1.61E-07). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity.

  4. Systematic analysis of the association between gut flora and obesity through high-throughput sequencing and bioinformatics approaches.

    Science.gov (United States)

    Chiu, Chih-Min; Huang, Wei-Chih; Weng, Shun-Long; Tseng, Han-Chi; Liang, Chao; Wang, Wei-Chi; Yang, Ting; Yang, Tzu-Ling; Weng, Chen-Tsung; Chang, Tzu-Hao; Huang, Hsien-Da

    2014-01-01

    Eighty-one stool samples from Taiwanese were collected for analysis of the association between the gut flora and obesity. The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) ≤ 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI ≥ 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher's P value = 1.61E-07). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity.
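    The association between sample type and cluster membership is reported as a Fisher's exact test P value. As a rough illustration of how such a test works on a 2x2 table, the sketch below implements the two-sided test with the standard library only. The abstract does not give the per-group counts, so the table used here is hypothetical and chosen only to resemble the reported proportions; it is not the study's data and is not expected to reproduce the published P value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# HYPOTHETICAL counts (not from the paper):
# rows = normal / case samples, columns = N-like / OB-like cluster.
p_value = fisher_exact_two_sided(28, 8, 7, 29)
print(f"two-sided P = {p_value:.3g}")
```

    A small P value for a table like this indicates that cluster membership and sample type are unlikely to be independent, which is the sense in which the abstract describes the communities as associated with obesity.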

  5. 76 FR 28990 - Ultra High Throughput Sequencing for Clinical Diagnostic Applications-Approaches To Assess...

    Science.gov (United States)

    2011-05-19

    ... Clinical Diagnostic Applications--Approaches To Assess Analytical Validity.'' The purpose of the public... approaches to assess analytical validity of ultra high throughput sequencing for clinical diagnostic... HUMAN SERVICES Food and Drug Administration Ultra High Throughput Sequencing for Clinical Diagnostic...

  6. Network analysis of the microorganism in 25 Danish wastewater treatment plants over 7 years using high-throughput amplicon sequencing

    DEFF Research Database (Denmark)

    Albertsen, Mads; Larsen, Poul; Saunders, Aaron Marc

    to link sludge and floc properties to the microbial communities. All data was subjected to extensive network analysis and multivariate statistics through R. The 16S amplicon results confirmed the findings of relatively few core groups of organisms shared by all the wastewater treatment plants... Wastewater treatment is the world’s largest biotechnological process and a perfect model system for microbial ecology as the habitat is well defined and replicated all over the world. Extensive investigations on Danish wastewater treatment plants using fluorescent in situ hybridization have... identified 38 probe-defined core genera, which are shared among all investigated Danish plants. A large body of knowledge exists on many of the core genera, however few attempts have been made to integrate the knowledge on a system-level understanding of the process. In this work we aimed to integrate...

  7. High-throughput sequencing: a roadmap toward community ecology.

    Science.gov (United States)

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-04-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines.

  8. Comprehensive processing of high-throughput small RNA sequencing data including quality checking, normalization, and differential expression analysis using the UEA sRNA Workbench.

    Science.gov (United States)

    Beckers, Matthew; Mohorianu, Irina; Stocks, Matthew; Applegate, Christopher; Dalmay, Tamas; Moulton, Vincent

    2017-06-01

    Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on a H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information. A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this. © 2017 Beckers et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  9. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology, however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete... agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from 26 soil samples showed that 95... the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete communities using ITS1 as the barcode sequence with well-known primers for oomycete DNA amplification....

  10. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 3; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-12-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  11. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 2; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-08-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  12. Fusion genes and their discovery using high throughput sequencing.

    Science.gov (United States)

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  13. Identification of DNA sequence variation in Campylobacter jejuni strains associated with the Guillain-Barré syndrome by high-throughput AFLP analysis

    Directory of Open Access Journals (Sweden)

    Endtz Hubert P

    2006-04-01

    Full Text Available Abstract Background Campylobacter jejuni is the predominant cause of antecedent infection in post-infectious neuropathies such as the Guillain-Barré (GBS) and Miller Fisher syndromes (MFS). GBS and MFS are probably induced by molecular mimicry between human gangliosides and bacterial lipo-oligosaccharides (LOS). This study describes a new C. jejuni-specific high-throughput AFLP (htAFLP) approach for detection and identification of DNA polymorphism, in general, and of putative GBS/MFS-markers, in particular. Results We compared 6 different isolates of the "genome strain" NCTC 11168 obtained from different laboratories. HtAFLP analysis generated approximately 3000 markers per strain, 19 of which were polymorphic. The DNA polymorphisms could not be confirmed by PCR-RFLP analysis, suggesting a baseline level of 0.6% AFLP artefacts. Comparison of NCTC 11168 with 4 GBS-associated strains revealed 23 potentially GBS-specific markers, 17 of which were identified by DNA sequencing. A collection of 27 GBS/MFS-associated and 17 enteritis control strains was analyzed with PCR-RFLP tests based on 11 of these markers. We identified 3 markers, located in the LOS biosynthesis genes cj1136, cj1138 and cj1139c, that were significantly associated with GBS (P = 0.024, P = 0.047 and P Conclusion This study shows that bacterial GBS markers are limited in number and located in the LOS biosynthesis genes, which corroborates the current consensus that LOS mimicry may be the prime etiologic determinant of GBS. Furthermore, our results demonstrate that htAFLP, with its high reproducibility and resolution, is an effective technique for the detection and subsequent identification of putative bacterial disease markers.
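    The 0.6% baseline artefact level quoted in this abstract is presumably the share of polymorphic markers among the roughly 3000 markers generated per isolate of the same strain. A short Python check of that arithmetic, assuming this reading of the abstract and using illustrative variable names:

```python
polymorphic_markers = 19    # reported in the abstract
markers_per_strain = 3000   # "approximately 3000" in the abstract

# Share of markers that differed between isolates of the same strain.
artefact_rate_percent = 100 * polymorphic_markers / markers_per_strain
print(f"{artefact_rate_percent:.1f}%")
```

    Because the marker count is approximate, the result should be read as an order-of-magnitude estimate rather than an exact rate.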

  14. Identification of novel and conserved miRNAs involved in pollen development in Brassica campestris ssp. chinensis by high-throughput sequencing and degradome analysis

    Science.gov (United States)

    2014-01-01

    Background microRNAs (miRNAs) are endogenous, noncoding, small RNAs that have essential regulatory functions in plant growth, development, and stress response processes. However, limited information is available about their functions in sexual reproduction of flowering plants. Pollen development is an important process in the life cycle of a flowering plant and is a major factor that affects the yield and quality of crop seeds. Results This study aims to identify miRNAs involved in pollen development. Two independent small RNA libraries were constructed from the flower buds of the male sterile line (Bcajh97-01A) and male fertile line (Bcajh97-01B) of Brassica campestris ssp. chinensis. The libraries were subjected to high-throughput sequencing by using the Illumina Solexa system. Eight novel miRNAs on the other arm of known pre-miRNAs, 54 new conserved miRNAs, and 8 novel miRNA members were identified. Twenty-five pairs of novel miRNA/miRNA* were found. Among all the identified miRNAs, 18 differentially expressed miRNAs with over two-fold change between flower buds of male sterile line (Bcajh97-01A) and male fertile line (Bcajh97-01B) were identified. qRT-PCR analysis revealed that most of the differentially expressed miRNAs were preferentially expressed in flower buds of the male fertile line (Bcajh97-01B). Degradome analysis showed that a total of 15 genes were predicted to be the targets of seven miRNAs. Conclusions Our findings provide an overview of potential miRNAs involved in pollen development and interactions between miRNAs and their corresponding targets, which may provide important clues on the function of miRNAs in pollen development. PMID:24559317

  15. A comprehensive analysis of in vitro and in vivo genetic fitness of Pseudomonas aeruginosa using high-throughput sequencing of transposon libraries.

    Directory of Open Access Journals (Sweden)

    David Skurnik

    Full Text Available High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200-1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian
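    The abstract describes quantifying fitness from changes in the frequency of Tn inserts between the LB baseline and the in vivo samples, with genes below 10 reads in LB set aside. The sketch below shows one common way such a comparison can be expressed, as a log2 ratio of normalized insertion frequencies. It is not the authors' pipeline: the function name, the pseudocount, the gene names, and all read counts are hypothetical, and only the 10-read threshold comes from the abstract.

```python
from math import log2

MIN_INPUT_READS = 10  # threshold stated in the abstract for growth in LB


def fitness_log2_ratios(input_counts, output_counts, pseudocount=1.0):
    """Return {gene: log2(output frequency / input frequency)}.

    Genes with fewer than MIN_INPUT_READS reads in the input (LB) library
    map to None, since they are treated as unable to grow in LB.
    The pseudocount (an assumption) avoids division by zero.
    """
    n_genes = len(input_counts)
    in_total = sum(input_counts.values()) + pseudocount * n_genes
    out_total = (
        sum(output_counts.get(g, 0) for g in input_counts) + pseudocount * n_genes
    )
    ratios = {}
    for gene, n_in in input_counts.items():
        if n_in < MIN_INPUT_READS:
            ratios[gene] = None
            continue
        f_in = (n_in + pseudocount) / in_total
        f_out = (output_counts.get(gene, 0) + pseudocount) / out_total
        ratios[gene] = log2(f_out / f_in)
    return ratios


# HYPOTHETICAL read counts per gene (LB baseline vs. recovered in vivo).
lb_counts = {"geneA": 500, "geneB": 500, "geneC": 4}
in_vivo_counts = {"geneA": 900, "geneB": 50, "geneC": 0}

ratios = fitness_log2_ratios(lb_counts, in_vivo_counts)
for gene, value in ratios.items():
    print(gene, "excluded" if value is None else f"{value:+.2f}")
```

    In this toy input, a negative ratio marks a gene whose disruption reduces recovery in vivo, a positive ratio marks insertions that are enriched, and genes under the LB threshold are excluded from the comparison.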

  16. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    Directory of Open Access Journals (Sweden)

    Gomes Paula

    2010-10-01

    Full Text Available Abstract Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (q

  17. Large scale library generation for high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Erik Borgström

    Full Text Available BACKGROUND: Large efforts have recently been made to automate the sample preparation protocols for massively parallel sequencing in order to match the increasing instrument throughput. Still, size selection through agarose gel electrophoresis separation is a labor-intensive bottleneck of these protocols. METHODOLOGY/PRINCIPAL FINDINGS: In this study a method for automatic library preparation and size selection on a liquid handling robot is presented. The method utilizes selective precipitation of certain sizes of DNA molecules onto paramagnetic beads for cleanup and selection after standard enzymatic reactions. CONCLUSIONS/SIGNIFICANCE: The method is used to generate libraries for de novo and re-sequencing on the Illumina HiSeq 2000 instrument with a throughput of 12 samples per instrument in approximately 4 hours. The resulting output data show quality scores and pass filter rates comparable to manually prepared samples. The sample size distribution can be adjusted for each application, and the method is suitable for all high throughput DNA processing protocols seeking to control size intervals.

  18. The promise and challenge of high-throughput sequencing of the antibody repertoire

    Science.gov (United States)

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  19. Comprehensive molecular diagnosis of Bardet-Biedl syndrome by high-throughput targeted exome sequencing.

    Directory of Open Access Journals (Sweden)

    Dong-Jun Xing

    Full Text Available Bardet-Biedl syndrome (BBS) is an autosomal recessive disorder with significant genetic heterogeneity. BBS is linked to mutations in 17 genes, which contain more than 200 coding exons. Currently, BBS is diagnosed by direct DNA sequencing for mutations in these genes, which because of the large genomic screening region is both time-consuming and expensive. In order to develop a practical method for the clinical diagnosis of BBS, we have developed a high-throughput targeted exome sequencing (TES) approach for genetic diagnosis. Five typical BBS patients were recruited and screened for mutations in a total of 144 known genes responsible for inherited retinal diseases, a hallmark symptom of BBS. The genomic DNA of these patients and their families was subjected to high-throughput DNA re-sequencing. Deep bioinformatics analysis was carried out to filter the massive sequencing data, and the results were further confirmed through co-segregation analysis. TES successfully revealed mutations in BBS genes in each patient and family member. Six pathological mutations, including five novel mutations, were revealed in the genes BBS2, MKKS, ARL6, MKS1. This study represents the first report of targeted exome sequencing in BBS patients and demonstrates that high-throughput TES is an accurate and rapid method for the genetic diagnosis of BBS.

  20. A simple, high throughput method to locate single copy sequences from Bacterial Artificial Chromosome (BAC) libraries using High Resolution Melt analysis

    Directory of Open Access Journals (Sweden)

    Caligari Peter DS

    2010-05-01

    Full Text Available Abstract Background The high-throughput anchoring of genetic markers into contigs is required for many ongoing physical mapping projects. Multidimensional BAC pooling strategies for PCR-based screening of large insert libraries are a widely used alternative to high density filter hybridisation of bacterial colonies. To date, concerns over reliability have led most if not all groups engaged in high throughput physical mapping projects to favour BAC DNA isolation prior to amplification by conventional PCR. Results Here, we report the first combined use of Multiplex Tandem PCR (MT-PCR) and High Resolution Melt (HRM) analysis on bacterial stocks of BAC library superpools as a means of rapidly anchoring markers to BAC colonies and thereby integrating genetic and physical maps. We exemplify the approach using a BAC library of the model plant Arabidopsis thaliana. Super pools of twenty five 384-well plates and two-dimension matrix pools of the BAC library were prepared for marker screening. The entire procedure only requires around 3 h to anchor one marker. Conclusions A pre-amplification step during MT-PCR allows high multiplexing and increases the sensitivity and reliability of subsequent HRM discrimination. This simple gel-free protocol is more reliable, faster and far less costly than conventional PCR screening. The option to screen 3 genetic markers in parallel in one MT-PCR-HRM reaction using templates from directly pooled bacterial stocks of BAC-containing bacteria further reduces time for anchoring markers in physical maps of species with large genomes.

  1. The Microbiome and Metabolites in Fermented Pu-erh Tea as Revealed by High-Throughput Sequencing and Quantitative Multiplex Metabolite Analysis.

    Directory of Open Access Journals (Sweden)

    Yongjie Zhang

    Full Text Available Pu-erh is a tea produced in Yunnan, China by microbial fermentation of fresh Camellia sinensis leaves by two processes, the traditional raw fermentation and the faster, ripened fermentation. We characterized fungal and bacterial communities in leaves and both Pu-erhs by high-throughput, rDNA-amplicon sequencing and we characterized the profile of bioactive extrolite mycotoxins in Pu-erh teas by quantitative liquid chromatography-tandem mass spectrometry. We identified 390 fungal and 629 bacterial OTUs from leaves and both Pu-erhs. Major findings are: (1) fungal diversity drops and bacterial diversity rises due to raw or ripened fermentation, (2) fungal and bacterial community composition changes significantly between fresh leaves and both raw and ripened Pu-erh, (3) aging causes significant changes in the microbial community of raw, but not ripened, Pu-erh, and (4) ripened and well-aged raw Pu-erh have similar microbial communities that are distinct from those of young, raw Pu-erh tea. Twenty-five toxic metabolites, mainly of fungal origin, were detected, with patulin and asperglaucide dominating and at levels supporting the Chinese custom of discarding the first preparation of Pu-erh and using the wet tea to then brew a pot for consumption.

  2. Identification and Analysis of Red Sea Mangrove (Avicennia marina) microRNAs by High-Throughput Sequencing and Their Association with Stress Responses

    KAUST Repository

    Khraiwesh, Basel

    2013-04-08

    Although RNA silencing has been studied primarily in model plants, advances in high-throughput sequencing technologies have enabled profiling of the small RNA components of many more plant species, providing insights into the ubiquity and conservatism of some miRNA-based regulatory mechanisms. Small RNAs of 20 to 24 nucleotides (nt) are important regulators of gene transcript levels by either transcriptional or by posttranscriptional gene silencing, contributing to genome maintenance and controlling a variety of developmental and physiological processes. Here, we used deep sequencing and molecular methods to create an inventory of the small RNAs in the mangrove species, Avicennia marina. We identified 26 novel mangrove miRNAs and 193 conserved miRNAs belonging to 36 families. We determined that 2 of the novel miRNAs were produced from known miRNA precursors and 4 were likely to be species-specific by the criterion that we found no homologs in other plant species. We used qRT-PCR to analyze the expression of miRNAs and their target genes in different tissue sets and some demonstrated tissue-specific expression. Furthermore, we predicted potential targets of these putative miRNAs based on a sequence homology and experimentally validated through endonucleolytic cleavage assays. Our results suggested that expression profiles of miRNAs and their predicted targets could be useful in exploring the significance of the conservation patterns of plants, particularly in response to abiotic stress. Because of their well-developed abilities in this regard, mangroves and other extremophiles are excellent models for such exploration. © 2013 Khraiwesh et al.

  3. Identification and analysis of red sea mangrove (Avicennia marina) microRNAs by high-throughput sequencing and their association with stress responses.

    Directory of Open Access Journals (Sweden)

    Basel Khraiwesh

    Full Text Available Although RNA silencing has been studied primarily in model plants, advances in high-throughput sequencing technologies have enabled profiling of the small RNA components of many more plant species, providing insights into the ubiquity and conservatism of some miRNA-based regulatory mechanisms. Small RNAs of 20 to 24 nucleotides (nt) are important regulators of gene transcript levels by either transcriptional or by posttranscriptional gene silencing, contributing to genome maintenance and controlling a variety of developmental and physiological processes. Here, we used deep sequencing and molecular methods to create an inventory of the small RNAs in the mangrove species, Avicennia marina. We identified 26 novel mangrove miRNAs and 193 conserved miRNAs belonging to 36 families. We determined that 2 of the novel miRNAs were produced from known miRNA precursors and 4 were likely to be species-specific by the criterion that we found no homologs in other plant species. We used qRT-PCR to analyze the expression of miRNAs and their target genes in different tissue sets and some demonstrated tissue-specific expression. Furthermore, we predicted potential targets of these putative miRNAs based on a sequence homology and experimentally validated through endonucleolytic cleavage assays. Our results suggested that expression profiles of miRNAs and their predicted targets could be useful in exploring the significance of the conservation patterns of plants, particularly in response to abiotic stress. Because of their well-developed abilities in this regard, mangroves and other extremophiles are excellent models for such exploration.

  4. Biphasic Study to Characterize Agricultural Biogas Plants by High-Throughput 16S rRNA Gene Amplicon Sequencing and Microscopic Analysis.

    Science.gov (United States)

    Maus, Irena; Kim, Yong Sung; Wibberg, Daniel; Stolze, Yvonne; Off, Sandra; Antonczyk, Sebastian; Pühler, Alfred; Scherer, Paul; Schlüter, Andreas

    2017-02-28

    Process surveillance within agricultural biogas plants (BGPs) was concurrently studied by high-throughput 16S rRNA gene amplicon sequencing and an optimized quantitative microscopic fingerprinting (QMF) technique. In contrast to 16S rRNA gene amplicons, digitalized microscopy is a rapid and cost-effective method that facilitates enumeration and morphological differentiation of the most significant groups of methanogens regarding their shape and characteristic autofluorescent factor 420. Moreover, the fluorescence signal mirrors cell vitality. In this study, four different BGPs were investigated. The results indicated stable process performance in the mesophilic BGPs and in the thermophilic reactor. Bacterial subcommunity characterization revealed significant differences between the four BGPs. Most remarkably, the genera Defluviitoga and Halocella dominated the thermophilic bacterial subcommunity, whereas members of another taxon, Syntrophaceticus, were found to be abundant in the mesophilic BGP. The domain Archaea was dominated by the genus Methanoculleus in all four BGPs, followed by Methanosaeta in BGP1 and BGP3. In contrast, Methanothermobacter members were highly abundant in the thermophilic BGP4. Furthermore, a high consistency between the sequencing approach and the QMF method was shown, especially for the thermophilic BGP. The differences elucidated that using this biphasic approach for mesophilic BGPs provided novel insights regarding disaggregated single cells of Methanosarcina and Methanosaeta species. Both dominated the archaeal subcommunity and replaced coccoid Methanoculleus members belonging to the same group of Methanomicrobiales that have been frequently observed in similar BGPs. This work demonstrates that combining QMF and 16S rRNA gene amplicon sequencing is a complementary strategy to describe archaeal community structures within biogas processes.

  5. High-throughput sequencing and degradome analysis reveal altered expression of miRNAs and their targets in a male-sterile cybrid pummelo (Citrus grandis).

    Science.gov (United States)

    Fang, Yan-Ni; Zheng, Bei-Bei; Wang, Lun; Yang, Wei; Wu, Xiao-Meng; Xu, Qiang; Guo, Wen-Wu

    2016-08-09

    G1 + HBP is a male sterile cybrid line with nuclear genome from Hirado Buntan pummelo (C. grandis Osbeck) (HBP) and mitochondrial genome from "Guoqing No.1" (G1, Satsuma mandarin), which provides a good opportunity to study male sterility and nuclear-cytoplasmic cross talk in citrus. High-throughput sRNA and degradome sequencing were applied to identify miRNAs and their targets in G1 + HBP and its fertile type HBP during reproductive development. A total of 184 known miRNAs, 22 novel miRNAs and 86 target genes were identified. Some of the targets are transcription factors involved in floral development, such as auxin response factors (ARFs), SQUAMOSA promoter binding protein box (SBP-box), MYB, basic region-leucine zipper (bZIP), APETALA2 (AP2) and transport inhibitor response 1 (TIR1). Eight target genes were confirmed to be sliced by corresponding miRNAs using 5' RACE technology. Based on the sequencing abundance, 42 differentially expressed miRNAs between sterile line G1 + HBP and fertile line HBP were identified. Differential expression of miRNAs and their target genes between the two lines was validated by quantitative RT-PCR, and reciprocal expression patterns between some miRNAs and their targets were demonstrated. The regulatory mechanism of miR167a was investigated by yeast one-hybrid and dual-luciferase assays, which showed that one dehydration-responsive element binding (DREB) transcription factor binds to the miR167a promoter and transcriptionally represses miR167 expression. Our study reveals the altered expression of miRNAs and their target genes in a male sterile line of pummelo and highlights that the miRNA regulatory network may be involved in floral bud development and cytoplasmic male sterility in citrus.

  6. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  7. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing

    Science.gov (United States)

    Lou, Dianne I.; Hussmann, Jeffrey A.; McBee, Ross M.; Acevedo, Ashley; Andino, Raul; Press, William H.; Sawyer, Sara L.

    2013-01-01

    A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ∼0.1–1 × 10⁻² per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, “circle sequencing,” which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10⁻⁶ per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors. PMID:24243955
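    The consensus step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the repeat-unit length is already known and that copies differ only by substitutions, whereas a real implementation would have to detect the repeat period and handle insertions and deletions. The function name `consensus_from_copies` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def consensus_from_copies(read: str, unit_len: int, min_fraction: float = 0.5) -> str:
    """Majority-vote consensus over tandem copies contained in one read.

    Assumption (not from the abstract): the read is an exact concatenation
    of copies of length `unit_len`; a trailing partial copy is ignored.
    """
    # Cut the read into complete, non-overlapping copies of the template.
    copies = [read[i:i + unit_len]
              for i in range(0, len(read) - unit_len + 1, unit_len)]
    if not copies:
        raise ValueError("read is shorter than one repeat unit")

    consensus = []
    # zip(*copies) yields one column per template position.
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Emit the base only if a strict majority of copies agree on it.
        consensus.append(base if count / len(copies) > min_fraction else "N")
    return "".join(consensus)
```

    With three linked copies, a substitution error in a single copy is outvoted by the other two; with only two copies a disagreement cannot be resolved and the position is reported as N.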

  8. A reporter system coupled with high-throughput sequencing unveils key bacterial transcription and translation determinants.

    Science.gov (United States)

    Yus, Eva; Yang, Jae-Seong; Sogues, Adrià; Serrano, Luis

    2017-08-28

    Quantitative analysis of the sequence determinants of transcription and translation regulation is relevant for systems and synthetic biology. To identify these determinants, researchers have developed different methods of screening random libraries using fluorescent reporters or antibiotic resistance genes. Here, we have implemented a generic approach called ELM-seq (expression level monitoring by DNA methylation) that overcomes the technical limitations of such classic reporters. ELM-seq uses DamID (Escherichia coli DNA adenine methylase as a reporter coupled with methylation-sensitive restriction enzyme digestion and high-throughput sequencing) to enable in vivo quantitative analyses of upstream regulatory sequences. Using the genome-reduced bacterium Mycoplasma pneumoniae, we show that ELM-seq has a large dynamic range and causes minimal toxicity. We use ELM-seq to determine key sequences (known and putatively novel) of promoter and untranslated regions that influence transcription and translation efficiency. Applying ELM-seq to other organisms will help us to further understand gene expression and guide synthetic biology. Quantitative analysis of how DNA sequence determines transcription and translation regulation is of interest to systems and synthetic biologists. Here the authors present ELM-seq, which uses Dam activity as reporter for high-throughput analysis of promoter and 5'-UTR regions.

  9. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  10. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    Science.gov (United States)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  11. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA

    National Research Council Canada - National Science Library

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-01-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA...

  12. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri eMichaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  13. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    Full Text Available ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  14. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls for low-population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
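    The Phred scores mentioned in this abstract relate to base-call error probability by Q = -10 log10(P). The sketch below is a generic per-read filter built on that relation, not SUGAR's subtile-based algorithm, which works on flowcell regions. It assumes the common Phred+33 ASCII encoding of FASTQ quality strings; the function names and the 0.01 threshold are illustrative choices.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_error_prob(quality_string: str) -> float:
    """Average per-base error probability of one read."""
    scores = decode_phred33(quality_string)
    if not scores:
        raise ValueError("empty quality string")
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


def passes_filter(quality_string: str, max_mean_error: float = 0.01) -> bool:
    """Keep a read only if its mean error probability is at most the threshold.

    The 0.01 default (equivalent to Q20 on average) is an illustrative value.
    """
    return mean_error_prob(quality_string) <= max_mean_error
```

    Averaging error probabilities rather than raw Q values is a deliberate choice here: because the scale is logarithmic, a few very low-quality bases dominate the mean probability, which the arithmetic mean of Q scores would hide.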

  15. Recent progress using high-throughput sequencing technologies in plant molecular breeding.

    Science.gov (United States)

    Gao, Qiang; Yue, Guidong; Li, Wenqi; Wang, Junyi; Xu, Jiaohui; Yin, Ye

    2012-04-01

    High-throughput sequencing is a revolutionary technological innovation in DNA sequencing. This technology has an ultra-low cost per base of sequencing and an overwhelmingly high data output. High-throughput sequencing has brought novel research methods and solutions to the research fields of genomics and post-genomics. Furthermore, this technology is leading to a new molecular breeding revolution that has landmark significance for scientific research and enables us to launch multi-level, multi-faceted, and multi-extent studies in the fields of crop genetics, genomics, and crop breeding. In this paper, we review progress in the application of high-throughput sequencing technologies to plant molecular breeding studies. © 2012 Institute of Botany, Chinese Academy of Sciences.

  16. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants...

  17. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  18. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  19. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Science.gov (United States)

    Zimmermann, Bob; Gesell, Tanja; Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-02-11

    SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  20. Evaluation of a High Throughput Starch Analysis Optimised for Wood

    Science.gov (United States)

    Bellasio, Chandra; Fini, Alessio; Ferrini, Francesco

    2014-01-01

    Starch is the most important long-term reserve in trees, and the analysis of starch is therefore a useful source of physiological information. Currently published protocols for wood starch analysis impose several limitations, such as long procedures and a neutralization step. The high-throughput standard protocols for starch analysis in food and feed represent a valuable alternative. However, they have not been optimised or tested with woody samples. These have particular chemical and structural characteristics, including the presence of interfering secondary metabolites, low reactivity of starch, and low starch content. In this study, a standard method for starch analysis used for food and feed (AOAC standard method 996.11) was optimised to improve precision and accuracy for the analysis of starch in wood. Key modifications were introduced in the digestion conditions and in the glucose assay. The optimised protocol was then evaluated through 430 starch analyses of standards at known starch content, matrix polysaccharides, and wood collected from three organs (roots, twigs, mature wood) of four species (coniferous and flowering plants). The optimised protocol proved to be remarkably precise and accurate (3%), suitable for a high throughput routine analysis (35 samples a day) of specimens with a starch content between 40 mg and 21 µg. Samples may include lignified organs of coniferous and flowering plants and non-lignified organs, such as leaves, fruits and rhizomes. PMID:24523863

  1. glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Andrew Paul Hutchins

    2014-01-01

    Full Text Available Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and which has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.

  2. Using high throughput sequencing to explore the biodiversity in oral bacterial communities

    Science.gov (United States)

    Diaz, P.I.; Dupuy, A.K.; Abusleme, L.; Reese, B.; Obergfell, C.; Choquette, L.; Dongari-Bagtzoglou, A.; Peterson, D.E.; Terzi, E.; Strausbaugh, L.D.

    2013-01-01

    Summary High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. PMID:22520388

  3. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and comp...... with known priming sites....

  4. High throughput inclusion body sizing: Nano particle tracking analysis.

    Science.gov (United States)

    Reichelt, Wieland N; Kaineder, Andreas; Brillmann, Markus; Neutsch, Lukas; Taschauer, Alexander; Lohninger, Hans; Herwig, Christoph

    2017-06-01

    The expression of pharmaceutical relevant proteins in Escherichia coli frequently triggers inclusion body (IB) formation caused by protein aggregation. In the scientific literature, substantial effort has been devoted to the quantification of IB size. However, particle-based methods used up to this point to analyze the physical properties of representative numbers of IBs lack sensitivity and/or orthogonal verification. Using high pressure freezing and automated freeze substitution for transmission electron microscopy (TEM) the cytosolic inclusion body structure was preserved within the cells. TEM imaging in combination with manual grey scale image segmentation allowed the quantification of relative areas covered by the inclusion body within the cytosol. As a high throughput method nano particle tracking analysis (NTA) enables one to derive the diameter of inclusion bodies in cell homogenate based on a measurement of the Brownian motion. The NTA analysis of fixated (glutaraldehyde) and non-fixated IBs suggests that high pressure homogenization annihilates the native physiological shape of IBs. Nevertheless, the ratio of particle counts of non-fixated and fixated samples could potentially serve as factor for particle stickiness. In this contribution, we establish image segmentation of TEM pictures as an orthogonal method to size biologic particles in the cytosol of cells. More importantly, NTA has been established as a particle-based, fast and high throughput method (1000-3000 particles), thus constituting a much more accurate and representative analysis than currently available methods. Copyright © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
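
    The step from Brownian motion to particle diameter mentioned in the abstract above is commonly made with the Stokes-Einstein relation, d = k_B T / (3 π η D). The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the default temperature (25 °C) and the default viscosity (roughly that of water) are illustrative choices, and the particles are assumed to be tracked in two dimensions, so that the mean squared displacement equals 4 D t.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_from_msd_2d(msd_m2, lag_s):
    """Diffusion coefficient from a 2D mean squared displacement (MSD = 4 * D * t)."""
    return msd_m2 / (4.0 * lag_s)


def hydrodynamic_diameter(diffusion_m2_s, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein diameter in metres; defaults are assumed, not taken from the paper."""
    return BOLTZMANN * temperature_k / (3.0 * math.pi * viscosity_pa_s * diffusion_m2_s)


if __name__ == "__main__":
    # Hypothetical track: MSD of 4e-12 m^2 over a 1 s lag.
    d_coeff = diffusion_from_msd_2d(4e-12, 1.0)
    print(f"diameter: {hydrodynamic_diameter(d_coeff) * 1e9:.0f} nm")
```

    Because the diameter scales inversely with the diffusion coefficient, the temperature and viscosity of the actual homogenate would need to be measured rather than assumed.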

  5. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    OpenAIRE

    Iqbal, Zafar; Rydning, Siri L.; Wedding, Iselin M.; Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%)...

  6. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia.

    Science.gov (United States)

    Iqbal, Zafar; Rydning, Siri L; Wedding, Iselin M; Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H; Henriksen, Sandra P; Tallaksen, Chantal M E; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process.

  7. High Throughput Sequencing of Extracellular RNA from Human Plasma.

    Directory of Open Access Journals (Sweden)

    Kirsty M Danielson

    Full Text Available The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as 'liquid biopsies' for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms.

  8. TeloPCR-seq: a high-throughput sequencing approach for telomeres

    Science.gov (United States)

    Bennett, Henrietta W.; Liu, Na; Hu, Yan; King, Megan C.

    2017-01-01

    We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that “like” repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. PMID:27714790

  9. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Full Text Available Abstract Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  10. High-throughput sequencing of black pepper root transcriptome.

    Science.gov (United States)

    Gordo, Sheila M C; Pinheiro, Daniel G; Moreira, Edith C O; Rodrigues, Simone M; Poltronieri, Marli C; de Lemos, Oriel F; da Silva, Israel Tojal; Ramos, Rommel T J; Silva, Artur; Schneider, Horacio; Silva, Wilson A; Sampaio, Iracilda; Darnet, Sylvain

    2012-09-17

    Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host's root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant's root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  11. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  12. Fluorescent foci quantitation for high-throughput analysis

    Directory of Open Access Journals (Sweden)

    Elena Ledesma-Fernández

    2015-06-01

    Full Text Available A number of cellular proteins localize to discrete foci within cells, for example DNA repair proteins, microtubule organizing centers, P bodies or kinetochores. It is often possible to measure the fluorescence emission from tagged proteins within these foci as a surrogate for the concentration of that specific protein. We wished to develop tools that would allow quantitation of fluorescence foci intensities in high-throughput studies. As proof of principle we have examined the kinetochore, a large multi-subunit complex that is critical for the accurate segregation of chromosomes during cell division. Kinetochore perturbations lead to aneuploidy, which is a hallmark of cancer cells. Hence, understanding kinetochore homeostasis and regulation are important for a global understanding of cell division and genome integrity. The 16 budding yeast kinetochores colocalize within the nucleus to form a single focus. Here we have created a set of freely-available tools to allow high-throughput quantitation of kinetochore foci fluorescence. We use this ‘FociQuant’ tool to compare methods of kinetochore quantitation and we show proof of principle that FociQuant can be used to identify changes in kinetochore protein levels in a mutant that affects kinetochore function. This analysis can be applied to any protein that forms discrete foci in cells.

  13. The Microsoft Biology Foundation Applications for High-Throughput Sequencing

    Science.gov (United States)

    Mercer, S.

    2010-01-01

    w9-2 The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage beta from mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.

  14. The use of high throughput DNA sequence analysis to assess the endophytic microbiome of date palm roots grown under different levels of salt stress.

    Science.gov (United States)

    Yaish, Mahmoud W; Al-Harrasi, Ibtisam; Alansari, Aliya S; Al-Yahyai, Rashid; Glick, Bernard R

    2016-09-01

    Date palms are able to grow under diverse abiotic stress conditions including in saline soils, where microbial communities may help in the plant's salinity tolerance. These communities, able to produce specific growth promoting substances, can enhance date palm growth in a saline environment. However, these communities are poorly defined. In the work reported here, the date palm endophytic bacterial and fungal communities were identified using the pyrosequencing method, and the microbial differential abundance in the root upon exposure to salinity stress was estimated. Approximately 150,061 reads were produced from the analysis of six ribosomal DNA libraries, which were prepared from endophytic microorganisms colonizing date palm root tissues. DNA sequence analysis of these libraries predicted the presence of a variety of bacterial and fungal endophytic species, some known and others unknown. The microbial community compositions of 30% and 8% of the bacterial and fungal species, respectively, were significantly (p ≤ 0.05) altered in response to salinity stress. Differential enrichment analysis showed that microbe diversity indicated by the Chao, Shannon and Simpson indices was slightly reduced; however, the overall microbial community structures were not significantly affected as a consequence of salinity. This may reflect a buffering effect by the host plant on the internal environments that these communities are colonizing. Some of the endophytes identified in this study were strains that were previously isolated from saline and marine environments. This suggests possible interactions with the plant that are favorable to salinity tolerance in date palm. [Int Microbiol 19(3):143-155 (2016)]. Copyright© by the Spanish Society for Microbiology and Institute for Catalan Studies.
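
    The Chao, Shannon and Simpson indices named in the abstract above can be computed from a vector of per-taxon read counts. The sketch below is illustrative only: the abstract does not say which variants were used, so the bias-corrected Chao1 estimator, the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions) are assumptions, as are the function names and the example counts.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p^2); higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


if __name__ == "__main__":
    # Hypothetical OTU counts for a control and a salt-stressed root library.
    control = [120, 80, 40, 10, 2, 1, 1]
    stressed = [200, 40, 10, 2, 1]
    for name, sample in (("control", control), ("stressed", stressed)):
        print(name, round(shannon(sample), 3), round(simpson(sample), 3), chao1(sample))
```

    Chao1 depends only on the numbers of singleton and doubleton taxa, so it is sensitive to how rare reads are filtered before the indices are computed.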

  15. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus
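
    To make the role of IUPAC ambiguity codes in the record above concrete, the following sketch computes the degeneracy of a primer (the number of distinct sequences it stands for) and expands it into those sequences. It is not the JCVI pipeline: the function names and the example primer are hypothetical, and only the standard IUPAC nucleotide code table is relied on.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of concrete sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    primer = "ATGRYCAN"  # hypothetical degenerate primer
    print(degeneracy(primer), expand(primer)[:4])
```

    Degeneracy grows multiplicatively with each ambiguous position, which is one reason a design system would cap it when screening candidate primer pairs.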

  16. Barcoded sequencing workflow for high throughput digitization of hybridoma antibody variable domain sequences.

    Science.gov (United States)

    Chen, Yongmei; Kim, Si Hyun; Shang, Yonglei; Guillory, Joseph; Stinson, Jeremy; Zhang, Qing; Hötzel, Isidro; Hoi, Kam Hon

    2018-01-20

    Since the invention of Hybridoma technology by Milstein and Köhler in 1975, its application has greatly advanced the antibody discovery process. The technology enables both functional screening and long-term archival of the immortalized monoclonal antibody producing B cells. Despite the dependable cryopreservation technology for hybridoma cells, practicality of long-term storage has been outpaced by recent progress in robotics and automations, which enables routine identification of thousands of antigen specific hybridoma clones. Such throughput increase imposes two nascent challenges in the antibody discovery process, namely limited cryopreservation storage space and limited throughput in conventional antibody sequencing. We herein provide a barcoded sequencing workflow that utilizes next generation sequencing to expand the conventional sequencing capacity. Accompanied with the bioinformatics tools we describe, the barcoded sequencing workflow robustly reports unambiguous antibody sequences as confirmed with Sanger sequencing controls. In complement with the commonly accessible recombinant DNA technology, the barcoded sequencing workflow allows for high throughput digitization of the antibody sequences and provides an effective solution to the limitations imposed by physical storage and sequencing capacity. Copyright © 2018 Genentech, Inc. Published by Elsevier B.V. All rights reserved.

  17. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains

    DEFF Research Database (Denmark)

    Nistelberger, H. M.; Smith, O.; Wales, Nathan

    2016-01-01

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one....

  18. ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data.

    Science.gov (United States)

    Huang, Xiaosan; Zhang, Shaoling; Li, Kongqing; Thimmapuram, Jyothi; Xie, Shaojun

    2017-10-26

    High throughput bisulfite sequencing (BS-seq) is an important technology to generate single-base DNA methylomes in both plants and animals. In order to accelerate the data analysis of BS-seq data, toolkits for visualization are required. ViewBS, an open-source toolkit, can extract and visualize the DNA methylome data easily and with flexibility. By using Tabix, ViewBS can visualize BS-seq for large datasets quickly. ViewBS can generate publication-quality figures, such as meta-plots, heat maps and violin-boxplots, which can help users to answer biological questions. We illustrate its application using BS-seq data from Arabidopsis thaliana. ViewBS is freely available at: https://github.com/xie186/ViewBS. Contact: xie186@purdue.edu. Supplementary data are available at Bioinformatics online.

  19. Barcoding the food chain: from Sanger to high-throughput sequencing.

    Science.gov (United States)

    Littlefair, Joanne E; Clare, Elizabeth L

    2016-11-01

    Society faces the complex challenge of supporting biodiversity and ecosystem functioning, while ensuring food security by providing safe traceable food through an ever-more-complex global food chain. The increase in human mobility brings the added threat of pests, parasites, and invaders that further complicate our agro-industrial efforts. DNA barcoding technologies allow researchers to identify both individual species, and, when combined with universal primers and high-throughput sequencing techniques, the diversity within mixed samples (metabarcoding). These tools are already being employed to detect market substitutions, trace pests through the forensic evaluation of trace "environmental DNA", and to track parasitic infections in livestock. The potential of DNA barcoding to contribute to increased security of the food chain is clear, but challenges remain in regulation and the need for validation of experimental analysis. Here, we present an overview of the current uses and challenges of applied DNA barcoding in agriculture, from agro-ecosystems within farmland to the kitchen table.

  20. Interactive Visual Analysis of High Throughput Text Streams

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; Goodall, John R [ORNL; Maness, Christopher S [ORNL; Senter, James K [ORNL; Potok, Thomas E [ORNL

    2012-01-01

    The scale, velocity, and dynamic nature of large scale social media systems like Twitter demand a new set of visual analytics techniques that support near real-time situational awareness. Social media systems are credited with escalating social protest during recent large scale riots. Virtual communities form rapidly in these online systems, and they occasionally foster violence and unrest which is conveyed in the users' language. Techniques for analyzing broad trends over these networks or reconstructing conversations within small groups have been demonstrated in recent years, but state-of-the-art tools are inadequate at supporting near real-time analysis of these high throughput streams of unstructured information. In this paper, we present an adaptive system to discover and interactively explore these virtual networks, as well as detect sentiment, highlight change, and discover spatio-temporal patterns.

  1. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

    Science.gov (United States)

    D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

    2008-01-01

    High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes characterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...
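The pooling-and-sorting idea described in this record can be illustrated with a minimal sketch. It assumes each read begins with a fixed-length sample tag that is matched exactly; the tag sequences, sample names, and the `demultiplex` helper are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_length):
    """Sort pooled reads into per-sample bins using the tag at the read start."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            # Unknown tag: keep the untouched read so it can be inspected later.
            bins["unassigned"].append(read)
        else:
            # Known tag: store the read with the tag stripped off.
            bins[sample].append(insert)
    return dict(bins)


# Hypothetical tags and pooled reads, for illustration only.
tags = {"ACGT": "soil_A", "TGCA": "soil_B"}
pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAATTT", "NNNNGGGCCC"]
sorted_reads = demultiplex(pooled, tags, tag_length=4)
```

Real tag designs usually tolerate sequencing errors in the tag; exact matching is used here only to keep the sketch short.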

  2. Deep Mutational Scanning: Library Construction, Functional Selection, and High-Throughput Sequencing.

    Science.gov (United States)

    Starita, Lea M; Fields, Stanley

    2015-08-03

    Deep mutational scanning is a highly parallel method that uses high-throughput sequencing to track changes in >10^5 protein variants before and after selection to measure the effects of mutations on protein function. Here we outline the stages of a deep mutational scanning experiment, focusing on the construction of libraries of protein sequence variants and the preparation of Illumina sequencing libraries. © 2015 Cold Spring Harbor Laboratory Press.
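Tracking variants "before and after selection" is often summarized as a per-variant enrichment score. The sketch below shows one simple way to do that: a log2 ratio of post- to pre-selection frequencies, normalized to wild type. It is an assumption made for illustration, not the scoring used by the authors; the variant names, counts, and pseudocount are hypothetical.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Log2 change in frequency for each variant, relative to wild type."""
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        # The pseudocount avoids division by zero for variants lost in selection.
        pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post / pre)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}


# Hypothetical read counts before and after selection.
pre = {"WT": 1000, "A2G": 100, "L5P": 100}
post = {"WT": 1000, "A2G": 200, "L5P": 10}
scores = enrichment_scores(pre, post, wild_type="WT")
```

A positive score means the variant became more frequent than wild type during selection; a negative score means it was depleted.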

  3. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, high-performance analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, the performance comparison between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential in the new in-memory concepts and may lead to further development of DNA analysis procedures in the future.
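For readers unfamiliar with the exact search step, the following is a minimal in-memory sketch of backward search over a Burrows-Wheeler transform, the index structure on which BWA is built. It is not the stored-procedure implementation from the paper; the function names and the toy reference are hypothetical, and the naive suffix sort is only suitable for very short sequences.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, first-column offsets, occurrence table)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return suffix_array, first, occ


def exact_search(pattern, index):
    """Return sorted start positions of all exact occurrences of pattern."""
    suffix_array, first, occ = index
    lo, hi = 0, len(suffix_array)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


# Toy reference sequence, for illustration only.
index = build_fm_index("ACGTACGTTACG")
hits = exact_search("ACG", index)
```

The pattern is consumed from its last character to its first, narrowing the range of suffixes that start with the characters seen so far.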

  4. SAMQA: error classification and validation of high-throughput sequenced read data

    Directory of Open Access Journals (Sweden)

    Bressler Ryan

    2011-08-01

    Full Text Available Abstract Background The advances in high-throughput sequencing technologies and the growth in data sizes have highlighted the need for scalable tools to perform quality assurance testing. These tests are necessary to ensure that data is of a minimum necessary standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. Results SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA) project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Due to an observed linearithmic speedup through the use of a high-performance computing (HPC) framework for the majority of tasks, poor quality data was identified prior to secondary analysis in significantly less time on the HPC framework than the same data run using alternative parallelization strategies on a single server. Conclusions The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds of gigabytes of samples regardless of coverage or sample type.
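As an illustration of what "technical standards provided by the SAM specification" can mean for a single read, the sketch below checks a few rules from the specification on one alignment line: the 11 mandatory fields, the value ranges of FLAG, POS and MAPQ, and the agreement of SEQ length with QUAL and CIGAR. This is not SAMQA's code; the function name, the error messages, and the example records are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that consume bases of SEQ


def validate_sam_record(line):
    """Return a list of problems found in one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    flag, pos, mapq, cigar = fields[1], fields[3], fields[4], fields[5]
    seq, qual = fields[9], fields[10]
    errors = []
    for name, value, upper in (("FLAG", flag, 65535),
                               ("POS", pos, 2**31 - 1),
                               ("MAPQ", mapq, 255)):
        if not value.isdigit() or int(value) > upper:
            errors.append(f"{name} out of range: {value}")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        errors.append("SEQ and QUAL lengths differ")
    if cigar != "*" and seq != "*":
        ops = CIGAR_OP.findall(cigar)
        if "".join(n + op for n, op in ops) != cigar:
            errors.append("malformed CIGAR")
        elif sum(int(n) for n, op in ops if op in QUERY_CONSUMING) != len(seq):
            errors.append("CIGAR length does not match SEQ")
    return errors


# Hypothetical records: one valid, one with three problems.
good = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
bad = "r2\t0\tchr1\t100\t300\t3M\t*\t0\t0\tACGT\tIII"
good_errors = validate_sam_record(good)
bad_errors = validate_sam_record(bad)
```

Biological standards defined by researchers, which the abstract also mentions, would be additional project-specific checks layered on top of format checks like these.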

  5. Identification of miRNAs and their target genes in peach (Prunus persica L.) using high-throughput sequencing and degradome analysis.

    Science.gov (United States)

    Luo, Xiaoyan; Gao, Zhihong; Shi, Ting; Cheng, Zongming; Zhang, Zhen; Ni, Zhaojun

    2013-01-01

    MicroRNAs play critical roles in various biological and metabolic processes. The function of miRNAs has been widely studied in model plants such as Arabidopsis and rice. However, the number of identified miRNAs and related miRNA targets in peach (Prunus persica) is limited. To understand further the relationship between miRNAs and their target genes during tissue development in peach, a small RNA library and three degradome libraries were constructed from three tissues for deep sequencing. We identified 117 conserved miRNAs and 186 novel miRNA candidates in peach by deep sequencing, and 19 conserved miRNAs and 13 novel miRNAs were further evaluated for their expression by RT-qPCR. The numbers of gene targets identified for 26 conserved miRNA families and 38 novel miRNA candidates were 172 and 87, respectively. Some of the identified miRNA targets were abundantly represented as conserved miRNA targets in plants. However, some of them were identified for the first time and showed important roles in peach development. Our study provides information concerning the regulatory network of miRNAs in peach and advances our understanding of miRNA functions during tissue development.

  6. Genome-wide transcriptome analysis between small-tail Han sheep and the Surabaya fur sheep using high-throughput RNA sequencing.

    Science.gov (United States)

    Miao, Xiangyang; Luo, Qingmiao

    2013-06-01

    The small-tail Han sheep and the Surabaya fur sheep are two local breeds in north China, characterized by high and low fecundity, respectively. Significant genetic differences between these two breeds have attracted increasing interest in the identification and utilization of major prolificacy genes in these sheep. High prolificacy is a complex trait, and it is difficult to comprehensively identify the candidate genes related to this trait using a single molecular biology technique. To understand the molecular mechanisms of fecundity and provide more information about high prolificacy candidate genes in high- and low-fecundity sheep, we explored the utility of next-generation sequencing technology in this work. A total of 1.8 Gb of sequencing reads were obtained, resulting in more than 20 000 contigs that averaged ∼300 bp in length. Ten differentially expressed genes were further verified by quantitative real-time RT-PCR to confirm the reliability of the RNA-seq results. Our work will provide a basis for future research on sheep reproduction.

  7. High-throughput sequencing analysis reveals the genetic diversity of different regions of the murine norovirus genome during in vitro replication.

    Science.gov (United States)

    Mauroy, Axel; Taminiau, Bernard; Nezer, Carine; Ghurburrun, Elsa; Baurain, Denis; Daube, Georges; Thiry, Etienne

    2017-04-01

    In this study, we report the genetic diversity and nucleotide mutation rates of five representative regions of the murine norovirus genome during in vitro passages. The mutation rates were similar in genomic regions encompassing partial coding sequences for non-structural (NS) 1-2, NS5, NS6, NS7 proteins within open reading frame (ORF) 1. In a region encoding a portion of the major capsid protein (VP1) within ORF2 (also including the ORF4 region) and a portion of the minor structural protein (VP2), the mutation rates were estimated to be at least one order of magnitude higher. The VP2 coding region was found to have the highest mutation rate.

  8. Genome-Wide Identification and Comparative Analysis of Conserved and Novel MicroRNAs in Grafted Watermelon by High-Throughput Sequencing

    Science.gov (United States)

    Liu, Na; Yang, Jinghua; Guo, Shaogui; Xu, Yong; Zhang, Mingfang

    2013-01-01

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in post-transcriptional gene regulation that play a critical role in plant growth, development and stress response. However, less is known about the involvement of miRNAs in grafting, especially in watermelon (Citrullus lanatus L.), which is one of the most important agricultural crops worldwide. Grafting is commonly used in watermelon production to improve adaptation to abiotic and biotic stresses, in particular to the soil-borne fusarium wilt disease. In this study, Solexa sequencing was used to discover small RNA populations and compare miRNAs on a genome-wide scale in a watermelon grafting system. A total of 11,458,476, 11,614,094 and 9,339,089 raw reads representing 2,957,751, 2,880,328 and 2,964,990 unique sequences were obtained from the scions of self-grafted watermelon and watermelon grafted onto bottle gourd and squash at the two-true-leaf stage, respectively. 39 known miRNAs belonging to 30 miRNA families and 80 novel miRNAs were identified in our small RNA dataset. Compared with self-grafted watermelon, 20 (5 known miRNA families and 15 novel miRNAs) and 47 (17 known miRNA families and 30 novel miRNAs) miRNAs were expressed at significantly different levels in watermelon grafted onto bottle gourd and squash, respectively. MiRNAs were expressed differentially when watermelon was grafted onto different rootstocks, suggesting that miRNAs might play an important role in diverse biological and metabolic processes in watermelon and that grafting may regulate plant growth and development, as well as adaptation to stresses, by changing miRNA expression. The small RNA transcriptomes obtained in this study provided insights into molecular aspects of miRNA-mediated regulation in grafted watermelon. This result would provide a basis for further unravelling the mechanism by which miRNA information is exchanged between scion and rootstock in grafted...

  9. High-Throughput Analysis and Automation for Glycomics Studies

    NARCIS (Netherlands)

    Shubhakar, A.; Reiding, K.R.; Gardner, R.A.; Spencer, D.I.R.; Fernandes, D.L.; Wuhrer, M.

    2015-01-01

    This review covers advances in analytical technologies for high-throughput (HTP) glycomics. Our focus is on structural studies of glycoprotein glycosylation to support biopharmaceutical realization and the discovery of glycan biomarkers for human disease. For biopharmaceuticals, there is increasing

  10. MIPHENO: Data normalization for high throughput metabolic analysis.

    Science.gov (United States)

    High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries from gene function to drug candidate identification. These large-scale experiments are typically carried out over the course...

  11. End-to-End Optimization of High-Throughput DNA Sequencing.

    Science.gov (United States)

    O'Reilly, Eliza; Baccelli, Francois; De Veciana, Gustavo; Vikalo, Haris

    2016-10-01

    At the core of Illumina's high-throughput DNA sequencing platforms lies bridge amplification, a biophysical surface process that results in a random geometry of clusters of homogeneous short DNA fragments, typically hundreds of base pairs long. The statistical properties of this random process and the lengths of the fragments are critical as they affect the information that can be subsequently extracted, that is, the density of successfully inferred DNA fragment reads. The ensembles of overlapping DNA fragment reads are then used to computationally reconstruct the much longer target genome sequence. The success of the reconstruction in turn depends on having a sufficiently large ensemble of DNA fragments that are sufficiently long. In this article, using stochastic geometry, we model and optimize the end-to-end flow cell synthesis and target genome sequencing process, linking and partially controlling the statistics of the physical processes to the success of the final computational step. Based on a rough calibration of our model, we provide, for the first time, a mathematical framework capturing the salient features of the sequencing platform that serves as a basis for optimizing cost, performance, and/or sensitivity analysis to various parameters.

  12. A beginners guide to SNP calling from high-throughput DNA-sequencing data.

    Science.gov (United States)

    Altmann, André; Weber, Peter; Bader, Daniel; Preuss, Michael; Binder, Elisabeth B; Müller-Myhsok, Bertram

    2012-10-01

    High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.

  13. Comparative analysis of the gastrointestinal microbial communities of bar-headed goose (Anser indicus) in different breeding patterns by high-throughput sequencing.

    Science.gov (United States)

    Wang, Wen; Cao, Jian; Li, Ji-Rong; Yang, Fang; Li, Zhuo; Li, Lai-Xing

    2016-01-01

    The bar-headed goose is currently one of the most popular species for rare bird breeding in China. However, bar-headed geese in captivity display a reduced reproductive rate. The gut microbiome has been shown to influence host factors such as nutrient and energy metabolism, immune homeostasis and reproduction. It is therefore of great scientific and agricultural value to analyze the microbial communities associated with bar-headed geese in order to improve their reproductive rate. Here we describe the first comparative study of the gut microbial communities of bar-headed geese in three different breeding pattern groups by 16S rRNA sequences using the Illumina MiSeq platform. The results showed that Firmicutes predominated (58.33%) among wild bar-headed geese, followed by Proteobacteria (30.67%), Actinobacteria (7.33%) and Bacteroidetes (3.33%). In the semi-artificial breeding group, Firmicutes was also the most abundant bacteria (62.00%), followed by Bacteroidetes (28.67%), Proteobacteria (4.20%), Actinobacteria (3.27%) and Fusobacteria (1.51%). The microbial communities of the artificial breeding group were dominated by Firmicutes (60.67%), Fusobacteria (29.67%) and Proteobacteria (9.33%). Wild bar-headed geese had a significantly higher relative abundance of Proteobacteria and Actinobacteria, while semi-artificial breeding bar-headed geese had significantly more Bacteroidetes. The semi-artificial breeding group had the highest microbial community diversity and richness, followed by the wild group, and then the artificial breeding group. The marked differences in group-specific microbes at the genus level create a baseline for future bar-headed goose microbiology research. Copyright © 2015 Elsevier GmbH. All rights reserved.
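To make "community diversity" concrete, the sketch below computes the Shannon index from relative abundances. The abstract does not say which diversity metric was used, so Shannon is an assumption chosen for illustration. The input values are the phylum-level percentages quoted above; because the study's diversity estimates were presumably computed at a finer taxonomic level, these phylum-level values need not reproduce the reported ranking of the groups.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Phylum-level relative abundances (%) as quoted in the abstract.
groups = {
    "wild": [58.33, 30.67, 7.33, 3.33],
    "semi-artificial": [62.00, 28.67, 4.20, 3.27, 1.51],
    "artificial": [60.67, 29.67, 9.33],
}
diversity = {name: shannon_index(values) for name, values in groups.items()}
```

Richness, the other quantity mentioned, is simply the number of taxa observed, here `len(values)` for each group.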

  14. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Directory of Open Access Journals (Sweden)

    Kathy N Lam

    Full Text Available High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  15. The application of the high throughput sequencing technology in the transposable elements.

    Science.gov (United States)

    Liu, Zhen; Xu, Jian-hong

    2015-09-01

    High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.

  16. Functional approach to high-throughput plant growth analysis

    Science.gov (United States)

    2013-01-01

    Method Taking advantage of the current rapid development in imaging systems and computer vision algorithms, we present HPGA, a high-throughput phenotyping platform for plant growth modeling and functional analysis, which produces a better understanding of energy distribution with regard to the balance between growth and defense. HPGA has two components, PAE (Plant Area Estimation) and GMA (Growth Modeling and Analysis). In PAE, by taking the complex leaf overlap problem into consideration, the area of every plant is measured from top-view images in four steps. Given the abundant measurements obtained with PAE, in the second module, GMA, a nonlinear growth model is applied to generate growth curves, followed by functional data analysis. Results Experimental results on the model plant Arabidopsis thaliana show that, compared to an existing approach, HPGA reduces the error rate of measuring plant area by half. The application of HPGA to the cfq mutant plants under fluctuating light reveals the correlation between low photosynthetic rates and small plant area (compared to wild type), which raises the hypothesis that knocking out cfq changes the sensitivity of the energy distribution under fluctuating light conditions to repress leaf growth. Availability HPGA is available at http://www.msu.edu/~jinchen/HPGA. PMID:24565437

  17. Using high-throughput sequencing of ITS2 to describe Symbiodinium metacommunities in St. John, US Virgin Islands

    OpenAIRE

    Ross Cunning; Gates, Ruth D; Edmunds, Peter J.

    2017-01-01

    Symbiotic microalgae (Symbiodinium spp.) strongly influence the performance and stress-tolerance of their coral hosts, making the analysis of Symbiodinium communities in corals (and metacommunities on reefs) advantageous for many aspects of coral reef research. High-throughput sequencing of ITS2 nrDNA offers unprecedented scale in describing these communities, yet high intragenomic variability at this locus complicates the resolution of biologically meaningful diversity. Here, we demonstrate ...

  18. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    Science.gov (United States)

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  19. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  20. In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR.

    Science.gov (United States)

    Kuksa, Pavel P; Leung, Yuk Yee; Vandivier, Lee E; Anderson, Zachary; Gregory, Brian D; Wang, Li-San

    2017-01-01

    RNA molecules are often altered post-transcriptionally by the covalent modification of their nucleotides. These modifications are known to modulate the structure, function, and activity of RNAs. When reverse transcribed into cDNA during RNA sequencing library preparation, atypical (modified) ribonucleotides that affect Watson-Crick base pairing will interfere with reverse transcriptase (RT), resulting in cDNA products with mis-incorporated bases or prematurely terminated RNA products. These interactions with RT can therefore be inferred from mismatch patterns in the sequencing reads, and are distinguishable from simple base-calling errors, single-nucleotide polymorphisms (SNPs), or RNA editing sites. Here, we describe a computational protocol for the in silico identification of modified ribonucleotides from RT-based RNA-seq read-out using the High-throughput Analysis of Modified Ribonucleotides (HAMR) software. HAMR can identify these modifications transcriptome-wide with single nucleotide resolution, and also differentiate between different types of modifications to predict modification identity. Researchers can use HAMR to identify and characterize RNA modifications using RNA-seq data from a variety of common RT-based sequencing protocols such as Poly(A), total RNA-seq, and small RNA-seq.

  1. A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection.

    Science.gov (United States)

    Khan, Arifa S; Ng, Siemon H S; Vandeputte, Olivier; Aljanahi, Aisha; Deyati, Avisek; Cassart, Jean-Pol; Charlebois, Robert L; Taliaferro, Lanyn P

    2017-01-01

    The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory. Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials

  2. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential...... display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds....

  3. Bacterial diversity of the Colombian fermented milk "Suero Costeño" assessed by culturing and high-throughput sequencing and DGGE analysis of 16S rRNA gene amplicons.

    Science.gov (United States)

    Motato, Karina Edith; Milani, Christian; Ventura, Marco; Valencia, Francia Elena; Ruas-Madiedo, Patricia; Delgado, Susana

    2017-12-01

    "Suero Costeño" (SC) is a traditional soured cream elaborated from raw milk in the Northern-Caribbean coast of Colombia. The natural microbiota that characterizes this popular Colombian fermented milk is unknown, although several culturing studies have previously been attempted. In this work, the microbiota associated with SC from three manufacturers in two regions, "Planeta Rica" (Córdoba) and "Caucasia" (Antioquia), was analysed by means of culturing methods in combination with high-throughput sequencing and DGGE analysis of 16S rRNA gene amplicons. The bacterial ecosystem of SC samples was revealed to be composed of lactic acid bacteria belonging to the Streptococcaceae and Lactobacillaceae families, with the proportions and genera varying among manufacturers and regions of elaboration. Members of the Lactobacillus acidophilus group, Lactococcus lactis, Streptococcus infantarius and Streptococcus salivarius characterized this artisanal product. In comparison with culturing, the use of in-depth, culture-independent molecular techniques provides a more realistic picture of the overall bacterial communities residing in SC. Besides the descriptive purpose, these approaches will facilitate a rational strategy (culture media and growing conditions) for the isolation of indigenous strains that allow standardization in the manufacture of SC. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review recent advances in immune repertoire studies of infectious diseases achieved by traditional techniques and by high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of the complementarity-determining regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  5. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
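
    The effect of PCR stochasticity on a pool of unique molecules can be illustrated with a small simulation. The sketch below is not the authors' quantitative model; it assumes a simple branching process in which every copy is duplicated independently with a fixed per-cycle probability. The function names and the parameters `efficiency`, `cycles` and `n_molecules` are hypothetical choices made for illustration.

```python
import random
import statistics


def amplify(copies, cycles, efficiency, rng):
    """Amplify one template as a branching process (illustrative assumption):
    in each cycle every existing copy is duplicated with probability
    `efficiency`, independently of all other copies."""
    for _ in range(cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def simulate_pool(n_molecules=200, cycles=10, efficiency=0.8, seed=0):
    """Amplify `n_molecules` unique templates, each starting from one copy,
    and return the final copy number of each template."""
    rng = random.Random(seed)
    return [amplify(1, cycles, efficiency, rng) for _ in range(n_molecules)]


if __name__ == "__main__":
    counts = simulate_pool()
    mean = statistics.mean(counts)
    cv = statistics.pstdev(counts) / mean
    print(f"mean copies: {mean:.1f}, coefficient of variation: {cv:.2f}")
```

    Under this assumed model the expected copy number is (1 + efficiency)^cycles, yet individual templates scatter around it because early cycles act on very few molecules. This is the kind of skew in sequence representation that the abstract attributes to stochasticity.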

  6. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains.

    Science.gov (United States)

    Nistelberger, H M; Smith, O; Wales, N; Star, B; Boessenkool, S

    2016-11-24

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one lightly-charred maize cob. Even with target enrichment, this sample failed to yield adequate data required to address fundamental questions in archaeology and biology. We further reanalysed part of an existing dataset on charred plant material, and found all purported endogenous DNA sequences were likely to be spurious. We suggest these technologies are not suitable for use with charred archaeobotanicals and urge great caution when interpreting data obtained by HTS of these remains.

  7. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs.

    Science.gov (United States)

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2014-05-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs, including snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Fully Bayesian Analysis of High-throughput Targeted Metabolomics Assays

    Science.gov (United States)

    High-throughput metabolomic assays that allow simultaneous targeted screening of hundreds of metabolites have recently become available in kit form. Such assays provide a window into understanding changes to biochemical pathways due to chemical exposure or disease, and are usefu...

  9. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards...... with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates...... automation of DNA sequencing....

  10. High-throughput massively parallel sequencing for fetal aneuploidy detection from maternal plasma.

    Directory of Open Access Journals (Sweden)

    Taylor J Jensen

    Circulating cell-free (ccf) fetal DNA comprises 3-20% of all the cell-free DNA present in maternal plasma. Numerous research and clinical studies have described the analysis of ccf DNA using next generation sequencing for the detection of fetal aneuploidies with high sensitivity and specificity. We sought to extend the utility of this approach by assessing semi-automated library preparation, higher sample multiplexing during sequencing, and improved bioinformatic tools to enable a higher throughput, more efficient assay while maintaining or improving clinical performance. Whole blood (10 mL) was collected from pregnant female donors and plasma separated using centrifugation. Ccf DNA was extracted using column-based methods. Libraries were prepared using an optimized semi-automated library preparation method and sequenced on an Illumina HiSeq2000 sequencer in a 12-plex format. Z-scores were calculated for affected chromosomes using a robust method after normalization and genomic segment filtering. Classification was based upon a standard normal transformed cutoff value of z = 3 for chromosome 21 and z = 3.95 for chromosomes 18 and 13. Two parallel assay development studies using a total of more than 1900 ccf DNA samples were performed to evaluate the technical feasibility of automating library preparation and increasing the sample multiplexing level. These processes were subsequently combined and a study of 1587 samples was completed to verify the stability of the process-optimized assay. Finally, an unblinded clinical evaluation of 1269 euploid and aneuploid samples utilizing this high-throughput assay coupled to improved bioinformatic procedures was performed. We were able to correctly detect all aneuploid cases with extremely low false positive rates of 0.09%, <0.01%, and 0.08% for trisomies 21, 18, and 13, respectively. These data suggest that the developed laboratory methods in concert with improved bioinformatic approaches enable higher sample
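
    The z-score classification step can be sketched as follows. The abstract states only that a "robust method" was used; the median and scaled median absolute deviation (MAD) below are one common robust choice and are an assumption, not the published procedure. The cutoffs are the ones quoted in the abstract, while the function names, the at-or-above comparison and the reference fractions are hypothetical.

```python
import statistics

# Cutoffs quoted in the abstract; everything else in this sketch is illustrative.
CUTOFFS = {"chr21": 3.0, "chr18": 3.95, "chr13": 3.95}


def robust_z(value, reference):
    """Z-score of `value` against a reference set of (assumed euploid)
    chromosome fractions, using median and scaled MAD instead of mean and SD."""
    med = statistics.median(reference)
    mad = statistics.median([abs(x - med) for x in reference])
    if mad == 0:
        raise ValueError("reference set has zero spread; MAD is undefined as a scale")
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


def classify(chrom, value, reference):
    """Return (z, flagged); flagged is True when z reaches the chromosome's cutoff."""
    z = robust_z(value, reference)
    return z, z >= CUTOFFS[chrom]


if __name__ == "__main__":
    # Hypothetical chr21 read fractions from reference samples.
    reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128, 0.0131]
    print(classify("chr21", 0.0140, reference))
```

    Median and MAD are less sensitive than mean and standard deviation to a few aneuploid samples hiding in the reference set, which is why they are a natural candidate for a "robust" normalization.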

  11. Improved detection of artifactual viral minority variants in high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Matthijs Rudolf Albert Welkers

    2015-01-01

    High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification is limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933 (H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after ‘best practice’ quality control (QC), within the plasmid pool, 1 minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to 3 clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs).
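
    The two technical characteristics named in the abstract (presence in mostly one read orientation, and mismatches concentrated at particular read positions) lend themselves to simple checks. The sketch below is not the code published by the authors on Sourceforge. It is a minimal illustration in which the `Support` record, the thresholds `max_strand` and `max_edge`, and the 10-base edge window are all hypothetical.

```python
from typing import NamedTuple, Sequence


class Support(NamedTuple):
    """One read supporting a candidate minority variant (hypothetical record)."""
    is_forward: bool   # orientation of the read carrying the variant
    pos_in_read: int   # 0-based offset of the mismatch within the read
    read_length: int


def strand_fraction(reads: Sequence[Support]) -> float:
    """Fraction of supporting reads that share the majority orientation."""
    forward = sum(r.is_forward for r in reads)
    return max(forward, len(reads) - forward) / len(reads)


def edge_fraction(reads: Sequence[Support], edge: int = 10) -> float:
    """Fraction of supporting reads whose mismatch lies within `edge` bases
    of either end of the read."""
    near_edge = sum(
        1 for r in reads
        if r.pos_in_read < edge or r.pos_in_read >= r.read_length - edge
    )
    return near_edge / len(reads)


def looks_artifactual(reads: Sequence[Support],
                      max_strand: float = 0.9,
                      max_edge: float = 0.8) -> bool:
    """Flag a variant seen almost only in one orientation, or almost only
    near read ends. Both thresholds are illustrative, not published values."""
    if not reads:
        raise ValueError("no supporting reads")
    return strand_fraction(reads) > max_strand or edge_fraction(reads) > max_edge
```

    A real pipeline would more likely apply a statistical test to each property than a fixed threshold. The thresholds here only make the two criteria concrete.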

  12. Analysis of the Bacterial Communities in Two Liquors of Soy Sauce Aroma as Revealed by High-Throughput Sequencing of the 16S rRNA V4 Hypervariable Region

    Science.gov (United States)

    Tang, Jing; Tang, Xiaoxin; Tang, Ming; Zhang, Ximin; Xu, Xiaorong

    2017-01-01

    Chinese liquor is one of the world's oldest distilled alcoholic beverages and an important commercial fermented product in China. The Chinese liquor fermentation process has three stages: making Daqu (the starter), stacking fermentation on the ground, and liquor fermentation in pits. We investigated the bacterial diversity of Maotai and Guotai Daqu and liquor fermentation using high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene. A total of 70,297 sequences were obtained from the Daqu samples and clustered into 17 phyla. The composition of the bacterial communities in the Daqu from these two soy sauce aroma-style Chinese liquors was the same, although some bacterial species changed in abundance. Between the Daqu and liquor fermentation samples, 12 bacterial phyla increased. The abundance of Lactobacillus and Pseudomonas increased in the liquor fermentation. This study used high-throughput sequencing to provide new insights into the bacterial composition of Chinese liquor Daqu and fermentation. Similarities in the distribution of bacteria in the Daqu of the soy sauce aroma-style Chinese liquors suggest that these patterns of bacterial abundance might apply more generally to other liquors. PMID:28337455

  13. Analysis of the Bacterial Communities in Two Liquors of Soy Sauce Aroma as Revealed by High-Throughput Sequencing of the 16S rRNA V4 Hypervariable Region.

    Science.gov (United States)

    Tang, Jing; Tang, Xiaoxin; Tang, Ming; Zhang, Ximin; Xu, Xiaorong; Yi, Yin

    2017-01-01

    Chinese liquor is one of the world's oldest distilled alcoholic beverages and an important commercial fermented product in China. The Chinese liquor fermentation process has three stages: making Daqu (the starter), stacking fermentation on the ground, and liquor fermentation in pits. We investigated the bacterial diversity of Maotai and Guotai Daqu and liquor fermentation using high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene. A total of 70,297 sequences were obtained from the Daqu samples and clustered into 17 phyla. The composition of the bacterial communities in the Daqu from these two soy sauce aroma-style Chinese liquors was the same, although some bacterial species changed in abundance. Between the Daqu and liquor fermentation samples, 12 bacterial phyla increased. The abundance of Lactobacillus and Pseudomonas increased in the liquor fermentation. This study used high-throughput sequencing to provide new insights into the bacterial composition of Chinese liquor Daqu and fermentation. Similarities in the distribution of bacteria in the Daqu of the soy sauce aroma-style Chinese liquors suggest that these patterns of bacterial abundance might apply more generally to other liquors.

  14. Analysis of the Bacterial Communities in Two Liquors of Soy Sauce Aroma as Revealed by High-Throughput Sequencing of the 16S rRNA V4 Hypervariable Region

    Directory of Open Access Journals (Sweden)

    Jing Tang

    2017-01-01

    Chinese liquor is one of the world’s oldest distilled alcoholic beverages and an important commercial fermented product in China. The Chinese liquor fermentation process has three stages: making Daqu (the starter), stacking fermentation on the ground, and liquor fermentation in pits. We investigated the bacterial diversity of Maotai and Guotai Daqu and liquor fermentation using high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene. A total of 70,297 sequences were obtained from the Daqu samples and clustered into 17 phyla. The composition of the bacterial communities in the Daqu from these two soy sauce aroma-style Chinese liquors was the same, although some bacterial species changed in abundance. Between the Daqu and liquor fermentation samples, 12 bacterial phyla increased. The abundance of Lactobacillus and Pseudomonas increased in the liquor fermentation. This study used high-throughput sequencing to provide new insights into the bacterial composition of Chinese liquor Daqu and fermentation. Similarities in the distribution of bacteria in the Daqu of the soy sauce aroma-style Chinese liquors suggest that these patterns of bacterial abundance might apply more generally to other liquors.

  15. Finding sRNA generative locales from high-throughput sequencing data with NiBLS

    Directory of Open Access Journals (Sweden)

    Moulton Vincent

    2010-02-01

    Background: Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales. Results: We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph the clustering coefficient, a measure of the interconnectedness of the subnetwork, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets and that the locales it generates are robust to differences in the choice of parameters. Conclusions: NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA.
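
    The graph construction described in the Results can be sketched in a few lines. This is not the NiBLS implementation. It is a minimal illustration that assumes reads are given as (chromosome, start, end) tuples, that two reads are linked when the gap between them is at most a hypothetical `max_gap`, and that the clustering coefficient of a sub-network is the mean local clustering coefficient of its nodes.

```python
from itertools import combinations


def build_graph(reads, max_gap):
    """Link two reads when they lie on the same chromosome and the gap
    between them is at most `max_gap` (overlaps give a negative gap)."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        (c1, s1, e1), (c2, s2, e2) = reads[i], reads[j]
        gap = max(s1, s2) - min(e1, e2)
        if c1 == c2 and gap <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Return the connected components (sub-networks) as sets of node ids."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        found.append(comp)
    return found


def clustering_coefficient(adj, comp):
    """Mean local clustering coefficient over the nodes of one component;
    nodes with fewer than two neighbours contribute zero."""
    total = 0.0
    for node in comp:
        nbrs = adj[node]
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(comp)


if __name__ == "__main__":
    reads = [("chr1", 100, 121), ("chr1", 105, 126), ("chr1", 110, 131),
             ("chr2", 0, 20), ("chr2", 30, 50), ("chr2", 60, 80)]
    adj = build_graph(reads, max_gap=15)
    for comp in components(adj):
        print(sorted(comp), clustering_coefficient(adj, comp))
```

    In this toy input the three mutually overlapping chr1 reads form a fully connected sub-network, whereas the evenly spaced chr2 reads form a chain with no triangles. A threshold on the coefficient would therefore separate the two.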

  16. Trade-Off Analysis in High-Throughput Materials Exploration.

    Science.gov (United States)

    Volety, Kalpana K; Huyberechts, Guido P J

    2017-03-13

    This Research Article presents a strategy to identify the optimum compositions in metal alloys with certain desired properties in a high-throughput screening environment, using a multiobjective optimization approach. In addition to the identification of the optimum compositions in a primary screening, the strategy also allows pointing to regions in the compositional space where further exploration in a secondary screening could be carried out. The strategy for the primary screening is a combination of two multiobjective optimization approaches, namely Pareto optimality and desirability functions. The experimental data used in the present study have been collected from over 200 different compositions belonging to four different alloy systems. The metal alloys (comprising Fe, Ti, Al, Nb, Hf, Zr) are synthesized and screened using high-throughput technologies. The advantages of this kind of approach over traditional and comparatively simpler approaches, such as ranking and calculating figures of merit, are discussed.
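
    The two ingredients named in the abstract, Pareto optimality and desirability functions, can be illustrated generically. The sketch below is not the authors' workflow. It assumes every objective is to be maximised, uses a simple linear desirability clipped to [0, 1], and combines objectives by a geometric mean. The composition names and property values are hypothetical.

```python
import math


def dominates(a, b):
    """True when `a` is at least as good as `b` in every objective and
    strictly better in at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


def desirability(score, lows, highs):
    """Overall desirability: each objective is rescaled linearly between its
    `low` (0) and `high` (1) bound, and the results are combined by a
    geometric mean, so one unacceptable property zeroes the total."""
    parts = [
        min(1.0, max(0.0, (y - lo) / (hi - lo)))
        for y, lo, hi in zip(score, lows, highs)
    ]
    return math.prod(parts) ** (1 / len(parts))


if __name__ == "__main__":
    # Hypothetical compositions scored on two properties.
    alloys = {"A": (0.9, 0.2), "B": (0.5, 0.5), "C": (0.4, 0.4), "D": (0.2, 0.9)}
    front = pareto_front(alloys)
    ranked = sorted(front, key=lambda n: desirability(front[n], (0, 0), (1, 1)),
                    reverse=True)
    print(front, ranked)
```

    The Pareto step discards compositions that are worse on every property, and the desirability score then ranks the surviving trade-offs. This mirrors, in simplified form, the combination the abstract describes for the primary screening.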

  17. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    the development and testing of innovative molecular approaches aiming at improving the amount of informative HTS data one can recover from ancient DNA extracts. We have characterized important ligation and amplification biases in the sequencing library building and enrichment steps, which can impede further...... been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing...

  18. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
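
    Gentle trimming at a Phred threshold can be sketched as follows. This is not the algorithm of any particular trimming package, nor the procedure evaluated in the study. It is a minimal illustration that assumes Phred+33 quality encoding and removes bases only from the 3' end until a base at or above the threshold is reached. The function names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores
    (Phred+33 encoding is assumed by default)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, threshold=5, offset=33):
    """Remove trailing bases whose Phred score is below `threshold`.
    Trimming stops at the first base, scanning from the 3' end,
    that meets the threshold."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40; '#', '$' and '%' encode Phred 2, 3 and 4.
    print(trim_3prime("ACGTACGT", "IIIII#$%", threshold=5))
    print(trim_3prime("ACGTACGT", "IIIII#$%", threshold=2))
```

    With a threshold of 5 the last three bases are removed, while a threshold of 2 leaves the read intact. This illustrates how small changes in a gentle threshold alter how much sequence is retained.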

  19. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Kyle Logue

    2016-03-01

    Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources.

  20. Diversity and Structure of Diazotrophic Communities in Mangrove Rhizosphere, Revealed by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Yanying Zhang

    2017-10-01

    Diazotrophic communities make an essential contribution to productivity by providing new nitrogen. However, knowledge of the roles that both mangrove tree species and geochemical parameters play in shaping mangrove rhizosphere diazotrophic communities remains limited. Here, a comprehensive examination of the diversity and structure of microbial communities in the rhizospheres of three mangrove species, Rhizophora apiculata, Avicennia marina, and Ceriops tagal, was undertaken using high-throughput sequencing of the 16S rRNA and nifH genes. Our results revealed a great diversity of both the total microbial composition and the diazotrophic composition specifically in the mangrove rhizosphere. Deltaproteobacteria and Gammaproteobacteria were both ubiquitous and dominant, comprising an average of 45.87 and 86.66% of total microbial and diazotrophic communities, respectively. Sulfate-reducing bacteria belonging to the Desulfobacteraceae and Desulfovibrionaceae were the dominant diazotrophs. Community statistical analyses suggested that both mangrove tree species and additional environmental variables played important roles in shaping total microbial and potential diazotroph communities in mangrove rhizospheres. In contrast to the total microbial community investigated by analysis of 16S rRNA gene sequences, most of the dominant diazotrophic groups identified by nifH gene sequences were significantly different among mangrove species. The dominant diazotrophs of the family Desulfobacteraceae were positively correlated with total phosphorus, but negatively correlated with the nitrogen to phosphorus ratio. The Pseudomonadaceae were positively correlated with the concentration of available potassium, suggesting that diazotrophs potentially play an important role in biogeochemical cycles, such as those of nitrogen, phosphorus, sulfur, and potassium, in the mangrove ecosystem.

  1. Bayesian analysis of high-throughput quantitative measurement of protein-DNA interactions.

    Directory of Open Access Journals (Sweden)

    David D Pollock

    Transcriptional regulation depends upon the binding of transcription factor (TF) proteins to DNA in a sequence-dependent manner. Although many experimental methods address the interaction between DNA and proteins, they generally do not comprehensively and accurately assess the full binding repertoire (the complete set of sequences that might be bound with at least moderate strength). Here, we develop and evaluate through simulation an experimental approach that allows simultaneous high-throughput quantitative analysis of TF binding affinity to thousands of potential DNA ligands. Tens of thousands of putative binding targets can be mixed with a TF, and both the pre-bound and bound target pools sequenced. A hierarchical Bayesian Markov chain Monte Carlo approach determines posterior estimates for the dissociation constants, sequence-specific binding energies, and free TF concentrations. A unique feature of our approach is that dissociation constants are jointly estimated from their inferred degree of binding and from a model of binding energetics, depending on how many sequence reads are available and the explanatory power of the energy model. Careful experimental design is necessary to obtain accurate results over a wide range of dissociation constants. This approach, which we call Simultaneous Ultra high-throughput Ligand Dissociation EXperiment (SULDEX), is theoretically capable of rapid and accurate elucidation of an entire TF-binding repertoire.

  2. Comparison analysis of microRNAs in response to EV71 and CA16 infection in human bronchial epithelial cells by high-throughput sequencing to reveal differential infective mechanisms.

    Science.gov (United States)

    Hu, Yajie; Song, Jie; Liu, Longding; Li, Jing; Tang, Beibei; Zhang, Ying; Wang, Jingjing; Wang, Lichun; Fan, Shengtao; Feng, Ming; Li, Qihan

    2017-01-15

    Hand, foot, and mouth disease (HFMD) is mainly caused by Enterovirus 71 (EV71) and coxsackievirus A16 (CA16) infections, which present significantly different clinical manifestations. Nevertheless, the factors underlying these differences remain unclear. Recently, the functions of microRNAs (miRNAs) in pathogen-host interactions have been highlighted. Here, we performed comprehensive miRNA profiling in EV71- and CA16-infected human bronchial epithelial (16HBE) cells at multiple time points using high-throughput sequencing. The results showed that 154 known and 47 novel miRNAs exhibited remarkable differences in expression. Of these, 65 miRNAs, including 58 known and 7 novel miRNAs, presented opposite trends in EV71- and CA16-infected samples. Subsequently, we focused on 56 known differentially expressed miRNAs, selected by further screening, for target prediction. GO and pathway analysis of these targets demonstrated that 18 biological processes, 7 molecular functions, 1 cellular component and 123 pathways were enriched. Among these pathways, the Cadherin signalling pathway, the Wnt signalling pathway and angiogenesis showed significant alterations. The regulatory networks of these miRNAs with predicted targets, GOs, pathways and transcription factors were determined, which suggested that miRNAs displayed intricate regulatory mechanisms during the infection phase. Consequently, we specifically analysed the hierarchical GO categories of the predicted targets involved in adhesion. The results indicated that the distinct changes induced by EV71 and CA16 infection may be partly linked to airway epithelial barrier function. Taken together, our data provide useful insights that help elucidate the different host-pathogen interactions following EV71 and CA16 infection and might offer novel therapeutic targets for these infections. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  3. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences, clustered into 420 operational taxonomic units (OTUs) at 97% sequence similarity and representing 170 taxa, were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approaches and clone library analysis, the present result suggests that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities distinct from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  4. Semi-automated library preparation for high-throughput DNA sequencing platforms.

    Science.gov (United States)

    Farias-Hesson, Eveline; Erikson, Jonathan; Atkins, Alexander; Shen, Peidong; Davis, Ronald W; Scharfe, Curt; Pourmand, Nader

    2010-01-01

    Next-generation sequencing platforms are powerful technologies, providing gigabases of genetic information in a single run. An important prerequisite for high-throughput DNA sequencing is the development of robust and cost-effective preprocessing protocols for DNA sample library construction. Here we report the development of a semi-automated sample preparation protocol to produce adaptor-ligated fragment libraries. Using a liquid-handling robot in conjunction with Carboxy Terminated Magnetic Beads, we labeled each library sample using a unique 6 bp DNA barcode, which allowed multiplex sample processing and sequencing of 32 libraries in a single run using Applied Biosystems' SOLiD sequencer. We applied our semi-automated pipeline to targeted medical resequencing of nuclear candidate genes in individuals affected by mitochondrial disorders. This novel method is capable of preparing as many as 32 DNA libraries in 2.01 days (8-hour workday) for emulsion PCR/high-throughput DNA sequencing, increasing sample preparation production by 8-fold.

  5. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    Science.gov (United States)

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
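
    The statement that a de novo mutation shows up as a violation of Mendelian inheritance can be illustrated with a hard-call check on a parent-offspring trio. This is only a sketch of that rule, assuming error-free diploid genotypes written as two-letter strings; the method in the abstract is probabilistic and reports a posterior probability instead. Both function names are hypothetical.

```python
from itertools import product


def mendelian_consistent(child, mother, father):
    """True if the child genotype can be built from one allele of each parent."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def is_candidate_de_novo(child, mother, father):
    """Flag a site where the trio genotypes violate Mendelian inheritance."""
    return not mendelian_consistent(child, mother, father)
```

    For example, `is_candidate_de_novo("AG", "AA", "AA")` flags the site, whereas `is_candidate_de_novo("AG", "AA", "GG")` does not. With real data, sequencing error produces many such apparent violations, which is why a probabilistic treatment is needed.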

  6. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.

    Science.gov (United States)

    Caboche, Ségolène; Audebert, Christophe; Lemoine, Yves; Hot, David

    2014-04-05

    The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. 
This benchmark
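
    The stricter notion of a correctly mapped read described in this abstract (start position, end position, and numbers of indels and substitutions) could be expressed as follows. This is a sketch under stated assumptions, not the CuReSimEval implementation: the `Alignment` fields and both tolerance parameters are hypothetical, and the abstract does not give the actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    start: int          # leftmost reference position of the read
    end: int            # rightmost reference position of the read
    indels: int         # number of insertions and deletions
    substitutions: int  # number of mismatched bases


def is_correctly_mapped(observed, expected, pos_tolerance=0, edit_tolerance=0):
    """Compare a mapper's alignment with the simulator's ground truth.

    A read counts as correct only if start, end, indel count and
    substitution count all agree within the (hypothetical) tolerances.
    """
    return (
        abs(observed.start - expected.start) <= pos_tolerance
        and abs(observed.end - expected.end) <= pos_tolerance
        and abs(observed.indels - expected.indels) <= edit_tolerance
        and abs(observed.substitutions - expected.substitutions) <= edit_tolerance
    )
```

    Checking the end position and edit counts as well as the start penalizes alignments that begin in the right place but soft-clip or misplace gaps, which a start-only criterion would accept.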

  7. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    Science.gov (United States)

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  8. Computational analysis of high-throughput flow cytometry data.

    Science.gov (United States)

    Robinson, J Paul; Rajwa, Bartek; Patsekin, Valery; Davisson, Vincent Jo

    2012-08-01

    Flow cytometry has been around for over 40 years, but only recently has the opportunity arisen to move into the high-throughput domain. The technology is now available and is highly competitive with imaging tools under the right conditions. Flow cytometry has, however, been a technology that has focused on its unique ability to study single cells and appropriate analytical tools are readily available to handle this traditional role of the technology. Expansion of flow cytometry to a high-throughput (HT) and high-content technology requires both advances in hardware and analytical tools. The historical perspective of flow cytometry operation as well as how the field has changed and what the key changes have been are discussed. The authors provide a background and compelling arguments for moving toward HT flow, where there are many innovative opportunities. With alternative approaches now available for flow cytometry, there will be a considerable number of new applications. These opportunities show strong capability for drug screening and functional studies with cells in suspension. There is no doubt that HT flow is a rich technology awaiting acceptance by the pharmaceutical community. It can provide a powerful phenotypic analytical toolset that has the capacity to change many current approaches to HT screening. The previous restrictions on the technology, based on its reduced capacity for sample throughput, are no longer a major issue. Overcoming this barrier has transformed a mature technology into one that can focus on systems biology questions not previously considered possible.

  9. Computational analysis of high-throughput flow cytometry data

    Science.gov (United States)

    Robinson, J Paul; Rajwa, Bartek; Patsekin, Valery; Davisson, Vincent Jo

    2015-01-01

    Introduction: Flow cytometry has been around for over 40 years, but only recently has the opportunity arisen to move into the high-throughput domain. The technology is now available and is highly competitive with imaging tools under the right conditions. Flow cytometry has, however, been a technology that has focused on its unique ability to study single cells and appropriate analytical tools are readily available to handle this traditional role of the technology. Areas covered: Expansion of flow cytometry to a high-throughput (HT) and high-content technology requires both advances in hardware and analytical tools. The historical perspective of flow cytometry operation as well as how the field has changed and what the key changes have been are discussed. The authors provide a background and compelling arguments for moving toward HT flow, where there are many innovative opportunities. With alternative approaches now available for flow cytometry, there will be a considerable number of new applications. These opportunities show strong capability for drug screening and functional studies with cells in suspension. Expert opinion: There is no doubt that HT flow is a rich technology awaiting acceptance by the pharmaceutical community. It can provide a powerful phenotypic analytical toolset that has the capacity to change many current approaches to HT screening. The previous restrictions on the technology, based on its reduced capacity for sample throughput, are no longer a major issue. Overcoming this barrier has transformed a mature technology into one that can focus on systems biology questions not previously considered possible. PMID:22708834

  10. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs

    OpenAIRE

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H.; Gregory, Brian D.; Wang, Li-San

    2013-01-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (

  11. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing

    DEFF Research Database (Denmark)

    Gamba, Cristina; Hanghøj, Kristian Ebbesen; Gaunitz, Charleen

    2016-01-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs...... of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning...... a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules...

  12. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc.

  13. Efficient strategy for the molecular diagnosis of intellectual disability using targeted high-throughput sequencing.

    Science.gov (United States)

    Redin, Claire; Gérard, Bénédicte; Lauer, Julia; Herenger, Yvan; Muller, Jean; Quartier, Angélique; Masurel-Paulet, Alice; Willems, Marjolaine; Lesca, Gaétan; El-Chehadeh, Salima; Le Gras, Stéphanie; Vicaire, Serge; Philipps, Muriel; Dumas, Michaël; Geoffroy, Véronique; Feger, Claire; Haumesser, Nicolas; Alembik, Yves; Barth, Magalie; Bonneau, Dominique; Colin, Estelle; Dollfus, Hélène; Doray, Bérénice; Delrue, Marie-Ange; Drouin-Garraud, Valérie; Flori, Elisabeth; Fradin, Mélanie; Francannet, Christine; Goldenberg, Alice; Lumbroso, Serge; Mathieu-Dramard, Michèle; Martin-Coignard, Dominique; Lacombe, Didier; Morin, Gilles; Polge, Anne; Sukno, Sylvie; Thauvin-Robinet, Christel; Thevenon, Julien; Doco-Fenzy, Martine; Genevieve, David; Sarda, Pierre; Edery, Patrick; Isidor, Bertrand; Jost, Bernard; Olivier-Faivre, Laurence; Mandel, Jean-Louis; Piton, Amélie

    2014-11-01

    Intellectual disability (ID) is characterised by an extreme genetic heterogeneity. Several hundred genes have been associated with monogenic forms of ID, considerably complicating molecular diagnostics. Trio-exome sequencing was recently proposed as a diagnostic approach, yet remains costly for a general implementation. We report the alternative strategy of targeted high-throughput sequencing of 217 genes in which mutations had been reported in patients with ID or autism as the major clinical concern. We analysed 106 patients with ID of unknown aetiology following array-CGH analysis and other genetic investigations. Ninety per cent of these patients were males, and 75% sporadic cases. We identified 26 causative mutations: 16 in X-linked genes (ATRX, CUL4B, DMD, FMR1, HCFC1, IL1RAPL1, IQSEC2, KDM5C, MAOA, MECP2, SLC9A6, SLC16A2, PHF8) and 10 de novo in autosomal-dominant genes (DYRK1A, GRIN1, MED13L, TCF4, RAI1, SHANK3, SLC2A1, SYNGAP1). We also detected four possibly causative mutations (eg, in NLGN3) requiring further investigations. We present detailed reasoning for assigning causality for each mutation, and associated patients' clinical information. Some genes were hit more than once in our cohort, suggesting they correspond to more frequent ID-associated conditions (KDM5C, MECP2, DYRK1A, TCF4). We highlight some unexpected genotype to phenotype correlations, with causative mutations being identified in genes associated with defined syndromes in patients deviating from the classic phenotype (DMD, TCF4, MECP2). We also bring additional supportive (HCFC1, MED13L) or unsupportive (SHROOM4, SRPX2) evidence for the implication of previous candidate genes or mutations in cognitive disorders. With a diagnostic yield of 25%, targeted sequencing appears relevant as a first-intention test for the diagnosis of ID, but importantly will also contribute to a better understanding regarding the specific contribution of the many genes implicated in ID and autism. Published by the

  14. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    Science.gov (United States)

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  15. Use of high throughput sequencing to study oomycete communities in soil and roots

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    limited understanding of the diversity of oomycetes in symptomatic plant tissue as well as in root zones. The aim of this study was to improve and validate techniques for using high throughput sequencing as a tool for studying oomycete communities. Primer sets ITS4, ITS6 and ITS7 that have been used...... taxonomic units from symptomatic lesions in carrot resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot. Moreover, soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium......, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all symptomatic lesions and soil samples showing the versatility of the strategy and thus demonstrating the usefulness of the method in plant and soil DNA background....

  16. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Full Text Available Background: A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results: The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions: High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  17. A Bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data.

    Directory of Open Access Journals (Sweden)

    Qing Xie

    2014-09-01

    Full Text Available High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1%) than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.
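
    To make the underlying problem concrete (separating true methylcytosines from cytosines that merely escaped bisulfite conversion), here is a sketch of a plain binomial tail test at a single site. It is a deliberately simpler technique than the Bayesian model in the abstract and is not Bycom; the function names, the `non_conversion_rate` parameter and the `alpha` threshold are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_methylcytosine(unconverted_reads, total_reads,
                        non_conversion_rate, alpha=0.01):
    """Call a site methylated if the unconverted read count is unlikely
    to arise from incomplete bisulfite conversion alone.

    Assumption: failed conversion is the only source of unconverted
    reads at an unmethylated site (sequencing error is ignored).
    """
    p_value = binomial_tail(unconverted_reads, total_reads, non_conversion_rate)
    return p_value < alpha
```

    With a 1% non-conversion rate, `call_methylcytosine(8, 10, 0.01)` calls the site methylated while `call_methylcytosine(0, 10, 0.01)` does not. A test of this kind does not model mixtures of cells with different methylation states, which is one of the factors the abstract says a fuller model should account for.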

  18. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  19. High-Throughput Mutational Analysis of a Twister Ribozyme.

    Science.gov (United States)

    Kobori, Shungo; Yokobayashi, Yohei

    2016-08-22

    Recent discoveries of new classes of self-cleaving ribozymes in diverse organisms have triggered renewed interest in the chemistry and biology of ribozymes. Functional analysis and engineering of ribozymes often involve performing biochemical assays on multiple ribozyme mutants. However, because each ribozyme mutant must be individually prepared and assayed, the number and variety of mutants that can be studied are severely limited. All of the single and double mutants of a twister ribozyme (a total of 10 296 mutants) were generated and assayed for their self-cleaving activity by exploiting deep sequencing to count the numbers of cleaved and uncleaved sequences for every mutant. Interestingly, we found that the ribozyme is highly robust against mutations such that 71 % and 30 % of all single and double mutants, respectively, retain detectable activity under the assay conditions. It was also observed that the structural elements that comprise the ribozyme exhibit distinct sensitivity to mutations. © 2016 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
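
    The count of 10 296 single and double mutants follows from simple combinatorics: a region of n positions has 3n single mutants and 9·C(n, 2) double mutants. The sketch below works through that arithmetic. The value n = 48 is not stated in the abstract; it is inferred here because it is the length that reproduces the reported total. The function names are hypothetical.

```python
from math import comb


def count_single_and_double_mutants(n_positions, alphabet_size=4):
    """Number of distinct single and double point mutants of a sequence."""
    alternatives = alphabet_size - 1          # 3 other bases per position
    singles = alternatives * n_positions
    doubles = alternatives**2 * comb(n_positions, 2)
    return singles, doubles


def fraction_cleaved(cleaved_reads, uncleaved_reads):
    """Per-mutant activity estimate from deep-sequencing read counts."""
    total = cleaved_reads + uncleaved_reads
    return cleaved_reads / total if total else float("nan")


# Assumed region length of 48 positions (inferred, not from the abstract).
singles, doubles = count_single_and_double_mutants(48)
total_mutants = singles + doubles  # 144 + 10152
```

    Counting cleaved and uncleaved reads per mutant, as `fraction_cleaved` does, is what lets one sequencing run stand in for thousands of individual biochemical assays.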

  20. An image analysis toolbox for high-throughput C. elegans assays.

    Science.gov (United States)

    Wählby, Carolina; Kamentsky, Lee; Liu, Zihan H; Riklin-Raviv, Tammy; Conery, Annie L; O'Rourke, Eyleen J; Sokolnicki, Katherine L; Visvikis, Orane; Ljosa, Vebjorn; Irazoqui, Javier E; Golland, Polina; Ruvkun, Gary; Ausubel, Frederick M; Carpenter, Anne E

    2012-04-22

    We present a toolbox for high-throughput screening of image-based Caenorhabditis elegans phenotypes. The image analysis algorithms measure morphological phenotypes in individual worms and are effective for a variety of assays and imaging systems. This WormToolbox is available through the open-source CellProfiler project and enables objective scoring of whole-worm high-throughput image-based assays of C. elegans for the study of diverse biological pathways that are relevant to human disease.

  1. High-throughput scanning of the rat genome using interspersed repetitive sequence-PCR markers.

    Science.gov (United States)

    Gösele, C; Hong, L; Kreitler, T; Rossmann, M; Hieke, B; Gross, U; Kramer, M; Himmelbauer, H; Bihoreau, M T; Kwitek-Black, A E; Twigger, S; Tonellato, P J; Jacob, H J; Schalkwyk, L C; Lindpaintner, K; Ganten, D; Lehrach, H; Knoblauch, M

    2000-11-01

    We report the establishment of a hybridization-based marker system for the rat genome based on the PCR amplification of interspersed repetitive sequences (IRS). Overall, 351 IRS markers were mapped within the rat genome. The IRS marker panel consists of 210 nonpolymorphic and 141 polymorphic markers that were screened for presence/absence polymorphism patterns in 38 different rat strains and substrains that are commonly used in biomedical research. The IRS marker panel was demonstrated to be useful for rapid genome screening in experimental rat crosses and high-throughput characterization of large-insert genomic library clones. Information on corresponding YAC clones is made available for this IRS marker set distributed over the whole rat genome. The two existing rat radiation hybrid maps were integrated by placing the IRS markers in both maps. The genetic and physical mapping data presented provide substantial information for ongoing positional cloning projects in the rat. Copyright 2000 Academic Press.

  2. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.

  3. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    Directory of Open Access Journals (Sweden)

    Elena Marmesat

    Full Text Available The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.

  4. Characterization of Microbial Community in Lascaux Cave by High Throughput Sequencing

    Science.gov (United States)

    Alonso, Lise; Dubost, Audrey; Luis, Patricia; Pommier, Thomas; Moënne-Loccoz, Yvan

    2017-04-01

    The Lascaux Cave in South-West France is an archeological landmark renowned for its Paleolithic paintings dating back c. 18,000 years. Extensive tourist visitation and repeated chemical treatments have resulted in the development of microbial stains on cave walls, which is a major issue in terms of art conservation. Therefore, it is of prime importance to better understand the microbial ecology of Lascaux Cave. Like many other caves, Lascaux is quite heterogeneous in terms of the nature and surface properties of rock walls within cave rooms, as well as the succession of rooms/galleries from the entrance to deeper areas of the cave. Lascaux Cave displays an additional level of heterogeneity related to the presence of discontinuous stains on certain types of cave walls. We compared the microbial community (i.e. both prokaryotic and eukaryotic microbial populations) colonizing cave walls of different rooms/galleries, in and outside stains and in different cave layers, in successive years. Quantitative PCR analysis of cave wall samples gave in the order of 10^2 copies of 18S rRNA genes and 10^5 copies of 16S rRNA genes per ng of DNA, indicating significant colonization of all cave walls by micro-eukaryotes and especially bacteria. Illumina metagenomic analyses of cave wall samples were carried out based on four ribosomal DNA markers targeting bacteria, archaea, fungi, and other micro-eukaryotes. The results showed that the four microbial communities were highly diverse in and outside stains, as several hundred genera of microorganisms were identified in each. Proteobacteria were more prominent within stains whereas Bacteroidetes and Sordariomycetes were more prominent outside stains. High-throughput sequencing also showed that the nature/surface properties of cave walls were the main factor determining the structure and composition of microbial communities, ahead of the other heterogeneity factors studied i.e. location within the cave, presence of stain and sampling

  5. Kaleidaseq: a Web-based tool to monitor data flow in a high throughput sequencing facility.

    Science.gov (United States)

    Dedhia, N N; McCombie, W R

    1998-03-01

    Tracking data flow in high throughput sequencing is important in maintaining a consistent number of successfully sequenced samples, making decisions on scheduling the flow of sequencing steps, resolving problems at various steps and tracking the status of different projects. This is especially critical when the laboratory is handling a multitude of projects. We have built a Web-based data flow tracking package, called Kaleidaseq, which allows us to monitor the flow and quality of sequencing samples through the steps of preparation of library plates, plaque-picking, preparation of templates, conducting sequencing reactions, loading of samples on gels, base-calling the traces, and calculating the quality of the sequenced samples. Kaleidaseq's suite of displays allows for outstanding monitoring of the production sequencing process. The online display of current information that Kaleidaseq provides on both project status and process queues sorted by project enables accurate real-time assessment of the necessary samples that must be processed to complete the project. This information allows the process manager to allocate future resources optimally and schedule tasks according to scientific priorities. Quality of the sequenced samples can be tracked on a daily basis, which allows the sequencing laboratory to maintain a steady performance level and quickly resolve dips in quality. Kaleidaseq has a simple easy-to-use interface that allows access to all major functions and process queues from one Web page. This software package is modular and designed to allow additional processing steps and new monitoring variables to be added and tracked with ease. Access to the underlying relational database is through the Perl DBI interface, which allows for the use of different relational databases. Kaleidaseq is available for free use by the academic community from http://www.cshl.org/kaleidaseq.

  6. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
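
    The third module's task, finding repeats that conform to user-specified parameters, can be illustrated with a minimal Python sketch. This is not SSR_pipeline's own algorithm: the function name `find_microsatellites`, its parameters, and the default thresholds are hypothetical, and the regular expression only detects perfect (uninterrupted) tandem repeats.

```python
import re

def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    # A lazy motif group followed by at least (min_repeats - 1) back-references,
    # so the shortest repeating unit is tried first at each position.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip single-base runs (e.g. "AA" repeated), which are not
        # di-nucleotide or longer motifs.
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

print(find_microsatellites("TTGACACACACACGG"))
```

    Raising `max_motif` toward 25 would cover the motif lengths mentioned in the abstract, at the cost of slower matching on long reads.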

  7. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing.

    Science.gov (United States)

    Gamba, Cristina; Hanghøj, Kristian; Gaunitz, Charleen; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Bradley, Daniel G; Orlando, Ludovic

    2016-03-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms. © 2015 John Wiley & Sons Ltd.

  8. Hydrogel Droplet Microfluidics for High-Throughput Single Molecule/Cell Analysis.

    Science.gov (United States)

    Zhu, Zhi; Yang, Chaoyong James

    2017-01-17

    molecule/cell analysis. The hydrogel can act as a 3D cell culture matrix to mimic the extracellular environment for long-term single cell culture, which allows further heterogeneity study in proliferation, drug screening, and metastasis at the single-cell level. The sol-gel transition allows reactions in solution to be performed rapidly and efficiently with product storage in the gel for flexible downstream manipulation and analysis. More importantly, controllable sol-gel regulation provides a new way to maintain phenotype-genotype linkages in the hydrogel matrix for high throughput molecular evolution. In this Account, we will review the hydrogel droplet generation on microfluidics, single molecule/cell encapsulation in hydrogel droplets, as well as the progress made by our group and others in the application of hydrogel droplet microfluidics for single molecule/cell analysis, including single cell culture, single molecule/cell detection, single cell sequencing, and molecular evolution.

  9. High-throughput analysis reveals novel maternal germline RNAs crucial for primordial germ cell preservation and proper migration

    OpenAIRE

    Owens, Dawn A.; Butler, Amanda M.; Aguero, Tristan H.; Newman, Karen M.; Van Booven, Derek; King, Mary Lou

    2017-01-01

    During oogenesis, hundreds of maternal RNAs are selectively localized to the animal or vegetal pole, including determinants of somatic and germline fates. Although microarray analysis has identified localized determinants, it is not comprehensive and is limited to known transcripts. Here, we utilized high-throughput RNA-sequencing analysis to comprehensively interrogate animal and vegetal pole RNAs in the fully grown Xenopus laevis oocyte. We identified 411 (198 annotated) and 27 (15 annotate...

  10. High Throughput Technologies for Functional Analysis of Archaeal Genomics

    Energy Technology Data Exchange (ETDEWEB)

    El-Sayed, Najib M. A.

    1998-09-25

    The specific aims of this project were as follows: (1) to design primers to each predicted open reading frame (ORF) in M. jannaschii and M. thermoautotrophicum to allow the amplification of a unique target sequence that will represent the corresponding coding region on a complete genome chip; (2) to amplify each target sequence from M. jannaschii and M. thermoautotrophicum and verify that these PCR products are the expected DNA fragment; and (3) to establish a relational database that will track the production of target DNAs and the nucleotide sequence used to represent each ORF.

  11. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2010-03-01

    Background: High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results: In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions: Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

  12. Identification and characterization of small non-coding RNAs from Chinese fir by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Wan Li-Chuan

    2012-08-01

    Background: Small non-coding RNAs (sRNAs) play key roles in plant development, growth and responses to biotic and abiotic stresses. At least four classes of sRNAs have been well characterized in plants, including repeat-associated siRNAs (rasiRNAs), microRNAs (miRNAs), trans-acting siRNAs (tasiRNAs) and natural antisense transcript-derived siRNAs. Chinese fir (Cunninghamia lanceolata) is one of the most important coniferous evergreen tree species in China. No sRNA from Chinese fir has been described to date. Results: To obtain sRNAs in Chinese fir, we sequenced a sRNA library generated from seeds, seedlings, leaves, stems and calli, using Illumina high throughput sequencing technology. A comprehensive set of sRNAs were acquired, including conserved and novel miRNAs, rasiRNAs and tasiRNAs. With BLASTN and MIREAP we identified a total of 115 conserved miRNAs comprising 40 miRNA families and one novel miRNA with precursor sequence. The expressions of 16 conserved and one novel miRNAs and one tasiRNA were detected by RT-PCR. Utilizing real time RT-PCR, we revealed that four conserved and one novel miRNAs displayed developmental stage-specific expression patterns in Chinese fir. In addition, 209 unigenes were predicted to be targets of 30 Chinese fir miRNA families, of which five target genes were experimentally verified by 5' RACE, including a squamosa promoter-binding protein gene, a pentatricopeptide (PPR) repeat-containing protein gene, a BolA-like family protein gene, AGO1 and a gene of unknown function. We also demonstrated that the DCL3-dependent rasiRNA biogenesis pathway, which had been considered absent in conifers, existed in Chinese fir. Furthermore, the miR390-TAS3-ARF regulatory pathway was elucidated. Conclusions: We unveiled a complex population of sRNAs in Chinese fir through high throughput sequencing. This provides an insight into the composition and function of sRNAs in Chinese fir and sheds new light on land plant sRNA evolution.

  13. [Study on Microbial Diversity of Peri-implantitis Subgingival by High-throughput Sequencing].

    Science.gov (United States)

    Li, Zhi-jie; Wang, Shao-guo; Li, Yue-hong; Tu, Dong-xiang; Liu, Shi-yun; Nie, Hong-bing; Li, Zhi-qiang; Zhang, Ju-mei

    2015-07-01

    To study the microbial diversity of subgingival plaque in peri-implantitis with high-throughput sequencing, and to investigate the microbiological etiology of peri-implantitis. Subgingival plaques were sampled from patients with peri-implantitis (D group) and non-peri-implantitis subjects (N group). The microbiological diversity of the subgingival plaques was detected by sequencing the V4 region of 16S rRNA with the Illumina MiSeq platform. The diversity of the community structure was analyzed using Mothur software. A total of 156 507 gene sequences were detected in nine samples and 4 402 operational taxonomic units (OTUs) were found. Selenomonas, Pseudomonas, and Fusobacterium were the dominant bacteria in D group, while Fusobacterium, Veillonella and Streptococcus were the dominant bacteria in N group. Differences between peri-implantitis and non-peri-implantitis bacterial communities were observed at all phylogenetic levels by LEfSe, and were also found in the PCoA test. The occurrence of peri-implantitis is related not only to periodontitis-associated pathogenic microbes, but also to changes in oral microbial community structure. Treponema, Herbaspirillum, Butyricimonas and Phaeobacte may be closely related to the occurrence and development of peri-implantitis.

  14. High-throughput gender identification of penguin species using melting curve analysis.

    Science.gov (United States)

    Tseng, Chao-Neng; Chang, Yung-Ting; Chiu, Hui-Tzu; Chou, Yii-Cheng; Huang, Hurng-Wern; Cheng, Chien-Chung; Liao, Ming-Hui; Chang, Hsueh-Wei

    2014-04-03

    Most species of penguins are sexually monomorphic, and it is therefore difficult to visually identify their genders for monitoring population stability in terms of sex ratio analysis. In this study, we evaluated the suitability of melting curve analysis (MCA) for high-throughput gender identification of penguins. A preliminary test indicated that Griffiths's P2/P8 primers were not suitable for MCA. Based on sequence alignment of Chromo-Helicase-DNA binding protein (CHD)-W and CHD-Z genes from four species of penguins (Pygoscelis papua, Aptenodytes patagonicus, Spheniscus magellanicus, and Eudyptes chrysocome), we redesigned forward primers for the CHD-W/CHD-Z-common region (PGU-ZW2) and the CHD-W-specific region (PGU-W2) to be used in combination with the reverse Griffiths's P2 primer. When tested with P. papua samples, PCR using P2/PGU-ZW2 and P2/PGU-W2 primer sets generated two amplicons of 148 and 356 bp, respectively, which were easily resolved in 1.5% agarose gels. MCA indicated that the melting temperature (Tm) values for P2/PGU-ZW2 and P2/PGU-W2 amplicons of P. papua samples were 79.75°C-80.5°C and 81.0°C-81.5°C, respectively. Females displayed both ZW-common and W-specific Tm peaks, whereas males were positive only for the ZW-common peak. Taken together, our redesigned primers coupled with MCA allow precise high-throughput gender identification for P. papua, and potentially for other penguin species such as A. patagonicus, S. magellanicus, and E. chrysocome as well.
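
    The decision rule stated in the abstract (females show both the ZW-common and the W-specific Tm peak, males only the ZW-common peak) can be written down as a short Python sketch. The Tm ranges are the ones reported for P. papua; the function names, the `tol` parameter, and the "undetermined" fallback are assumptions added for illustration and are not part of the published protocol.

```python
# Tm ranges (degrees C) reported in the abstract for P. papua amplicons.
ZW_COMMON = (79.75, 80.5)   # P2/PGU-ZW2 amplicon
W_SPECIFIC = (81.0, 81.5)   # P2/PGU-W2 amplicon

def in_range(tm, bounds, tol=0.0):
    """True if tm lies inside bounds, widened by an optional tolerance."""
    return bounds[0] - tol <= tm <= bounds[1] + tol

def call_sex(peak_tms, tol=0.0):
    """Classify one sample from its melting-peak temperatures.

    Hypothetical helper: `tol` and the "undetermined" label are assumptions.
    """
    has_zw = any(in_range(t, ZW_COMMON, tol) for t in peak_tms)
    has_w = any(in_range(t, W_SPECIFIC, tol) for t in peak_tms)
    if has_zw and has_w:
        return "female"       # both ZW-common and W-specific peaks
    if has_zw:
        return "male"         # ZW-common peak only
    return "undetermined"     # no usable ZW-common peak, e.g. failed PCR

print(call_sex([80.0, 81.25]))
print(call_sex([80.25]))
```

    A sample with no ZW-common peak is reported as undetermined rather than guessed, since the common amplicon also serves as a positive control for the reaction.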

  15. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach.

    Directory of Open Access Journals (Sweden)

    Shota Nakamura

    With the severe acute respiratory syndrome epidemic of 2003 and renewed attention on avian influenza viral pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a "next-generation" parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu) infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1-0.25 ml of nasopharyngeal aspirates (N = 3) and fecal specimens (N = 5), and more than 10 μg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298-32,335 (average 24,738) reads in a single 7.5 h run. In nasopharyngeal samples, although whole genome analysis was not available because the majority (>90%) of reads were host genome-derived, 20-460 Flu-reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, resulting in gain of 484-15,260 reads of norovirus sequence (78-98% of the whole genome was covered), except for one specimen that was under-detectable by RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without advance genetic information. Although its cost and technological availability make it unlikely that this system will very soon be the diagnostic standard worldwide, this system could be useful for the earlier discovery of novel emerging viruses and bioterrorism, which are difficult to detect with conventional procedures.

  16. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Background: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen-derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P ... Conclusions: We have presented a simple and high throughput method of
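
    The profile construction described above (counting the reads that align to each contig of a reference metagenome) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the input is assumed to be a list with one contig name per aligned read, the normalisation to proportions is an assumption, and Pearson correlation is used here only as one plausible way to compare two profiles; the abstract does not say which statistic the study used.

```python
from collections import Counter
from math import sqrt

def profile(aligned_contigs, contigs):
    """Proportion of aligned reads per reference contig, in `contigs` order.

    `aligned_contigs` holds one contig name per aligned read (assumed input).
    """
    counts = Counter(aligned_contigs)
    total = sum(counts[c] for c in contigs)
    return [counts[c] / total if total else 0.0 for c in contigs]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (NaN if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)

reference = ["contig1", "contig2", "contig3"]   # hypothetical contig names
sample_a = profile(["contig1", "contig1", "contig2"], reference)
sample_b = profile(["contig1", "contig2", "contig2", "contig3"], reference)
print(sample_a, sample_b, pearson(sample_a, sample_b))
```

    Comparing such correlations within a cow against those between cows would be one way to examine the repeatability question posed in the abstract.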

  17. Genotyping by PCR and High-Throughput Sequencing of Commercial Probiotic Products Reveals Composition Biases.

    Directory of Open Access Journals (Sweden)

    Wesley Morovic

    2016-11-01

    Recent advances in microbiome research have brought renewed focus on beneficial bacteria, many of which are available in food and dietary supplements. Although probiotics have historically been defined as microorganisms that convey health benefits when ingested in sufficient viable amounts, this description now includes the stipulation of well-defined strains, encompassing definitive taxonomy for consumer consideration and regulatory oversight. Here, we evaluated 52 commercial dietary supplements covering a range of labeled species, and determined their content using plate counting and targeted genotyping. Additionally, strain identities were assessed using methods recently published by the United States Pharmacopeial Convention. We also determined the relative abundance of individual bacteria by high-throughput sequencing (HTS) of the 16S rRNA sequence using paired-end 2x250 bp Illumina MiSeq technology. Using multiple methods, we tested the hypothesis that products do contain the quantitative amount of labeled bacteria, and the qualitative list of labeled microbial species. We found that 17 samples (33%) were below label claim for CFU prior to their expiration dates. A multiplexed-PCR scheme showed that only 30/52 (58%) of the products contained a correctly labeled classification, with issues encompassing incorrect taxonomy, missing species and un-labeled species. The HTS revealed that many blended products consisted predominantly of Lactobacillus acidophilus and Bifidobacterium animalis subsp. lactis. These results highlight the need for reliable methods to qualitatively determine the correct taxonomy and quantitatively ascertain the relative amounts of mixed microbial populations in commercial probiotic products.

  18. High-throughput DNA Stretching in Continuous Elongational Flow for Genome Sequence Scanning

    Science.gov (United States)

    Meltzer, Robert; Griffis, Joshua; Safranovitch, Mikhail; Malkin, Gene; Cameron, Douglas

    2014-03-01

    Genome Sequence Scanning (GSS) identifies and compares bacterial genomes by stretching long (60 - 300 kb) genomic DNA restriction fragments and scanning for site-selective fluorescent probes. Practical application of GSS requires: 1) high throughput data acquisition, 2) efficient DNA stretching, 3) reproducible DNA elasticity in the presence of intercalating fluorescent dyes. GSS utilizes a pseudo-two-dimensional micron-scale funnel with convergent sheathing flows to stretch one molecule at a time in continuous elongational flow and center the DNA stream over diffraction-limited confocal laser excitation spots. Funnel geometry has been optimized to maximize throughput of DNA within the desired length range (>10 million nucleobases per second). A constant-strain detection channel maximizes stretching efficiency by applying a constant parabolic tension profile to each molecule, minimizing relaxation and flow-induced tumbling. The effect of intercalator on DNA elasticity is experimentally controlled by reacting one molecule of DNA at a time in convergent sheathing flows of the dye. Derivations of accelerating flow and non-linear tension distribution permit alignment of detected fluorescence traces to theoretical templates derived from whole-genome sequence data.

  19. Surveying the repair of ancient DNA from bones via high-throughput sequencing.

    Science.gov (United States)

    Mouttham, Nathalie; Klunk, Jennifer; Kuch, Melanie; Fourney, Ron; Poinar, Hendrik

    2015-07-01

    DNA damage in the form of abasic sites, chemically altered nucleotides, and strand fragmentation is the foremost limitation in obtaining genetic information from many ancient samples. Upon cell death, DNA continues to endure various chemical attacks such as hydrolysis and oxidation, but repair pathways found in vivo no longer operate. By incubating degraded DNA with specific enzyme combinations adopted from these pathways, it is possible to reverse some of the post-mortem nucleic acid damage prior to downstream analyses such as library preparation, targeted enrichment, and high-throughput sequencing. Here, we evaluate the performance of two available repair protocols on previously characterized DNA extracts from four mammoths. Both methods use endonucleases and glycosylases along with a DNA polymerase-ligase combination. PreCR Repair Mix increases the number of molecules converted to sequencing libraries, leading to an increase in endogenous content and a decrease in cytosine-to-thymine transitions due to cytosine deamination. However, the effects of Nelson Repair Mix on repair of DNA damage remain inconclusive.

  20. Comprehensive evaluation and optimization of amplicon library preparation methods for high-throughput antibody sequencing.

    Directory of Open Access Journals (Sweden)

    Ulrike Menzel

    High-throughput sequencing (HTS) of antibody repertoire libraries has become a powerful tool in the field of systems immunology. However, numerous sources of bias in HTS workflows may affect the obtained antibody repertoire data. A crucial step in antibody library preparation is the addition of short platform-specific nucleotide adapter sequences. As of yet, the impact of the method of adapter addition on experimental library preparation and the resulting antibody repertoire HTS datasets has not been thoroughly investigated. Therefore, we compared three standard library preparation methods by performing Illumina HTS on antibody variable heavy genes from murine antibody-secreting cells. Clonal overlap and rank statistics demonstrated that the investigated methods produced equivalent HTS datasets. PCR-based methods were experimentally superior to ligation with respect to speed, efficiency, and practicality. Finally, using a two-step PCR based method we established a protocol for antibody repertoire library generation, beginning from inputs as low as 1 ng of total RNA. In summary, this study represents a major advance towards a standardized experimental framework for antibody HTS, thus opening up the potential for systems-based, cross-experiment meta-analyses of antibody repertoires.

  1. Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.

    Science.gov (United States)

    Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun

    2018-02-27

    Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of multiple-beads-in-a-droplet, and insufficient throughput because of the low concentration of beads used to avoid multiple-beads-in-a-droplet, remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl⁻¹, which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.
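
    The trade-off that motivates this work can be made concrete with a small calculation. When beads are loaded at random (without inertial ordering), the number of beads per droplet is commonly modelled as Poisson-distributed; that model is an assumption introduced here, not a statement from the abstract. The parameter `lam` (mean beads per droplet) is hypothetical, and the abstract does not give the droplet volume needed to derive it from the bead concentration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k beads in a droplet under a Poisson model."""
    return lam ** k * exp(-lam) / factorial(k)

def occupancy(lam):
    """Fractions of empty, single-bead and multiple-bead droplets.

    `lam` is the assumed mean number of beads per droplet.
    """
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}

# Sparse loading keeps multiples rare but leaves most droplets empty;
# denser random loading fills more droplets but produces more multiples.
for lam in (0.1, 0.5, 1.0):
    print(lam, occupancy(lam))
```

    Under this model the single-bead fraction cannot exceed about 37% (reached at `lam = 1`), which is why ordering the beads before encapsulation is attractive.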

  2. Common fusion transcripts identified in colorectal cancer cell lines by high-throughput RNA sequencing.

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard Os; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%-28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated.

  3. Common Fusion Transcripts Identified in Colorectal Cancer Cell Lines by High-Throughput RNA Sequencing

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard OS; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%–28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated. PMID:24151535

  4. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    Science.gov (United States)

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. Availability: https://github.com/SaraEl-Metwally/LightAssembler. Contact: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
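
    The idea of holding k-mers in a Bloom filter can be illustrated with a minimal Python sketch. It is a generic textbook Bloom filter, not LightAssembler's implementation: it is not cache-oblivious, it does no gap-based sampling or statistical filtering, and the class name, sizes, and hash construction (salted SHA-256) are all assumptions chosen for clarity rather than speed.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with n_hashes hash functions (illustrative only)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

bf = BloomFilter()
for kmer in kmers("ACGTACGTGA", 4):
    bf.add(kmer)
print("ACGT" in bf, "GTGA" in bf)
```

    The memory cost is fixed by `n_bits` regardless of how many k-mers are inserted, which is the property that makes this structure attractive for desktop-scale assembly; the price is a tunable false-positive rate.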

  5. Freud: a software suite for high-throughput simulation analysis

    Science.gov (United States)

    Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon

    Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
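
For readers unfamiliar with the radial distribution function mentioned above, the following pure-Python sketch computes g(r) for particles in a cubic periodic box. It does not use or imitate Freud's API; the function name `radial_distribution`, its parameters, and the cubic-box and minimum-image assumptions are all illustrative choices.

```python
import math


def radial_distribution(positions, box_length, r_max, n_bins):
    """Return (bin_centers, g) for points in a cubic periodic box.

    Assumes r_max <= box_length / 2 so the minimum-image convention is valid.
    """
    n = len(positions)
    bin_width = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            # Minimum-image displacement along each axis.
            dx = xi - xj
            dy = yi - yj
            dz = zi - zj
            dx -= box_length * round(dx / box_length)
            dy -= box_length * round(dy / box_length)
            dz -= box_length * round(dz / box_length)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                counts[min(int(r / bin_width), n_bins - 1)] += 1
    density = n / box_length ** 3
    centers, g = [], []
    for b in range(n_bins):
        r_lo, r_hi = b * bin_width, (b + 1) * bin_width
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Each pair is counted once, so normalize by n / 2 reference particles.
        expected = 0.5 * n * density * shell_volume
        centers.append(0.5 * (r_lo + r_hi))
        g.append(counts[b] / expected)
    return centers, g
```

For uniformly random (ideal-gas) positions g(r) fluctuates around 1; peaks above 1 indicate preferred interparticle distances. The double loop is O(N²), which is exactly the kind of cost that a parallel compiled routine is meant to avoid.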

  6. High-throughput Transcriptome analysis, CAGE and beyond

    KAUST Repository

    Kodzius, Rimantas

    2008-11-25

    1. Current research - PhD work on discovery of new allergens - Postdoctoral work on Transcriptional Start Sites a) Tag based technologies allow higher throughput b) CAGE technology to define promoters c) CAGE data analysis to understand Transcription - Wo

  7. High-throughput ion beam analysis at imec

    Science.gov (United States)

    Meersschaut, J.; Vandervorst, W.

    2017-09-01

    We describe the ion beam analysis activities at imec. Rutherford backscattering spectrometry and time of flight-energy (TOF-E) elastic recoil detection analysis are pursued to support the nano-electronics research and development. We outline the experimental set-up and we introduce a new data acquisition software platform. Finally, we illustrate the use of Rutherford backscattering spectrometry to map the thickness of a metallic thin film on a 300 mm Si wafer.

  8. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Directory of Open Access Journals (Sweden)

    Stefano Pirrò

    Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body, regulating human gene expression. However, knowledge about miRNA in Moringa is scarce. Furthermore, in order to test the hypothesis on the pharmacological potential properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA, which in turn allowed the miRNAs to be related to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and we provided new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  9. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Science.gov (United States)

    Pirrò, Stefano; Zanella, Letizia; Kenzo, Maurice; Montesano, Carla; Minutolo, Antonella; Potestà, Marina; Sobze, Martin Sanou; Canini, Antonella; Cirilli, Marco; Muleo, Rosario; Colizzi, Vittorio; Galgani, Andrea

    2016-01-01

    Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body, regulating human gene expression. However, knowledge about miRNA in Moringa is scarce. Furthermore, in order to test the hypothesis on the pharmacological potential properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA, which in turn allowed the miRNAs to be related to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and we provided new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  10. Predicting gene function through systematic analysis and quality assessment of high-throughput data.

    Science.gov (United States)

    Kemmeren, Patrick; Kockelkorn, Thessa T J P; Bijma, Theo; Donders, Rogier; Holstege, Frank C P

    2005-04-15

    Determining gene function is an important challenge arising from the availability of whole genome sequences. Until recently, approaches based on sequence homology were the only high-throughput method for predicting gene function. Use of high-throughput generated experimental data sets for determining gene function has been limited for several reasons. Here a new approach is presented for integration of high-throughput data sets, leading to prediction of function based on relationships supported by multiple types and sources of data. This is achieved with a database containing 125 different high-throughput data sets describing phenotypes, cellular localizations, protein interactions and mRNA expression levels from Saccharomyces cerevisiae, using a bit-vector representation and information content-based ranking. The approach takes characteristic and qualitative differences between the data sets into account, is highly flexible, efficient and scalable. Database queries result in predictions for 543 uncharacterized genes, based on multiple functional relationships each supported by at least three types of experimental data. Some of these are experimentally verified, further demonstrating their reliability. The results also generate insights into the relative merits of different data types and provide a coherent framework for functional genomic datamining. Free availability over the Internet. f.c.p.holstege@med.uu.nl http://www.genomics.med.uu.nl/pub/pk/comb_gen_network.
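
The combination of a bit-vector representation with information content-based ranking can be pictured with a small Python sketch. It is only one plausible reading of the abstract, not the authors' database design: here each gene pair carries one integer whose bit d is set when data set d supports a relationship between the two genes, and the information content of a data set is assumed to be -log2 of the fraction of pairs it supports. The function names and the toy data are hypothetical.

```python
import math


def information_content(bit_vectors, n_datasets):
    """Per-data-set information content: rarer support is more informative."""
    total = len(bit_vectors)
    content = []
    for d in range(n_datasets):
        support = sum((vector >> d) & 1 for vector in bit_vectors.values())
        content.append(-math.log2(support / total) if support else 0.0)
    return content


def rank_pairs(bit_vectors, n_datasets):
    """Rank gene pairs by the summed information content of their set bits."""
    content = information_content(bit_vectors, n_datasets)
    scores = {
        pair: sum(content[d] for d in range(n_datasets) if (vector >> d) & 1)
        for pair, vector in bit_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy example with three data sets (bit 0, 1, 2) and made-up gene names.
toy = {
    ("g1", "g2"): 0b111,
    ("g1", "g3"): 0b001,
    ("g2", "g3"): 0b001,
    ("g3", "g4"): 0b011,
}
ranking = rank_pairs(toy, n_datasets=3)
```

In the toy data, bit 0 is set for every pair and therefore contributes nothing, while a relationship supported by the rarely set bit 2 ranks highest. Counting set bits in a vector also gives the number of independent data types behind a relationship.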

  11. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Pei-Xiang Ni

    2015-01-01

    Background: The dilemma of pathogen identification in patients with unidentified clinical symptoms such as fever of unknown origin exists, which poses a challenge not only to the diagnostic and therapeutic process itself but also to expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived five cases from the intensive care unit. This methodological approach detects viruses and corrects the incidence of false positive detection rates of pathogens in a much shorter period. Through our method followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus, and in another case, we could identify infection with Streptococcus viridians based on the culture, which was false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  12. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Ying-Hong Gu

    BACKGROUND: Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, highly sensitive method for virus detection and diagnosis. METHODOLOGY/PRINCIPAL FINDINGS: We report the polyadenylated RNA virome of three sweet potato cultivars using a high-throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggests that these tissues should be optimum samples for virus detection. CONCLUSIONS/SIGNIFICANCE: A total of 15 viruses were presumed to be present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  13. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Science.gov (United States)

    Gu, Ying-Hong; Tao, Xiang; Lai, Xian-Jun; Wang, Hai-Yan; Zhang, Yi-Zheng

    2014-01-01

    Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, highly sensitive method for virus detection and diagnosis. We report the polyadenylated RNA virome of three sweet potato cultivars using a high-throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggests that these tissues should be optimum samples for virus detection. A total of 15 viruses were presumed to be present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  14. Annotation of primate miRNAs by high throughput sequencing of small RNA libraries

    Directory of Open Access Journals (Sweden)

    Dannemann Michael

    2012-03-01

    Background: In addition to genome sequencing, accurate functional annotation of genomes is required in order to carry out comparative and evolutionary analyses between species. Among primates, the human genome is the most extensively annotated. Human miRNA gene annotation is based on multiple lines of evidence including evidence for expression as well as prediction of the characteristic hairpin structure. In contrast, most miRNA genes in non-human primates are annotated based on homology without any expression evidence. We have sequenced small-RNA libraries from chimpanzee, gorilla, orangutan and rhesus macaque from multiple individuals and tissues. Using patterns of miRNA expression in conjunction with a model of miRNA biogenesis, we used these high-throughput sequencing data to identify novel miRNAs in non-human primates. Results: We predicted 47 new miRNAs in chimpanzee, 240 in gorilla, 55 in orangutan and 47 in rhesus macaque. The algorithm we used was able to predict 64% of the previously known miRNAs in chimpanzee, 94% in gorilla, 61% in orangutan and 71% in rhesus macaque. We therefore added evidence for expression in between one and five tissues to miRNAs that were previously annotated based only on homology to human miRNAs. We increased from 60 to 175 the number of miRNAs that are located in orthologous regions in humans and the four non-human primate species studied here. Conclusions: In this study we provide expression evidence for homology-based annotated miRNAs and predict de novo miRNAs in four non-human primate species. We increased the number of annotated miRNA genes and provided evidence for their expression in four non-human primates. Similar approaches using different individuals and tissues would improve annotation in non-human primates and allow for further comparative studies in the future.

  15. Preselection of shotgun clones by oligonucleotide fingerprinting: an efficient and high throughput strategy to reduce redundancy in large-scale sequencing projects

    National Research Council Canada - National Science Library

    Radelof, U; Hennig, S; Seranski, P; Steinfath, M; Ramser, J; Reinhardt, R; Poustka, A; Francis, F; Lehrach, H

    1998-01-01

    .... To reduce the overall effort and cost of those projects and to accelerate the sequencing throughput, we have developed an efficient, high throughput oligonucleotide fingerprinting protocol to select...

  16. Isolation and characterization of antigen-specific alpaca (Lama pacos) VHH antibodies by biopanning followed by high-throughput sequencing.

    Science.gov (United States)

    Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji

    2015-09-01

    The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10^7 from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequency of each sequence before and after biopanning, we found the sequences whose frequencies were increased by biopanning. The top 100 of these sequences were subjected to phylogenetic tree analysis. In total, 75% of them belonged to N-group VHHs, but the others were phylogenetically distant from N-group VHHs (non N-group). Two of three VHHs selected from non N-group VHHs showed sufficient antigen binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
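
The before/after frequency comparison described in this abstract can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: the function name `enrichment_ranking`, the pseudocount used for sequences unseen before panning, and the "ratio greater than 1" cutoff are all hypothetical choices.

```python
from collections import Counter


def enrichment_ranking(before_reads, after_reads, pseudocount=1.0, top_n=100):
    """Rank sequences by how much their relative frequency rose after panning.

    A pseudocount (an assumption of this sketch) keeps the ratio finite for
    sequences that were not observed in the pre-panning library.
    """
    before = Counter(before_reads)
    after = Counter(after_reads)
    n_before = sum(before.values())
    n_after = sum(after.values())
    ratios = {}
    for seq, count in after.items():
        freq_after = count / n_after
        freq_before = (before.get(seq, 0) + pseudocount) / (n_before + pseudocount)
        ratios[seq] = freq_after / freq_before
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    # Keep only sequences whose frequency actually increased.
    return [(seq, ratio) for seq, ratio in ranked if ratio > 1.0][:top_n]


# Toy reads standing in for VHH sequences (hypothetical labels).
library = ["VHH_A"] * 8 + ["VHH_B"] * 2
panned = ["VHH_A"] * 2 + ["VHH_B"] * 8
enriched = enrichment_ranking(library, panned)
```

In the toy data only the minor clone `VHH_B` is reported, mirroring how frequency enrichment can surface clones that colony picking after conventional biopanning would likely miss.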

  17. High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L.).

    Directory of Open Access Journals (Sweden)

    Livia Donaire

    Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.

  18. HTPheno: an image analysis pipeline for high-throughput plant phenotyping.

    Science.gov (United States)

    Hartmann, Anja; Czauderna, Tobias; Hoffmann, Roberto; Stein, Nils; Schreiber, Falk

    2011-05-12

    In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image acquisition techniques. During such screening different images of each plant are recorded and must be analysed by applying sophisticated image analysis algorithms. This paper presents an image analysis pipeline (HTPheno) for high-throughput plant phenotyping. HTPheno is implemented as a plugin for ImageJ, an open source image processing software. It provides the possibility to analyse colour images of plants which are taken in two different views (top view and side view) during a screening. Within the analysis different phenotypical parameters for each plant such as height, width and projected shoot area of the plants are calculated for the duration of the screening. HTPheno is applied to analyse two barley cultivars. HTPheno, an open source image analysis pipeline, supplies a flexible and adaptable ImageJ plugin which can be used for automated image analysis in high-throughput plant phenotyping and therefore to derive new biological insights, such as determination of fitness.

  19. HTPheno: An image analysis pipeline for high-throughput plant phenotyping

    Directory of Open Access Journals (Sweden)

    Stein Nils

    2011-05-01

    Background: In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image acquisition techniques. During such screening different images of each plant are recorded and must be analysed by applying sophisticated image analysis algorithms. Results: This paper presents an image analysis pipeline (HTPheno) for high-throughput plant phenotyping. HTPheno is implemented as a plugin for ImageJ, an open source image processing software. It provides the possibility to analyse colour images of plants which are taken in two different views (top view and side view) during a screening. Within the analysis different phenotypical parameters for each plant such as height, width and projected shoot area of the plants are calculated for the duration of the screening. HTPheno is applied to analyse two barley cultivars. Conclusions: HTPheno, an open source image analysis pipeline, supplies a flexible and adaptable ImageJ plugin which can be used for automated image analysis in high-throughput plant phenotyping and therefore to derive new biological insights, such as determination of fitness.

  20. Influence of artifact removal on rare species recovery in natural complex communities using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Aibin Zhan

    Large-scale high-throughput sequencing techniques are rapidly becoming popular methods to profile complex communities and have generated deep insights into community biodiversity. However, several technical problems, especially sequencing artifacts such as nucleotide calling errors, could artificially inflate biodiversity estimates. Sequence filtering for artifact removal is a conventional method for deleting error-prone sequences from high-throughput sequencing data. Although rare species represented by low-abundance sequences in datasets may be sensitive to the artifact removal process, the influence of artifact removal on rare species recovery has not been well evaluated in natural complex communities. Here we employed both internal (reliable operational taxonomic units selected from communities themselves) and external (indicator species spiked into communities) references to evaluate the influence of artifact removal on rare species recovery using 454 pyrosequencing of complex plankton communities collected from both freshwater and marine habitats. Multiple analyses revealed three clear patterns: (1) rare species were eliminated during the sequence filtering process at all tested filtering stringencies, (2) more rare taxa were eliminated as filtering stringencies increased, and (3) elimination of rare species intensified as the biomass of a species in a community was reduced. Our results suggest that caution be applied when processing high-throughput sequencing data, especially for rare taxa detection for conservation of species at risk and for rapid response programs targeting non-indigenous species. Establishment of both internal and external references proposed here provides a practical strategy to evaluate the artifact removal process.

  1. Genome-Wide Assessment of the Binding Effects of Artificial Transcriptional Activators by High-Throughput Sequencing.

    Science.gov (United States)

    Chandran, Anandhakumar; Syed, Junetha; Li, Yue; Sato, Shinsuke; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-10-17

    One of the major goals in DNA-based personalized medicine is the development of sequence-specific small molecules to target the genome. SAHA-PIPs belong to this class of small molecules. In the context of the complex eukaryotic genome, the differential biological effects of SAHA-PIPs are unclear. This question can be addressed by identifying the binding regions across the genome; however, it is a challenge to enrich small-molecule-bound DNA without chemical crosslinking. Here, we developed a method that employs high-throughput sequencing to map the binding area of small molecules throughout the chromatinized human genome. Analysis of the sequenced data confirmed the presence of specific binding sites for SAHA-PIPs from the enriched sequence reads. Mapping the binding sites and enriched regions on the human genome clarifies the reason for the distinct biological effects of SAHA-PIP. This approach will be useful for identifying the function of other small molecules on a large scale. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. High-throughput analysis of the impact of antibiotics on the human intestinal microbiota composition

    NARCIS (Netherlands)

    Ladirat, S.E.; Schols, H.A.; Nauta, A.; Schoterman, M.H.C.; Keijser, B.J.F.; Montijn, R.C.; Gruppen, H.; Schuren, F.H.J.

    2013-01-01

    Antibiotic treatments can lead to a disruption of the human microbiota. In this in-vitro study, the impact of antibiotics on adult intestinal microbiota was monitored in a new high-throughput approach: a fermentation screening-platform was coupled with a phylogenetic microarray analysis

  3. Novel Sequencing-based Strategies for High-Throughput Discovery of Genetic Mutations Underlying Inherited Antibody Deficiency Disorders

    OpenAIRE

    Wang, Hong-Ying; Jain, Ashish

    2011-01-01

    Human inherited antibody deficiency disorders are generally caused by mutations in genes involved in the pathways regulating B-cell class switch recombination; DNA damage repair; and B-cell development, differentiation, and survival. Sequencing a large set of candidate genes involved in these pathways appears to be a highly efficient way to identify novel mutations. Herein we review several high-throughput sequencing approaches as well as recent improvements in target gene enrichment technolo...

  4. Exploring the sources of bacterial spoilers in beefsteaks by culture-independent high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Francesca De Filippis

    Microbial growth on meat to unacceptable levels contributes significantly to changes in meat structure, color and flavor and causes meat spoilage. The types of microorganisms initially present in meat depend on several factors and multiple sources of contamination can be identified. The aims of this study were to evaluate the microbial diversity in beefsteaks before and after aerobic storage at 4°C and to investigate the sources of microbial contamination by examining the microbiota of carcasses wherefrom the steaks originated and of the processing environment where the beef was handled. Carcass, environmental (processing plant) and meat samples were analyzed by culture-independent high-throughput sequencing of 16S rRNA gene amplicons. The microbiota of carcass swabs was very complex, including more than 600 operational taxonomic units (OTUs) belonging to 15 different phyla. A significant association was found between beef microbiota and specific beef cuts (P<0.01), indicating that different cuts of the same carcass can influence the microbial contamination of beef. Despite the initially high complexity of the carcass microbiota, the steaks after aerobic storage at 4°C showed a dramatic decrease in microbial complexity. Pseudomonas sp. and Brochothrix thermosphacta were the main contaminants, and Acinetobacter, Psychrobacter and Enterobacteriaceae were also found. Comparing the relative abundance of OTUs in the different samples, it was shown that abundant OTUs in beefsteaks after storage occurred in the corresponding carcass. However, the abundance of these same OTUs clearly increased in environmental samples taken in the processing plant, suggesting that spoilage-associated microbial species originate from carcasses, are carried to the processing environment where the meat is handled, and there become a resident microbiota. Such microbiota is then further spread on meat when it is handled and it represents the starting microbial association

  5. DNA from buccal swabs suitable for high-throughput SNP multiplex analysis.

    Science.gov (United States)

    McMichael, Gai L; Gibson, Catherine S; O'Callaghan, Michael E; Goldwater, Paul N; Dekker, Gustaaf A; Haan, Eric A; MacLennan, Alastair H

    2009-12-01

    We sought a convenient and reliable method for collection of genetic material that is inexpensive and noninvasive and suitable for self-collection and mailing and a compatible, commercial DNA extraction protocol to meet quantitative and qualitative requirements for high-throughput single nucleotide polymorphism (SNP) multiplex analysis on an automated platform. Buccal swabs were collected from 34 individuals as part of a pilot study to test commercially available buccal swabs and DNA extraction kits. DNA was quantified on a spectrofluorometer with Picogreen dsDNA prior to testing the DNA integrity with predesigned SNP multiplex assays. Based on the pilot study results, the Catch-All swabs and Isohelix buccal DNA isolation kit were selected for our high-throughput application and extended to a further 1140 samples as part of a large cohort study. The average DNA yield in the pilot study (n=34) was 1.94 µg ± 0.54 with a 94% genotyping pass rate. For the high-throughput application (n=1140), the average DNA yield was 2.44 µg ± 1.74 with a ≥93% genotyping pass rate. The Catch-All buccal swabs are a convenient and cost-effective alternative to blood sampling. Combined with the Isohelix buccal DNA isolation kit, they provided DNA of sufficient quantity and quality for high-throughput SNP multiplex analysis.

  6. Soil DNA metabarcoding and high-throughput sequencing as a forensic tool: considerations, potential limitations and recommendations.

    Science.gov (United States)

    Young, J M; Austin, J J; Weyrich, L S

    2017-02-01

    Analysis of physical evidence is typically a deciding factor in forensic casework by establishing what transpired at a scene or who was involved. Forensic geoscience is an emerging multi-disciplinary science that can offer significant benefits to forensic investigations. Soil is a powerful, nearly 'ideal' contact trace evidence, as it is highly individualistic, easy to characterise, has a high transfer and retention probability, and is often overlooked in attempts to conceal evidence. However, many real-life cases encounter close proximity soil samples or soils with low inorganic content, which cannot be easily discriminated based on current physical and chemical analysis techniques. The capability to improve forensic soil discrimination, and identify key indicator taxa from soil using the organic fraction is currently lacking. The development of new DNA sequencing technologies offers the ability to generate detailed genetic profiles from soils and enhance current forensic soil analyses. Here, we discuss the use of DNA metabarcoding combined with high-throughput sequencing (HTS) technology to distinguish between soils from different locations in a forensic context. Specifically, we provide recommendations for best practice, outline the potential limitations encountered in a forensic context and describe the future directions required to integrate soil DNA analysis into casework. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Science.gov (United States)

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest from SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
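
    The ID-pairing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the accession pattern, the function names find_sra_ids and pair_with_pmid, and the example text and PMID are all assumptions made for illustration.

```python
import re

# Assumed accession shape: S/E/D (archive) + R + A/P/X/S/R/Z (object type)
# followed by six or more digits, e.g. SRP000001 or SRR000011.
SRA_ID = re.compile(r"\b[SED]R[APXSRZ]\d{6,}\b")


def find_sra_ids(text):
    """Return the distinct accession-like strings in text, in order of first appearance."""
    seen = []
    for match in SRA_ID.finditer(text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen


def pair_with_pmid(pmid, full_text):
    """Build (SRA ID, PMID) pairs for one article (hypothetical helper)."""
    return [(sra_id, pmid) for sra_id in find_sra_ids(full_text)]


# Made-up article text and placeholder PMID, for illustration only.
example = "Reads were deposited under SRP000001 (runs SRR000011 and SRR000012); see SRP000001."
pairs = pair_with_pmid("00000000", example)
```

    A real extraction would also need to handle accession ranges and identifiers split across line breaks, which this sketch ignores.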

  8. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest from SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  9. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Full Text Available Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  10. Assessment of Bifidobacterium Species Using groEL Gene on the Basis of Illumina MiSeq High-Throughput Sequencing

    Science.gov (United States)

    Hu, Lujun; Lu, Wenwei; Wang, Linlin; Pan, Mingluo; Zhang, Hao; Zhao, Jianxin; Chen, Wei

    2017-01-01

    The next-generation high-throughput sequencing techniques have introduced a new way to assess the gut’s microbial diversity on the basis of 16S rRNA gene-based microbiota analysis. However, the precise appraisal of the biodiversity of Bifidobacterium species within the gut remains a challenging task because of the limited resolving power of the 16S rRNA gene in different species. The groEL gene, a protein-coding gene, evolves quickly and thus is useful for differentiating bifidobacteria. Here, we designed a Bifidobacterium-specific primer pair which targets a hypervariable sequence region within the groEL gene that is suitable for precise taxonomic identification and detection of all species of the genus Bifidobacterium recognized so far. The results showed that the newly designed primer set can specifically differentiate Bifidobacterium species from non-bifidobacteria, and as few as 10^4 cells of Bifidobacterium species can be detected using the newly designed primer set on the basis of Illumina MiSeq high-throughput sequencing. We also developed a novel protocol to assess the diversity of Bifidobacterium species in both human and rat feces through high-throughput sequencing technologies using the groEL gene as a discriminative marker. PMID:29160815

  11. Transcriptomics of in vitro immune-stimulated hemocytes from the Manila clam Ruditapes philippinarum using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Rebeca Moreira

    Full Text Available BACKGROUND: The Manila clam (Ruditapes philippinarum) is a worldwide cultured bivalve species with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. METHODOLOGY AND PRINCIPAL FINDINGS: High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs, and 44.7% of the nucleotide sequences translated into protein were annotated successfully. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes such as apoptosis, the Toll-like signaling pathway and the complement cascade. We have found sequences from molecules never described in bivalves before, especially in the complement pathway where almost all the components are present. CONCLUSIONS: This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum.

  12. High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping.

    Science.gov (United States)

    Collins, John E; Wali, Neha; Sealy, Ian M; Morris, James A; White, Richard J; Leonard, Steven R; Jackson, David K; Jones, Matthew C; Smerdon, Nathalie C; Zamora, Jorge; Dooley, Christopher M; Carruthers, Samantha N; Barrett, Jeffrey C; Stemple, Derek L; Busch-Nentwich, Elisabeth M

    2015-08-05

    We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements. Multiplex sequencing of transcript 3' ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts. This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.

  13. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Full Text Available Viral infections cause many different diseases stemming both from well-characterized viral pathogens and from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.
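
    The fold increase "in the proportion of" viral sequences reported here is a ratio of two read fractions. The sketch below shows that arithmetic only; the function names and the read counts are hypothetical and are not taken from the study.

```python
def viral_fraction(viral_reads, total_reads):
    """Proportion of reads in a library that match the viral target."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads


def fold_enrichment(enriched, shotgun):
    """Ratio of viral read fractions: enriched library over non-enriched shotgun library.

    Both arguments are (viral_reads, total_reads) tuples.
    """
    baseline = viral_fraction(*shotgun)
    if baseline == 0:
        raise ValueError("no viral reads in the shotgun baseline")
    return viral_fraction(*enriched) / baseline


# Made-up counts: 50 viral reads in 1,000,000 shotgun reads versus
# 5,000 viral reads in 100,000 reads after virion enrichment.
example_fold = fold_enrichment(enriched=(5_000, 100_000), shotgun=(50, 1_000_000))
```

    Because the comparison is between proportions, the two libraries do not need the same sequencing depth, which is consistent with the abstract's point that enrichment reduces the amount of sequencing required.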

  14. A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites

    DEFF Research Database (Denmark)

    Uren, Anthony G; Mikkers, Harald; Kool, Jaap

    2009-01-01

    sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA...... optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites....

  15. VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

    Science.gov (United States)

    Mu, John C; Mohiyuddin, Marghoob; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Abyzov, Alexej; Wong, Wing H; Lam, Hugo Y K

    2015-05-01

    VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. Code in Java and Python, along with instructions to download the reads and variants, is at http://bioinform.github.io/varsim. Contact: rd@bina.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
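
    The idea of comparing variants "binned in size ranges" can be sketched as follows. This is not VarSim's code or API: the bin edges, the names size_bin and sensitivity_by_bin, the exact-match comparison, and the example variants are all assumptions chosen to keep the illustration short.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in bp) of the size bins; the last bin is open-ended.
BIN_EDGES = [1, 50, 1000]
BIN_LABELS = ["1", "2-50", "51-1000", ">1000"]


def size_bin(length):
    """Map a variant length in bp to the label of its size bin."""
    return BIN_LABELS[bisect_left(BIN_EDGES, length)]


def sensitivity_by_bin(truth, called):
    """Fraction of true variants recovered by the caller, per size bin.

    truth and called are sets of (chrom, pos, length) tuples; a true variant
    counts as found only on an exact match, which is a simplification.
    """
    totals, hits = {}, {}
    for variant in truth:
        label = size_bin(variant[2])
        totals[label] = totals.get(label, 0) + 1
        if variant in called:
            hits[label] = hits.get(label, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


# Made-up truth and call sets, for illustration only.
truth_set = {("chr1", 100, 1), ("chr1", 200, 1), ("chr2", 300, 30), ("chr3", 400, 5000)}
call_set = {("chr1", 100, 1), ("chr2", 300, 30)}
per_bin = sensitivity_by_bin(truth_set, call_set)
```

    Reporting accuracy per bin keeps the large number of single nucleotide variants from masking poor performance on the much rarer large structural variants.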

  16. Computational and statistical methods for high-throughput analysis of post-translational modifications of proteins

    DEFF Research Database (Denmark)

    Schwämmle, Veit; Braga, Thiago Verano; Roepstorff, Peter

    2015-01-01

    The investigation of post-translational modifications (PTMs) represents one of the main research focuses for the study of protein function and cell signaling. Mass spectrometry instrumentation with increasing sensitivity improved protocols for PTM enrichment and recently established pipelines...... for high-throughput experiments allow large-scale identification and quantification of several PTM types. This review addresses the concurrently emerging challenges for the computational analysis of the resulting data and presents PTM-centered approaches for spectra identification, statistical analysis...

  17. PathOS: a decision support system for reporting high throughput sequencing of cancers in clinical diagnostic laboratories.

    Science.gov (United States)

    Doig, Kenneth D; Fellowes, Andrew; Bell, Anthony H; Seleznev, Andrei; Ma, David; Ellul, Jason; Li, Jason; Doyle, Maria A; Thompson, Ella R; Kumar, Amit; Lara, Luis; Vedururu, Ravikiran; Reid, Gareth; Conway, Thomas; Papenfuss, Anthony T; Fox, Stephen B

    2017-04-24

    The increasing affordability of DNA sequencing has allowed it to be widely deployed in pathology laboratories. However, this has exposed many issues with the analysis and reporting of variants for clinical diagnostic use. Implementing a high-throughput sequencing (NGS) clinical reporting system requires a diverse combination of capabilities: statistical methods to identify variants, global variant databases, a validated bioinformatics pipeline, an auditable laboratory workflow, reproducible clinical assays and quality control monitoring throughout. These capabilities must be packaged in software that integrates the disparate components into a useable system. To meet these needs, we developed a web-based application, PathOS, which takes variant data from a patient sample through to a clinical report. PathOS has been used operationally in the Peter MacCallum Cancer Centre for two years for the analysis, curation and reporting of genetic tests for cancer patients, as well as the curation of large-scale research studies. PathOS has also been deployed in cloud environments allowing multiple institutions to use separate, secure and customisable instances of the system. Increasingly, the bottleneck of variant curation is limiting the adoption of clinical sequencing for molecular diagnostics. PathOS is focused on providing clinical variant curators and pathology laboratories with a decision support system needed for personalised medicine. While the genesis of PathOS has been within cancer molecular diagnostics, the system is applicable to NGS clinical reporting generally. The widespread availability of genomic sequencers has highlighted the limited availability of software to support clinical decision-making in molecular pathology. PathOS is a system that has been developed and refined in a hospital laboratory context to meet the needs of clinical diagnostics. The software is available as a set of Docker images and source code at https://github.com/PapenfussLab/PathOS .

  18. Computational and Statistical Methods for High-Throughput Mass Spectrometry-Based PTM Analysis.

    Science.gov (United States)

    Schwämmle, Veit; Vaudel, Marc

    2017-01-01

    Cell signaling and functions heavily rely on post-translational modifications (PTMs) of proteins. Their high-throughput characterization is thus of utmost interest for multiple biological and medical investigations. In combination with efficient enrichment methods, peptide mass spectrometry analysis allows the quantitative comparison of thousands of modified peptides over different conditions. However, the large and complex datasets produced pose multiple data interpretation challenges, ranging from spectral interpretation to statistical and multivariate analyses. Here, we present a typical workflow to interpret such data.

  19. A Concept for a Sensitive Micro Total Analysis System for High Throughput Fluorescence Imaging

    OpenAIRE

    Rabner, Arthur; Shacham, Yosi

    2006-01-01

    This paper discusses possible methods for on-chip fluorescent imaging for integrated bio-sensors. The integration of optical and electro-optical accessories, according to the suggested methods, can improve the performance of fluorescence imaging. It can boost the signal-to-background ratio by a few orders of magnitude in comparison to conventional discrete setups. The methods presented in this paper are oriented towards building reproducible arrays for high-throughput micro total analysis...

  20. Genome-wide identification of bone metastasis-related microRNAs in lung adenocarcinoma by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Lin Xie

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) are a class of small noncoding RNAs that regulate gene expression at the post-transcriptional level. They participate in a wide variety of biological processes, including apoptosis, proliferation and metastasis. The aberrant expression of miRNAs has been found to play an important role in many cancers. RESULTS: To understand the roles of miRNAs in the bone metastasis of lung adenocarcinoma, we constructed two small RNA libraries from blood of lung adenocarcinoma patients with and without bone metastasis. High-throughput sequencing combined with differential expression analysis showed that 7 microRNAs were down-regulated and 21 microRNAs were up-regulated in lung adenocarcinoma with bone metastasis. A total of 797 target genes of the differentially expressed microRNAs were identified using a bioinformatics approach. Functional annotation analysis indicated that a number of pathways might be involved in bone metastasis, survival of the primary origin and metastatic angiogenesis of lung adenocarcinoma. These include the MAPK, Wnt, and NF-kappaB signaling pathways, as well as pathways involving the matrix metalloproteinase, cytoskeletal protein and angiogenesis factors. CONCLUSIONS: This study provides some insights into the molecular mechanisms that underlie lung adenocarcinoma development, thereby aiding the diagnosis and treatment of the disease.

  1. Current developments in high-throughput analysis for microalgae cellular contents.

    Science.gov (United States)

    Lee, Tsung-Hua; Chang, Jo-Shu; Wang, Hsiang-Yu

    2013-11-01

    Microalgae have emerged as one of the most promising feedstocks for biofuels and bio-based chemical production. However, due to the lack of effective tools enabling rapid and high-throughput analysis of the content of microalgae biomass, the efficiency of screening and identification of microalgae with desired functional components from the natural environment is usually quite low. Moreover, the real-time monitoring of the production of target components from microalgae is also difficult. Recently, research efforts focusing on overcoming this limitation have started. In this review, the recent development of high-throughput methods for analyzing microalgae cellular contents is summarized. The future prospects and impacts of these detection methods in microalgae-related processing and industries are also addressed. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Identification and characterization of novel and conserved microRNAs in radish (Raphanus sativus L.) using high-throughput sequencing.

    Science.gov (United States)

    Xu, Liang; Wang, Yan; Xu, Yuanyuan; Wang, Liangju; Zhai, Lulu; Zhu, Xianwen; Gong, Yiqin; Ye, Shan; Liu, Liwang

    2013-03-01

    MicroRNAs (miRNAs) are endogenous, non-coding, small RNAs that play significant regulatory roles in plant growth, development, and biotic and abiotic stress responses. To date, a great number of conserved and species-specific miRNAs have been identified in many important plant species such as Arabidopsis, rice and poplar. However, little is known about identification of miRNAs and their target genes in radish (Raphanus sativus L.). In the present study, a small RNA library from radish root was constructed and sequenced using the high-throughput Solexa sequencing. Through sequence alignment and secondary structure prediction, a total of 545 conserved miRNA families as well as 15 novel (with their miRNA* strand) and 64 potentially novel miRNAs were identified. Quantitative real-time PCR (qRT-PCR) analysis confirmed that both conserved and novel miRNAs were expressed in radish, and some of them were preferentially expressed in certain tissues. A total of 196 potential target genes were predicted for 42 novel radish miRNAs. Gene ontology (GO) analysis showed that most of the targets were involved in plant growth, development, metabolism and stress responses. This study represents a first large-scale identification and characterization of radish miRNAs and their potential target genes. These results could lead to the further identification of radish miRNAs and enhance our understanding of radish miRNA regulatory mechanisms in diverse biological and metabolic processes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  3. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the

  4. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing.

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the

  5. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Directory of Open Access Journals (Sweden)

    Zhewei Song

    2017-07-01

    Full Text Available Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide

  6. Image Harvest: an open-source platform for high-throughput plant image processing and analysis

    Science.gov (United States)

    Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal

    2016-01-01

    High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. PMID:27141917

  7. Image Harvest: an open-source platform for high-throughput plant image processing and analysis.

    Science.gov (United States)

    Knecht, Avi C; Campbell, Malachy T; Caprez, Adam; Swanson, David R; Walia, Harkamal

    2016-05-01

    High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  8. An improved high-throughput lipid extraction method for the analysis of human brain lipids.

    Science.gov (United States)

    Abbott, Sarah K; Jenner, Andrew M; Mitchell, Todd W; Brown, Simon H J; Halliday, Glenda M; Garner, Brett

    2013-03-01

    We have developed a protocol suitable for high-throughput lipidomic analysis of human brain samples. The traditional Folch extraction (using chloroform and glass-glass homogenization) was compared to a high-throughput method combining methyl-tert-butyl ether (MTBE) extraction with mechanical homogenization utilizing ceramic beads. This high-throughput method significantly reduced sample handling time and increased efficiency compared to glass-glass homogenizing. Furthermore, replacing chloroform with MTBE is safer (less carcinogenic/toxic), with lipids dissolving in the upper phase, allowing for easier pipetting and the potential for automation (i.e., robotics). Both methods were applied to the analysis of human occipital cortex. Lipid species (including ceramides, sphingomyelins, choline glycerophospholipids, ethanolamine glycerophospholipids and phosphatidylserines) were analyzed via electrospray ionization mass spectrometry and sterol species were analyzed using gas chromatography mass spectrometry. No differences in lipid species composition were evident when the lipid extraction protocols were compared, indicating that MTBE extraction with mechanical bead homogenization provides an improved method for the lipidomic profiling of human brain tissue.

  9. CrossCheck: an open-source web tool for high-throughput screen data analysis.

    Science.gov (United States)

    Najafov, Jamil; Najafov, Ayaz

    2017-07-19

    Modern high-throughput screening methods allow researchers to generate large datasets that potentially contain important biological information. However, picking relevant hits from such screens and generating testable hypotheses often requires training in bioinformatics and the skills to efficiently perform database mining. There are currently no tools available to the general public that allow users to cross-reference their screen datasets with published screen datasets. To this end, we developed CrossCheck, an online platform for high-throughput screen data analysis. CrossCheck is a centralized database that allows effortless comparison of the user-entered list of gene symbols with 16,231 published datasets. These datasets include published data from genome-wide RNAi and CRISPR screens, interactome proteomics and phosphoproteomics screens, cancer mutation databases, low-throughput studies of major cell signaling mediators, such as kinases, E3 ubiquitin ligases and phosphatases, and gene ontological information. Moreover, CrossCheck includes a novel database of predicted protein kinase substrates, which was developed using proteome-wide consensus motif searches. CrossCheck dramatically simplifies high-throughput screen data analysis and enables researchers to dig deep into the published literature and streamline data-driven hypothesis generation. CrossCheck is freely accessible as a web-based application at http://proteinguru.com/crosscheck.

  10. A High-Throughput DNA-Sequencing Approach for Determining Sources of Fecal Bacteria in a Lake Superior Estuary.

    Science.gov (United States)

    Brown, Clairessa M; Staley, Christopher; Wang, Ping; Dalzell, Brent; Chun, Chan Lan; Sadowsky, Michael J

    2017-08-01

    Current microbial source-tracking (MST) methods, employed to determine sources of fecal contamination in waterways, use molecular markers targeting host-associated bacteria in animal or human feces. However, there is a lack of knowledge about fecal microbiome composition in several animals and imperfect marker specificity and sensitivity. To overcome these issues, a community-based MST method has been developed. Here, we describe a study done in the Lake Superior-Saint Louis River estuary using SourceTracker, a program that calculates the source contribution to an environment. High-throughput DNA sequencing of microbiota from a diverse collection of fecal samples obtained from 11 types of animals (wild, agricultural, and domesticated) and treated effluent (n = 233) was used to generate a fecal library to perform community-based MST. Analysis of 319 fecal and environmental samples revealed that the community compositions in water and fecal samples were significantly different, allowing for the determination of the presence of fecal inputs and identification of specific sources. SourceTracker results indicated that fecal bacterial inputs into the Lake Superior estuary were primarily attributed to wastewater effluent and, to a lesser extent, geese and gull wastes. These results suggest that a community-based MST method may be another useful tool for determining sources of aquatic fecal bacteria.

  11. Multiplexed Spliced-Leader Sequencing: A high-throughput, selective method for RNA-seq in Trypanosomatids.

    Science.gov (United States)

    Cuypers, Bart; Domagalska, Malgorzata A; Meysman, Pieter; Muylder, Géraldine de; Vanaerschot, Manu; Imamura, Hideo; Dumetz, Franck; Verdonckt, Thomas Wolf; Myler, Peter J; Ramasamy, Gowthaman; Laukens, Kris; Dujardin, Jean-Claude

    2017-06-16

    High throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5' end of each mRNA by trans-splicing. This makes it possible to discriminate trypanosomatid RNA from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating that both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high resolution splice site detection. SL-seq is therefore ideal for fast and massively parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.
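
    The selection principle described above, telling parasite transcripts apart by the spliced leader at their 5' end, can be illustrated in silico. The following is a minimal sketch, not the SL-seq protocol itself (the abstract does not say at which step the selection happens); the SL tag shown is a made-up placeholder rather than a real trypanosomatid sequence, and the function name and `max_offset` parameter are assumptions introduced for illustration.

```python
# Hypothetical spliced-leader (SL) tag; NOT a real trypanosomatid SL sequence.
PLACEHOLDER_SL = "ACGTTAGC"


def split_by_spliced_leader(reads, sl=PLACEHOLDER_SL, max_offset=3):
    """Partition reads into (parasite_like, other).

    A read counts as parasite-like when the SL tag starts at an offset
    between 0 and `max_offset` from the 5' end. The tag and anything
    before it are trimmed so that only the mRNA body is kept.
    """
    parasite_like, other = [], []
    for read in reads:
        # Restrict the search so the tag must begin within the allowed offset.
        pos = read.find(sl, 0, max_offset + len(sl))
        if pos != -1:
            parasite_like.append(read[pos + len(sl):])
        else:
            other.append(read)
    return parasite_like, other
```

    In this sketch, a read that carries the tag only far from its 5' end is treated as background, since a trans-spliced leader is expected at the start of the transcript.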

  12. Yeast diversity during the fermentation of Andean chicha: A comparison of high-throughput sequencing and culture-dependent approaches.

    Science.gov (United States)

    Mendoza, Lucía M; Neef, Alexander; Vignolo, Graciela; Belloch, Carmela

    2017-10-01

    The diversity and dynamics of yeasts associated with the fermentation of the Argentinian maize-based beverage chicha were investigated. Samples taken at different stages from two chicha productions were analyzed by culture-dependent and culture-independent methods. Five hundred and ninety-six yeasts were isolated by classical microbiological methods and 16 species identified by RFLPs and sequencing of the D1/D2 26S rRNA gene. Genetic typing of isolates from the dominant species, Saccharomyces cerevisiae, by PCR of delta elements revealed up to 42 different patterns. High-throughput sequencing (HTS) of D1/D2 26S rRNA gene amplicons from chicha samples detected more than one hundred yeast species and almost fifty filamentous fungi taxa. Analysis of the data revealed that yeasts dominated the fermentation, although a significant percentage of filamentous fungi appeared in the first step of the process. Statistical analysis of results showed that very few taxa were represented by more than 1% of the reads per sample at any step of the process. S. cerevisiae represented more than 90% of the reads in the fermentative samples. Other yeast species dominated the pre-fermentative steps and abounded in fermented samples when S. cerevisiae was in percentages below 90%. Most yeast species detected by pyrosequencing were not recovered by cultivation. In contrast, the cultivation-based methodology detected very few yeast taxa, and most of them corresponded with very few reads in the pyrosequencing analysis. Copyright © 2017 Elsevier Ltd. All rights reserved.
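
    The "more than 1% of the reads per sample" criterion mentioned above is a relative-abundance filter. The following is a minimal sketch of such a filter, assuming per-sample read counts are already available as a dictionary; the function name and the counts used in the usage note are invented for illustration and are not data from the study.

```python
def taxa_above_threshold(read_counts, min_fraction=0.01):
    """Return {taxon: fraction_of_reads} for taxa exceeding `min_fraction`.

    `read_counts` maps taxon name -> number of reads in one sample.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}  # an empty sample has no abundant taxa
    return {
        taxon: count / total
        for taxon, count in read_counts.items()
        if count / total > min_fraction
    }
```

    For example, with invented counts of 9200, 700, 60 and 40 reads for four taxa, only the first two exceed the 1% cut-off.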

  13. High-throughput sequencing of microRNAs in peripheral blood mononuclear cells: identification of potential weight loss biomarkers.

    Directory of Open Access Journals (Sweden)

    Fermín I Milagro

    Full Text Available INTRODUCTION: MicroRNAs (miRNAs) are being increasingly studied in relation to energy metabolism and body composition homeostasis. Indeed, the quantitative analysis of miRNA expression in different adiposity conditions may contribute to understanding the intimate mechanisms participating in body weight control and to finding new biomarkers with diagnostic or prognostic value in obesity management. OBJECTIVE: The aim of this study was to search for miRNAs in blood cells whose expression could be used as prognostic biomarkers of weight loss. METHODS: Ten Caucasian obese women were selected among the participants in a weight-loss trial that consisted of following an energy-restricted treatment. Weight loss was considered unsuccessful when <5% (non-responders) and successful when >5% (responders). At baseline, total miRNA isolated from peripheral blood mononuclear cells (PBMC) was sequenced with SOLiD v4. The miRNA sequencing data were validated by RT-PCR. RESULTS: Differential baseline expression of several miRNAs was found between responders and non-responders. Two miRNAs were up-regulated in the non-responder group (mir-935 and mir-4772) and three others were down-regulated (mir-223, mir-224 and mir-376b). Both mir-935 and mir-4772 showed relevant associations with the magnitude of weight loss, although the expression of other transcripts (mir-874, mir-199b, mir-766, mir-589 and mir-148b) also correlated with weight loss. CONCLUSIONS: This research addresses the use of high-throughput sequencing technologies in the search for miRNA expression biomarkers in obesity, by determining the miRNA transcriptome of PBMC. Basal expression of different miRNAs, particularly mir-935 and mir-4772, could be prognostic biomarkers and may forecast the response to a hypocaloric diet.

  14. Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

    Science.gov (United States)

    Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H

    2015-06-01

    The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin, or against H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    Science.gov (United States)

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
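
    The sliding-window step described above can be sketched as follows. This is only an illustration of the windowing and thresholding logic, not the published tool: the phylogenetic analysis is replaced by a hypothetical callback `score_window`, the window size and step are assumed values, and only the bootstrap-style >70% cut-off from the abstract is applied.

```python
def sliding_windows(length, window, step):
    """Yield (start, end) for overlapping windows covering an alignment."""
    if length <= window:
        yield (0, length)
        return
    start = 0
    while start + window < length:
        yield (start, start + window)
        start += step
    yield (length - window, length)  # last window sits flush with the end


def assign_genotypes(length, score_window, window=400, step=100, min_support=70.0):
    """Label each window with its best-supported reference genotype.

    `score_window(start, end)` is a hypothetical callback returning a
    non-empty {genotype: support_percent} dict; it stands in for the
    phylogenetic step. Windows whose best support does not exceed
    `min_support` are left unassigned (None).
    """
    calls = []
    for start, end in sliding_windows(length, window, step):
        scores = score_window(start, end)
        best = max(scores, key=scores.get)
        label = best if scores[best] > min_support else None
        calls.append((start, end, label))
    return calls
```

    A change of label between consecutive windows is what would flag a candidate recombinant in a scheme of this kind; unassigned windows mark regions where no reference is supported strongly enough.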

  16. An Automated High Throughput Proteolysis and Desalting Platform for Quantitative Proteomic Analysis

    Directory of Open Access Journals (Sweden)

    Albert-Baskar Arul

    2013-06-01

    Full Text Available Proteomics for biomarker validation needs high-throughput instrumentation to analyze large sets of clinical samples quantitatively and reproducibly, in minimal time and without manual experimental errors. Sample preparation, a vital step in proteomics, plays a major role in the identification and quantification of proteins from biological samples. Tryptic digestion, a major checkpoint in sample preparation for mass spectrometry-based proteomics, needs to be more accurate and faster. The present study focuses on establishing a high-throughput automated online system for proteolytic digestion and desalting of proteins from biological samples, quantitatively and qualitatively, in a reproducible manner. The study compares online protein digestion and desalting of BSA with the conventional off-line (in-solution) method, and validates the system on real samples for reproducibility. Proteins were identified using the SEQUEST database search engine and the data were quantified using IDEALQ software. The study shows that the online system, capable of handling high-throughput samples in 96-well format, carries out protein digestion and peptide desalting efficiently in a reproducible and quantitative manner. Label-free quantification showed a clear increase in peptide quantities with increasing concentration, with greater linearity than the off-line method. Hence, we suggest that including this online system in the proteomic pipeline will be effective for protein quantification in comparative proteomics, where quantification is crucial.

  17. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    Science.gov (United States)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  18. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

    Science.gov (United States)

    't Hoen, Peter A C; Friedländer, Marc R; Almlöf, Jonas; Sammeth, Michael; Pulyakhina, Irina; Anvar, Seyed Yahya; Laros, Jeroen F J; Buermans, Henk P J; Karlberg, Olof; Brännvall, Mathias; den Dunnen, Johan T; van Ommen, Gert-Jan B; Gut, Ivo G; Guigó, Roderic; Estivill, Xavier; Syvänen, Ann-Christine; Dermitzakis, Emmanouil T; Lappalainen, Tuuli

    2013-11-01

    RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.

  19. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user-specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
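
    Step (3) above, finding microsatellites that conform to user-specified parameters, can be illustrated with a regular expression. The following is a minimal sketch and makes no claim about how SSR_pipeline itself is implemented; the function name, parameter names and default values are assumptions, and only perfect (uninterrupted) repeats are detected.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    ACACACACAC is reported as (AC)5 rather than as a longer compound motif.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

    Start positions are 0-based. Tightening `min_repeats` or narrowing the motif length range is how a user would restrict the search to, say, long dinucleotide repeats only.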

  20. Optimizing transformations for automated, high throughput analysis of flow cytometry data

    Directory of Open Access Journals (Sweden)

    Weng Andrew

    2010-11-01

    Full Text Available Abstract Background: In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and may exhibit heavily-skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the BioConductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations. Results: We compare the performance of parameter-optimized and default-parameter (in flowCore) data transformations on real and simulated data by measuring the variation in the locations of cell populations across samples, discovered via automated gating in both the scatter and fluorescence channels. We find that parameter
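
    Of the transformations named in this record, the hyperbolic arcsine is the simplest to write down. The following is a minimal sketch, assuming the three-parameter form asinh(a + b*x) + c; whether this matches the exact parameterization and defaults used in flowCore, or the optimized parameters of the study, is an assumption and should be checked against the package documentation.

```python
import math


def arcsinh_transform(x, a=1.0, b=1.0, c=0.0):
    """Three-parameter hyperbolic arcsine: asinh(a + b * x) + c.

    Roughly linear near zero and roughly logarithmic for large values,
    which is why it suits measurements spanning orders of magnitude.
    """
    return math.asinh(a + b * x) + c
```

    With a = 0, the transform maps 0 to 0 and behaves like ln(2*b*x) for large x, so the parameter b effectively sets where the linear region hands over to the logarithmic one.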

  1. Development of Droplet Microfluidics Enabling High-Throughput Single-Cell Analysis

    Directory of Open Access Journals (Sweden)

    Na Wen

    2016-07-01

    Full Text Available This article reviews recent developments in droplet microfluidics enabling high-throughput single-cell analysis. Five key aspects in this field are included in this review: (1) prototype demonstration of single-cell encapsulation in microfluidic droplets; (2) technical improvements of single-cell encapsulation in microfluidic droplets; (3) microfluidic droplets enabling single-cell proteomic analysis; (4) microfluidic droplets enabling single-cell genomic analysis; and (5) integrated microfluidic droplet systems enabling single-cell screening. We examine the advantages and limitations of each technique and discuss future research opportunities by focusing on key performances of throughput, multifunctionality, and absolute quantification.

  2. Genetic characterisation of Malawian pneumococci prior to the roll-out of the PCV13 vaccine using a high-throughput whole genome sequencing approach.

    Directory of Open Access Journals (Sweden)

    Dean B Everett

    Full Text Available Malawi commenced the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) into the routine infant immunisation schedule in November 2011. Here we have tested the utility of high throughput whole genome sequencing to provide a high-resolution view of pre-vaccine pneumococcal epidemiology and population evolutionary trends to predict potential future change in population structure post introduction. One hundred and twenty seven (127) archived pneumococcal isolates from randomly selected adults and children presenting to the Queen Elizabeth Central Hospital, Blantyre, Malawi underwent whole genome sequencing. The pneumococcal population was dominated by serotype 1 (20.5% of invasive isolates) prior to vaccine introduction. PCV13 is likely to protect against 62.9% of all circulating invasive pneumococci (78.3% in under-5-year-olds). Several Pneumococcal Molecular Epidemiology Network (PMEN) clones are now in circulation in Malawi which were previously undetected, but the pandemic multidrug resistant PMEN1 lineage was not identified. Genome analysis identified a number of novel sequence types and serotype switching. High throughput genome sequencing is now feasible and has the capacity to simultaneously elucidate serotype and sequence type as well as detailed genetic information. It enables population level characterization, providing a detailed picture of population structure and genome evolution relevant to disease control. Post-vaccine introduction surveillance supported by genome sequencing is essential to providing a comprehensive picture of the impact of PCV13 on pneumococcal population structure and informing future public health interventions.

  3. Integrated Analysis Platform: An Open-Source Information System for High-Throughput Plant Phenotyping.

    Science.gov (United States)

    Klukas, Christian; Chen, Dijun; Pape, Jean-Michel

    2014-06-01

    High-throughput phenotyping is emerging as an important technology to dissect phenotypic components in plants. Efficient image processing and feature extraction are prerequisites to quantify plant growth and performance based on phenotypic traits. Issues include data management, image analysis, and result visualization of large-scale phenotypic data sets. Here, we present Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Due to the huge amount of data to manage, we utilized a common data structure for efficient storage and organization of both input data and result data. We implemented a block-based method for automated image processing to extract a representative list of plant phenotypic traits. We also provide tools for built-in data plotting and result export. For validation of IAP, we performed an example experiment that contains 33 maize (Zea mays 'Fernandez') plants, which were grown for 9 weeks in an automated greenhouse with nondestructive imaging. Subsequently, the image data were subjected to automated analysis with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate with our manually measured data with high accuracy, up to 0.98 and 0.95, respectively. In summary, IAP provides a diverse set of functionalities for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable. © 2014 American Society of Plant Biologists. All Rights Reserved.

  4. Investigation of bacterial and fungal diversity in tarag using high-throughput sequencing.

    Science.gov (United States)

    Sun, Zhihong; Liu, Wenjun; Bao, Qiuhua; Zhang, Jiachao; Hou, Qiangchuan; Kwok, Laiyu; Sun, Tiansong; Zhang, Heping

    2014-10-01

    This is the first study on the bacterial and fungal community diversity in 17 tarag samples (naturally fermented dairy products) through a metagenomic approach involving high-throughput pyrosequencing. Our results revealed the presence of a total of 47 bacterial and 43 fungal genera in all tarag samples, in which Lactobacillus and Galactomyces were the predominant genera of bacteria and fungi, respectively. The number of some microbial genera, such as Lactococcus, Acetobacter, Saccharomyces, Trichosporon, and Kluyveromyces, among others, was found to vary between different samples. Altogether, our results showed that the microbial flora in different samples may be stratified by geographic region. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  5. Predicting the origin of soil evidence: High throughput eukaryote sequencing and MIR spectroscopy applied to a crime scene scenario.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Breen, James; Macdonald, Lynne M; Cooper, Alan

    2015-06-01

    Soil can serve as powerful trace evidence in forensic casework, because it is highly individualistic and can be characterised using a number of techniques. Complex soil matrixes can support a vast number of organisms that can provide a site-specific signal for use in forensic soil discrimination. Previous DNA fingerprinting techniques rely on variations in fragment length to distinguish between soil profiles and focus solely on microbial communities. However, the recent development of high throughput sequencing (HTS) has the potential to provide a more detailed picture of the soil community by accessing non-culturable microorganisms and by identifying specific bacteria, fungi, and plants within soil. To demonstrate the application of HTS to forensic soil analysis, 18S ribosomal RNA profiles of six forensic mock crime scene samples were compared to those collected from seven reference locations across South Australia. Our results demonstrate the utility of non-bacterial DNA to discriminate between different sites, and were able to link a soil to a particular location. In addition, HTS complemented traditional Mid Infrared (MIR) spectroscopy soil profiling, but was able to provide statistically stronger discriminatory power at a finer scale. Through the design of an experimental case scenario, we highlight the considerations and potential limitations of this method in forensic casework. We show that HTS analysis of soil eukaryotes was robust to environmental variation, e.g. rainfall and temperature, transfer effects, storage effects and spatial variation. In addition, this study utilises novel analytical methodologies to interpret results for investigative purposes and provides prediction statistics to support soil DNA analysis for evidential stages of a case. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  6. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization.

    Science.gov (United States)

    Mirkovic, Nebojsa; Li, Zhaohui; Parnassa, Andrew; Murray, Diana

    2007-03-01

    The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family. (c) 2006 Wiley-Liss, Inc.
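    The leverage figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only reuses the numbers stated above; the variable names are illustrative and do not come from the NESG pipeline.

```python
# Figures quoted in the abstract above (the model counts are approximate).
nesg_structures = 186       # unique NESG structures assessed
quality_models = 39_000     # sequences covered by quality models
stringent_models = 7_900    # models at least 30% sequence-identical to the template

# Average leverage: quality models produced per solved structure.
average_leverage = quality_models / nesg_structures

# Share of the quality models that meet the stringent 30% identity criterion.
stringent_fraction = stringent_models / quality_models

print(f"average leverage: {average_leverage:.0f} models per structure")
print(f"stringent fraction: {stringent_fraction:.1%}")
```

    Roughly one model in five meets the 30% identity criterion, which is consistent with the abstract's remark that only a minority of models are built from close templates.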

  7. Allelome.PRO, a pipeline to define allele-specific genomic features from high-throughput sequencing data.

    Science.gov (United States)

    Andergassen, Daniel; Dotter, Christoph P; Kulinski, Tomasz M; Guenzl, Philipp M; Bammer, Philipp C; Barlow, Denise P; Pauler, Florian M; Hudson, Quanah J

    2015-12-02

    Detecting allelic biases from high-throughput sequencing data requires an approach that maximises sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the 'allelome' by automatically categorising the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers, or transcription factor binding sites and therefore provides a powerful tool to identify candidate cis regulatory elements genome wide. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
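    The four categories named in this abstract can be illustrated with a simplified classifier for a single gene measured in two reciprocal crosses. This is a sketch under stated assumptions, not the Allelome.PRO implementation: the function name, the 0.7 allelic ratio cutoff, the 20-read minimum and the extra "unresolved" label are hypothetical, and the empirical false discovery rate described above is not modelled.

```python
def classify_gene(fwd_a, fwd_b, rev_a, rev_b, cutoff=0.7, min_reads=20):
    """Classify one gene from allele-specific read counts (hypothetical sketch).

    fwd_a, fwd_b: reads assigned to strain A and strain B in the forward cross.
    rev_a, rev_b: the same counts in the reciprocal cross, where the parental
    origin of each strain's allele is swapped.
    """
    if fwd_a + fwd_b < min_reads or rev_a + rev_b < min_reads:
        return "non-informative"          # too few reads to call a bias

    def biased_strain(a, b):
        ratio = a / (a + b)               # allelic ratio in favour of strain A
        if ratio >= cutoff:
            return "A"
        if ratio <= 1 - cutoff:
            return "B"
        return None                       # no clear bias in this cross

    fwd = biased_strain(fwd_a, fwd_b)
    rev = biased_strain(rev_a, rev_b)

    if fwd is None and rev is None:
        return "biallelic"
    if fwd is not None and fwd == rev:
        return "strain-biased"            # same strain favoured in both crosses
    if fwd is not None and rev is not None:
        return "imprinted"                # bias follows the parent, not the strain
    return "unresolved"                   # biased in only one cross (label is an assumption)


print(classify_gene(90, 10, 10, 90))
print(classify_gene(90, 10, 85, 15))
print(classify_gene(50, 50, 55, 45))
```

    The key design point is that a bias which switches strain between the reciprocal crosses tracks the parent of origin, whereas a bias that stays with one strain tracks the genotype.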

  8. SmartGrain: high-throughput phenotyping software for measuring seed shape through image analysis.

    Science.gov (United States)

    Tanabata, Takanari; Shibaya, Taeko; Hori, Kiyosumi; Ebana, Kaworu; Yano, Masahiro

    2012-12-01

    Seed shape and size are among the most important agronomic traits because they affect yield and market price. To obtain accurate seed size data, a large number of measurements are needed because there is little difference in size among seeds from one plant. To promote genetic analysis and selection for seed shape in plant breeding, efficient, reliable, high-throughput seed phenotyping methods are required. We developed SmartGrain software for high-throughput measurement of seed shape. This software uses a new image analysis method to reduce the time taken in the preparation of seeds and in image capture. Outlines of seeds are automatically recognized from digital images, and several shape parameters, such as seed length, width, area, and perimeter length, are calculated. To validate the software, we performed a quantitative trait locus (QTL) analysis for rice (Oryza sativa) seed shape using backcrossed inbred lines derived from a cross between japonica cultivars Koshihikari and Nipponbare, which showed small differences in seed shape. SmartGrain removed areas of awns and pedicels automatically, and several QTLs were detected for six shape parameters. The allelic effect of a QTL for seed length detected on chromosome 11 was confirmed in advanced backcross progeny; the cv Nipponbare allele increased seed length and, thus, seed weight. High-throughput measurement with SmartGrain reduced sampling error and made it possible to distinguish between lines with small differences in seed shape. SmartGrain could accurately recognize seed not only of rice but also of several other species, including Arabidopsis (Arabidopsis thaliana). The software is free to researchers.
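    The shape parameters listed in this abstract (length, width, area and perimeter) can be computed from a seed outline once it is available as a closed polygon. The sketch below is not SmartGrain's algorithm: the function name is hypothetical, and it assumes that length is the largest distance between two outline points and that width is the outline's extent perpendicular to that long axis.

```python
import math
from itertools import combinations


def shape_parameters(outline):
    """Return area, perimeter, length and width of a closed polygon outline.

    outline: list of (x, y) vertices in order, first vertex not repeated.
    """
    n = len(outline)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1            # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    area = abs(twice_area) / 2.0

    # Length: the two outline points that are farthest apart define the long axis.
    (ax, ay), (bx, by) = max(combinations(outline, 2),
                             key=lambda pair: math.dist(pair[0], pair[1]))
    length = math.dist((ax, ay), (bx, by))

    # Width: spread of signed distances from the long axis.
    ux, uy = (bx - ax) / length, (by - ay) / length
    offsets = [(y - ay) * ux - (x - ax) * uy for x, y in outline]
    width = max(offsets) - min(offsets)

    return {"area": area, "perimeter": perimeter, "length": length, "width": width}


# A diamond-shaped toy outline, 6 units long and 2 units wide.
params = shape_parameters([(-3, 0), (0, 1), (3, 0), (0, -1)])
print(params)
```

    Real seed outlines would come from image segmentation and have many more vertices; the pairwise search for the long axis is quadratic in the number of vertices and is kept here only for clarity.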

  9. A Torrent of data: mapping chromatin organization using 5C and high-throughput sequencing.

    Science.gov (United States)

    Fraser, James; Ethier, Sylvain D; Miura, Hisashi; Dostie, Josée

    2012-01-01

    The study of three-dimensional genome organization is an exciting research area, which has benefited from the rapid development of high-resolution molecular mapping techniques over the past decade. These methods are derived from the chromosome conformation capture (3C) technique and are each aimed at improving some aspect of 3C. All 3C technologies use formaldehyde fixation and proximity-based ligation to capture chromatin contacts in cell populations and consider in vivo spatial proximity more or less inversely proportional to the frequency of measured interactions. The 3C-carbon copy (5C) method is among the most quantitative of these approaches. 5C is extremely robust and can be used to study chromatin organization at various scales. Here, we present a modified 5C analysis protocol adapted for sequencing with an Ion Torrent Personal Genome Machine™ (PGM™). We explain how Torrent 5C libraries are produced and sequenced. We also describe the statistical and computational methods we developed to normalize and analyze raw Torrent 5C sequence data. The Torrent 5C protocol should facilitate the study of in vivo chromatin architecture at high resolution because it benefits from high accuracy, greater speed, low running costs, and the flexibility of in-house next-generation sequencing. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Quantitative dot blot analysis (QDB), a versatile high throughput immunoblot method.

    Science.gov (United States)

    Tian, Geng; Tang, Fangrong; Yang, Chunhua; Zhang, Wenfeng; Bergquist, Jonas; Wang, Bin; Mi, Jia; Zhang, Jiandi

    2017-08-29

    The lack of an affordable method of high throughput immunoblot analysis for daily use remains a big challenge for scientists worldwide. We propose here Quantitative Dot Blot analysis (QDB) to meet this demand. With a defined linear range, QDB analysis fundamentally transforms the traditional immunoblot method into a true quantitative assay. Its convenience in analyzing large numbers of samples also enables bench scientists to examine protein expression levels from multiple parameters. In addition, the small amount of sample lysate needed for analysis means significant savings in research resources and effort. This method was evaluated at both cellular and tissue levels, yielding unexpected observations that would otherwise be hard to achieve using conventional immunoblot methods like Western blot analysis. Using the QDB technique, we were able to observe a significant age-dependent alteration of CAPG protein expression level in TRAMP mice. We believe that the adoption of QDB analysis would have an immediate impact on biological and biomedical research by providing much needed high-throughput information at the protein level in this "Big Data" era.

  11. QTL Mapping for Rice RVA Properties Using High-Throughput Re-sequenced Chromosome Segment Substitution Lines

    Directory of Open Access Journals (Sweden)

    Chang-quan ZHANG

    2013-11-01

    The rapid visco analyser (RVA) profile is an important factor for evaluation of the cooking and eating quality of rice. To improve rice quality, the identification of new quantitative trait loci (QTLs) for RVA profiling is of great significance. We used a japonica rice cultivar Nipponbare as the recipient and indica rice 9311 as the donor to develop a population containing 38 chromosome segment substitution lines (CSSLs) genotyped by a high-throughput re-sequencing strategy. In this study, the population and the parent lines, which contained similar apparent amylose contents, were used to map the QTLs of RVA properties including peak paste viscosity (PKV), hot paste viscosity (HPV), cool paste viscosity (CPV), breakdown viscosity (BKV), setback viscosity (SBV), consistency viscosity (CSV), peak time (PeT) and pasting temperature (PaT). QTL analysis was carried out using one-way analysis of variance and Dunnett's test, and stable QTLs were identified over two years and under two environments. We identified 10 stable QTLs: qPKV2-1, qSBV2-1; qPKV5-1, qHPV5-1, qCPV5-1; qPKV7-1, qHPV7-1, qCPV7-1, qSBV7-1; and qPKV8-1 on chromosomes 2, 5, 7 and 8, respectively, with contributions ranging from −95.6% to 47.1%. Besides, there was pleiotropy in the QTLs on chromosomes 2, 5 and 7.

  12. High-throughput glycosylation analysis of therapeutic immunoglobulin G by capillary gel electrophoresis using a DNA analyzer.

    NARCIS (Netherlands)

    Reusch, D.; Haberger, M.; Kailich, T.; Heidenreich, A.K.; Kampe, M.; Bulau, P.; Wuhrer, M.

    2014-01-01

    The Fc glycosylation of therapeutic antibodies is crucial for their effector functions and their behavior in pharmacokinetics and pharmacodynamics. To monitor the Fc glycosylation in bioprocess development and characterization, high-throughput techniques for glycosylation analysis are needed. Here,

  13. Recent advances in quantitative high throughput and high content data analysis.

    Science.gov (United States)

    Moutsatsos, Ioannis K; Parker, Christian N

    2016-01-01

    High throughput screening has become a basic technique with which to explore biological systems. Advances in technology, including increased screening capacity, as well as methods that generate multiparametric readouts, are driving the need for improvements in the analysis of data sets derived from such screens. This article covers the recent advances in the analysis of high throughput screening data sets from arrayed samples, as well as the recent advances in the analysis of cell-by-cell data sets derived from image or flow cytometry application. Screening multiple genomic reagents targeting any given gene creates additional challenges and so methods that prioritize individual gene targets have been developed. The article reviews many of the open source data analysis methods that are now available and which are helping to define a consensus on the best practices to use when analyzing screening data. As data sets become larger, and more complex, the need for easily accessible data analysis tools will continue to grow. The presentation of such complex data sets, to facilitate quality control monitoring and interpretation of the results will require the development of novel visualizations. In addition, advanced statistical and machine learning algorithms that can help identify patterns, correlations and the best features in massive data sets will be required. The ease of use for these tools will be important, as they will need to be used iteratively by laboratory scientists to improve the outcomes of complex analyses.

  14. [High throughput-targeted sequencing panel for exploring radiosensitivity associated genes in esophageal squamous cell carcinoma].

    Science.gov (United States)

    Qiao, Y; Hu, C X; Song, D A; Li, S Q; Zhou, L H; Jiang, X D

    2017-08-23

    Objective: To explore radiosensitivity-associated genes in esophageal squamous cell carcinoma using a targeted sequencing panel. Methods: Peripheral blood was collected from 22 esophageal squamous cell carcinoma (ESCC) patients who received radiotherapy alone. The genomic DNA (gDNA) of peripheral blood was extracted and used to create a library of gDNA restriction fragments. The gDNA restriction fragments were hybridized to the HaloPlex probe capture library, which comprises 356 cancer genes selected from the 2011 updated edition of the Catalogue of Somatic Mutations in Cancer (COSMIC) database. The sequencing data were aligned by the Genome Analysis Toolkit GATK (version 3.0) and Picard. The single nucleotide polymorphism and insertion-deletion (SNP/InDel) variations were annotated using an online database. The pathway enrichment was analyzed by Ingenuity Pathway Analysis (IPA). Moreover, according to the short-term curative effect, the 22 patients were divided into two groups: the radiation-sensitive group (CR+PR) and the radiation-resistant group (PD+SD). The nonsynonymous mutation sites were statistically analyzed and the genes associated with radiosensitivity of ESCC were screened. Results: More than 97% of sequencing reads were aligned to the human genome reference sequence and more than 90% of sequencing reads were target sequences. SNP/InDel database annotation results showed that the mutations of the 22 cases were mainly distributed in exons, and the mutation types were mainly missense and synonymous single nucleotide variants (SNVs). There were 23 genes with high-frequency mutations associated with esophageal cancer. Pathway enrichment by IPA showed that 3 pathways were associated with the development of esophageal cancer: the role of BRCA1 in DNA damage response pathway, the DNA double-strand break repair by non-homologous end joining pathway and the ATM signaling pathway. According to the curative effect, five genes including mismatch repair system component (PMS1

  15. PCR primers to study the diversity of expressed fungal genes encoding lignocellulolytic enzymes in soils using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Florian Barbi

    Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly designed primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may

  16. msBiodat analysis tool, big data analysis for high-throughput experiments.

    Science.gov (United States)

    Muñoz-Torres, Pau M; Rokć, Filip; Belužic, Robert; Grbeša, Ivana; Vugrek, Oliver

    2016-01-01

    Mass spectrometry (MS) comprises a group of high-throughput techniques used to increase knowledge about biomolecules. They produce a large amount of data, which is presented as a list of hundreds or thousands of proteins. Filtering those data efficiently is the first step for extracting biologically relevant information. Filtering gains value when previous data are merged with data obtained from public databases, resulting in an accurate list of proteins which meet the predetermined conditions. In this article we present msBiodat Analysis Tool, a web-based application designed to bring proteomics closer to big data analysis. With this tool, researchers can easily select the most relevant information from their MS experiments using an easy-to-use web interface. An interesting feature of msBiodat analysis tool is the possibility of selecting proteins by their Gene Ontology annotation using their Gene ID, Ensembl or UniProt codes. The msBiodat analysis tool is a web-based application that allows researchers with any level of programming experience to take advantage of efficient database querying. Its versatility and user-friendly interface make it easy to perform fast and accurate data screening using complex queries. Once the analysis is finished, the result is delivered by e-mail. msBiodat analysis tool is freely available at http://msbiodata.irb.hr.

  17. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    Genetic polymorphisms in P. falciparum can be used to indicate the parasite's susceptibility to antimalarial drugs as well as its geographical origin. Both of these factors are key to monitoring development and spread of antimalarial drug resistance. In this study, we combine multiplex PCR, custom...... designed dual indexing and Miseq sequencing for high throughput SNP-profiling of 457 malaria infections from Guinea-Bissau, at the cost of 10 USD per sample. By amplifying and sequencing 15 genetic fragments, we cover 20 resistance-conferring SNPs occurring in pfcrt, pfmdr1, pfdhfr, pfdhps, as well...... as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance...

  18. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  19. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  20. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases.

    Science.gov (United States)

    Qin, Yidan; Yao, Jun; Wu, Douglas C; Nottingham, Ryan M; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from RNA in 1-mL plasma samples from a healthy individual and revealed RNA fragments mapping to a diverse population of protein-coding genes and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. © 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  1. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Science.gov (United States)

    2012-01-01

    Background: MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results: Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions: This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes. PMID:22394504

  2. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing.

    Science.gov (United States)

    Peláez, Pablo; Trejo, Minerva S; Iñiguez, Luis P; Estrada-Navarrete, Georgina; Covarrubias, Alejandra A; Reyes, José L; Sanchez, Federico

    2012-03-06

    MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.
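    Comparing expression "through miRNA read frequencies" across organs requires putting libraries of different sizes on a common scale. A minimal sketch of one common convention, reads per million (RPM), is shown below; the abstract does not state which normalization the authors used, and the function name, miRNA family names, counts and library sizes are all made-up illustrations.

```python
def reads_per_million(counts, library_size):
    """Scale raw read counts to reads per million for one library."""
    return {family: 1_000_000 * n / library_size for family, n in counts.items()}


# Hypothetical raw counts and total read numbers for two organ libraries.
libraries = {
    "roots":  {"library_size": 2_000_000, "counts": {"miR156": 1200, "miR166": 8000}},
    "leaves": {"library_size": 1_000_000, "counts": {"miR156": 300, "miR166": 9000}},
}

normalized = {
    organ: reads_per_million(lib["counts"], lib["library_size"])
    for organ, lib in libraries.items()
}

for organ, rpm in normalized.items():
    print(organ, rpm)
```

    With these toy numbers the roots library has four times the raw miR156 count of the leaves library, but only twice the normalized frequency, because it was sequenced twice as deeply.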

  3. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Peláez Pablo

    2012-03-01

    Background: MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results: Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions: This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.

  4. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  5. Microfluidic cell microarray platform for high throughput analysis of particle-cell interactions.

    Science.gov (United States)

    Tong, Ziqiu; Rajeev, Gayathri; Guo, Keying; Ivask, Angela; McCormick, Scott; Lombi, Enzo; Priest, Craig; Voelcker, Nicolas H

    2018-03-02

    With the advances in nanotechnology, particles of various sizes, shapes, surface chemistries and compositions can be easily produced. Nano- and microparticles have been extensively explored in many industrial and clinical applications. Ensuring that the particles themselves have no toxic effects on the biological system is of paramount importance. This paper describes a proof of concept method in which a microfluidic system is used in conjunction with a cell microarray technique aiming to streamline the analysis of particle-cell interaction in a high throughput manner. Polymeric microparticles, with different particle surface functionalities, were first used to investigate the efficiency of particle-cell adhesion under dynamic flow. Silver nanoparticles (AgNPs, 10 nm in diameter) perfused at different concentrations (0 to 20 μg/ml) in parallel streams over the cells in the microchannel exhibited higher toxicity compared to the static culture in the 96 well plate format. This developed microfluidic system can be easily scaled up to accommodate a larger number of microchannels for high throughput analysis of potential toxicity of a wide range of particles in a single experiment.

  6. High-throughput SNP-genotyping analysis of the relationships among Ponto-Caspian sturgeon species

    Science.gov (United States)

    Rastorguev, Sergey M; Nedoluzhko, Artem V; Mazur, Alexander M; Gruzdeva, Natalia M; Volkov, Alexander A; Barmintseva, Anna E; Mugue, Nikolai S; Prokhortchouk, Egor B

    2013-01-01

    Abstract Legally certified sturgeon fisheries require population protection and conservation methods, including DNA tests to identify the source of valuable sturgeon roe. However, the available genetic data are insufficient to distinguish between different sturgeon populations, and are even unable to distinguish between some species. We performed high-throughput single-nucleotide polymorphism (SNP)-genotyping analysis on different populations of Russian (Acipenser gueldenstaedtii), Persian (A. persicus), and Siberian (A. baerii) sturgeon species from the Caspian Sea region (Volga and Ural Rivers), the Azov Sea, and two Siberian rivers. We found that Russian sturgeons from the Volga and Ural Rivers were essentially indistinguishable, but they differed from Russian sturgeons in the Azov Sea, and from Persian and Siberian sturgeons. We identified eight SNPs that were sufficient to distinguish these sturgeon populations with 80% confidence, and allowed the development of markers to distinguish sturgeon species. Finally, on the basis of our SNP data, we propose that the A. baerii-like mitochondrial DNA found in some Russian sturgeons from the Caspian Sea arose via an introgression event during the Pleistocene glaciation. In the present study, the high-throughput genotyping analysis of several sturgeon populations was performed. SNP markers for species identification were defined. The possible explanation of the baerii-like mitotype presence in some Russian sturgeons in the Caspian Sea was suggested. PMID:24567827

  7. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Full Text Available Abstract Background Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample (typically micrograms) are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.
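
    Digital PCR can give an absolute count without a standard curve because the fraction of positive partitions maps to a mean copy number through Poisson statistics. The sketch below illustrates that general principle only. It is not taken from the paper: the function names, partition volume, and dilution factor are hypothetical.

```python
import math


def copies_per_partition(positive, total):
    """Mean template copies per partition from the positive fraction.

    Assumes templates are Poisson-distributed across partitions, so
    P(partition is negative) = exp(-lam) and lam = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a saturated run has no estimate")
    return -math.log(1.0 - positive / total)


def library_concentration(positive, total, partition_volume_nl, dilution_factor=1.0):
    """Copies per nanolitre of the undiluted library (all inputs hypothetical)."""
    lam = copies_per_partition(positive, total)
    return lam / partition_volume_nl * dilution_factor


# Example: half of the partitions are positive, giving ln(2) copies per partition.
lam = copies_per_partition(positive=50, total=100)
conc = library_concentration(50, 100, partition_volume_nl=1.0, dilution_factor=10.0)
```

    The correction matters most at high occupancy, where many positive partitions hold more than one template and a simple count of positives would underestimate the library.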

  8. Bacterial diversity of the American sand fly Lutzomyia intermedia using high-throughput metagenomic sequencing.

    Science.gov (United States)

    Monteiro, Carolina Cunha; Villegas, Luis Eduardo Martinez; Campolina, Thais Bonifácio; Pires, Ana Clara Machado Araújo; Miranda, Jose Carlos; Pimenta, Paulo Filemon Paolucci; Secundino, Nagila Francinete Costa

    2016-08-31

    Parasites of the genus Leishmania cause a broad spectrum of diseases, collectively known as leishmaniasis, in humans worldwide. American cutaneous leishmaniasis is a neglected disease transmitted by sand fly vectors including Lutzomyia intermedia, a proven vector. The female sand fly can acquire or deliver Leishmania spp. parasites while feeding on a blood meal, which is required for nutrition, egg development and survival. The microbiota composition and abundance vary by food source, life stage and physiological condition. The sand fly microbiota can affect the parasite life-cycle in the vector. We performed a metagenomic analysis for microbiota composition and abundance in Lu. intermedia, from an endemic area in Brazil. The adult insects were collected using CDC light traps, morphologically identified, carefully sterilized, dissected under a microscope and the females separated into groups according to their physiological condition: (i) absence of blood meal (unfed = UN); (ii) presence of blood meal (blood-fed = BF); and (iii) presence of developed ovaries (gravid = GR). Then, they were processed for metagenomics with Illumina HiSeq sequencing to obtain the taxonomic profiles of the microbiota. Bacterial metagenomic analysis revealed differences in microbiota composition based upon the distinct physiological stages of the adult insect. Sequence identification revealed two phyla (Proteobacteria and Actinobacteria), 11 families and 15 genera; 87% of the bacteria were Gram-negative, while only one family and two genera were identified as Gram-positive. The genera Ochrobactrum, Bradyrhizobium and Pseudomonas were found across all of the groups. The metagenomic analysis revealed that the microbiota of the Lu. intermedia female sand flies are distinct under specific physiological conditions and consist of 15 bacterial genera. Ochrobactrum, Bradyrhizobium and Pseudomonas were the common genera. Our results detailing

  9. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing.

    Science.gov (United States)

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-04-30

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes.

  10. Bifrost: a Modular Python/C++ Framework for Development of High-Throughput Data Analysis Pipelines

    Science.gov (United States)

    Cranmer, Miles; Barsdell, Benjamin R.; Price, Danny C.; Garsden, Hugh; Taylor, Gregory B.; Dowell, Jayce; Schinzel, Frank; Costa, Timothy; Greenhill, Lincoln J.

    2017-01-01

    Large radio interferometers have data rates that render long-term storage of raw correlator data infeasible, thus motivating development of real-time processing software. For high-throughput applications, processing pipelines are challenging to design and implement. Motivated by science efforts with the Long Wavelength Array, we have developed Bifrost, a novel Python/C++ framework that eases the development of high-throughput data analysis software by packaging algorithms as black box processes in a directed graph. This strategy to modularize code allows astronomers to create parallelism without code adjustment. Bifrost uses CPU/GPU ’circular memory’ data buffers that enable ready introduction of arbitrary functions into the processing path for ’streams’ of data, and allow pipelines to automatically reconfigure in response to astrophysical transient detection or input of new observing settings. We have deployed and tested Bifrost at the latest Long Wavelength Array station, in Sevilleta National Wildlife Refuge, NM, where it handles throughput exceeding 10 Gbps per CPU core.

  11. High throughput imaging and analysis for biological interpretation of agricultural plants and environmental interaction

    Science.gov (United States)

    Hong, Hyundae; Benac, Jasenka; Riggsbee, Daniel; Koutsky, Keith

    2014-03-01

    High throughput (HT) phenotyping of crops is essential to increase yield in environments deteriorated by climate change. The controlled environment of a greenhouse offers an ideal platform to study the genotype to phenotype linkages for crop screening. Advanced imaging technologies are used to study plants' responses to resource limitations such as water and nutrient deficiency. Advanced imaging technologies coupled with automation make HT phenotyping in the greenhouse not only feasible, but practical. Monsanto has a state of the art automated greenhouse (AGH) facility. Handling of the soil, pots, water and nutrients is completely automated. Images of the plants are acquired by multiple hyperspectral and broadband cameras. The hyperspectral cameras cover wavelengths from visible light through short wave infra-red (SWIR). In-house developed software analyzes the images to measure plant morphological and biochemical properties. We measure phenotypic metrics like plant area, height, and width as well as biomass. Hyperspectral imaging allows us to measure biochemical metrics such as chlorophyll, anthocyanin, and foliar water content. The last 4 years of AGH operations on crops like corn, soybean, and cotton have demonstrated successful application of imaging and analysis technologies for high throughput plant phenotyping. Using HT phenotyping, scientists have been showing strong correlations to environmental conditions, such as water and nutrient deficits, as well as the ability to tease apart distinct differences in the genetic backgrounds of crops.

  12. High-throughput mouse phenotyping using non-rigid registration and robust principal component analysis

    Science.gov (United States)

    Xie, Zhongliu; Kitamoto, Asanobu; Tamura, Masaru; Shiroishi, Toshihiko; Gillies, Duncan

    2016-03-01

    Intensive international efforts are underway towards phenotyping the mouse genome, by knocking out each of its ≈25,000 genes one-by-one for comparative study. With vast amounts of data to analyze, the traditional method using time-consuming histological examination is clearly impractical, leading to an overwhelming demand for some high-throughput phenotyping framework, especially with the employment of biomedical image informatics to efficiently identify phenotypes concerning morphological abnormality. Existing work has either excessively relied on volumetric analytics which is insensitive to phenotypes associated with no severe volume variations, or tailored for specific defects and thus fails to serve a general phenotyping purpose. Furthermore, the prevailing requirement of an atlas for image segmentation in contrast to its limited availability further complicates the issue in practice. In this paper we propose a high-throughput general-purpose phenotyping framework that is able to efficiently perform batch-wise anomaly detection without prior knowledge of the phenotype and the need for atlas-based segmentation. Anomaly detection is centered on the combined use of group-wise non-rigid image registration and robust principal component analysis (RPCA) for feature extraction and decomposition.
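
    For orientation, RPCA is commonly posed as the convex program below (principal component pursuit). The abstract does not say which RPCA variant the authors use, so this is the standard textbook formulation, not necessarily theirs. Treating M as a data matrix with one registered image per column is an assumption made here for illustration.

```latex
% M: data matrix (assumed: one registered image per column)
% L: low-rank part, shared structure across the group
% S: sparse part, localized deviations that may flag anomalies
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S
```

    Here the nuclear norm (sum of singular values) encourages a low-rank L, the entrywise l1 norm encourages a sparse S, and the weight lambda balances the two.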

  13. Origin, diversity and maturation of human antiviral antibodies analyzed by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Ponraj Prabakaran

    2012-08-01

    Full Text Available Our understanding of how antibodies are generated and function could help develop effective vaccines and antibody-based therapeutics against viruses such as HIV-1, SARS Coronavirus (CoV), and Hendra and Nipah viruses (henipaviruses). Although broadly neutralizing antibodies (bnAbs) against HIV-1 were observed in patients, elicitation of such bnAbs remains a major challenge when compared to other viral targets. We previously hypothesized that HIV-1 could have evolved a strategy to evade the immune system due to absent or very weak binding of germline antibodies to the conserved epitopes that may not be sufficient to initiate and/or maintain an effective immune response. To further explore our hypothesis, we used the 454 sequence analysis of a large naïve library of human IgM antibodies which had been used for selecting antibodies against SARS Coronavirus (CoV) receptor-binding domain (RBD), and soluble G proteins (sG) of Hendra and Nipah viruses (henipaviruses). We found that the human IgM repertoires from the 454 sequencing have diverse germline usages, recombination patterns, junction diversity and a lower extent of somatic mutation. In this study, we identified germline intermediates of antibodies specific to HIV-1 and other viruses as observed in normal individuals, and compared their genetic diversity and somatic mutation level along with available structural and functional data. Further computational analysis will provide a framework for understanding the underlying genetic and molecular determinants related to maturation pathways of antiviral bnAbs that could be useful for applying novel approaches to the design of effective vaccine immunogens and antibody-based therapeutics.

  14. Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2012-01-01

    Full Text Available Abstract Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore, using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P ® MassARRAY. No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving
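
    To make the pooled-sequencing idea concrete, the sketch below estimates an allele frequency in each pool as the fraction of reads carrying the alternate allele, then compares two pools with a two-proportion z-test. This is a simplified illustration, not the study's pipeline: the test choice, function names, and read counts are assumptions, and the test treats reads as independent draws, ignoring the extra sampling variance that pooling introduces.

```python
import math


def allele_frequency(alt_reads, total_reads):
    """Estimated alternate-allele frequency in one pool."""
    if total_reads <= 0:
        raise ValueError("pool has no reads at this site")
    return alt_reads / total_reads


def two_proportion_z(alt_a, total_a, alt_b, total_b):
    """Two-sided z-test for a frequency difference between two pools."""
    p_a = allele_frequency(alt_a, total_a)
    p_b = allele_frequency(alt_b, total_b)
    pooled = (alt_a + alt_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if se == 0.0:
        return 0.0, 1.0  # allele fixed or absent in both pools: no evidence of a difference
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p_value


# Hypothetical read counts at one SNP in the two divergent pools.
z, p = two_proportion_z(alt_a=80, total_a=100, alt_b=20, total_b=100)
```

    In practice many sites are tested at once, so a multiple-testing correction would be needed before calling any single difference significant.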

  15. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...... a broad taxonomic range and most sequences were from nematode taxa that have previously been found to be abundant in soil such as Tylenchida, Rhabditida, Dorylaimida, Triplonchida and Araeolaimida. Conclusions: Our amplification and sequencing strategy for assessing nematode diversity was able to collect...

  16. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    Science.gov (United States)

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP), a disorder which predisposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non-pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using the AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene, five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  17. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    Science.gov (United States)

    Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    Douchi is a type of Chinese traditional fermented food that is an important source of protein and is used in flavouring ingredients. The end product is affected by the microbial community present during fermentation, but exactly how microbes influence the fermentation process remains poorly understood. We used an Illumina MiSeq approach to investigate bacterial and fungal community diversity during both douchi-koji making and fermentation. A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla. Firmicutes, Actinobacteria and Proteobacteria were the dominant bacterial phyla, while Ascomycota and Zygomycota were the dominant fungal phyla. At the genus level, Staphylococcus and Weissella were the dominant bacteria, while Aspergillus and Lichtheimia were the dominant fungi. Principal coordinate analysis showed structural separation between the composition of bacteria in koji making and fermentation. However, multivariate analysis of variance based on unweighted UniFrac distances did identify distinct differences (p fermentation. This is the first investigation to integrate douchi fermentation and koji making and fermentation processes through this technological approach. The results provide insight into the microbiome of the douchi fermentation process, and reveal a structural separation that may be stratified by the environment during the production of this traditional fermented food. PMID:27992473

  18. INSIDIA: A FIJI Macro Delivering High-Throughput and High-Content Spheroid Invasion Analysis.

    Science.gov (United States)

    Moriconi, Chiara; Palmieri, Valentina; Di Santo, Riccardo; Tornillo, Giusy; Papi, Massimiliano; Pilkington, Geoff; De Spirito, Marco; Gumbleton, Mark

    2017-10-01

    Time-series image capture of in vitro 3D spheroidal cancer models embedded within an extracellular matrix affords examination of spheroid growth and cancer cell invasion. However, a customizable, comprehensive and open source solution for the quantitative analysis of such spheroid images is lacking. Here, the authors describe INSIDIA (INvasion SpheroID ImageJ Analysis), an open-source macro implemented as a customizable software algorithm running on the FIJI platform, that enables high-throughput high-content quantitative analysis of spheroid images (both bright-field gray and fluorescent images) with the output of a range of parameters defining the spheroid "tumor" core and its invasive characteristics. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Insight into dynamic genome imaging: Canonical framework identification and high-throughput analysis.

    Science.gov (United States)

    Ronquist, Scott; Meixner, Walter; Rajapakse, Indika; Snyder, John

    2017-07-01

    The human genome is dynamic in structure, complicating researchers' attempts at fully understanding it. Time series "Fluorescent in situ Hybridization" (FISH) imaging has increased our ability to observe genome structure, but due to cell type and experimental variability this data is often noisy and difficult to analyze. Furthermore, computational analysis techniques are needed for homolog discrimination and canonical framework detection, in the case of time-series images. In this paper we introduce novel ideas for nucleus imaging analysis, present findings extracted using dynamic genome imaging, and propose an objective algorithm for high-throughput, time-series FISH imaging. While a canonical framework could not be detected beyond statistical significance in the analyzed dataset, a mathematical framework for detection has been outlined with extension to 3D image analysis. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Computational and statistical methods for high-throughput analysis of post-translational modifications of proteins.

    Science.gov (United States)

    Schwämmle, Veit; Verano-Braga, Thiago; Roepstorff, Peter

    2015-11-03

    The investigation of post-translational modifications (PTMs) represents one of the main research focuses for the study of protein function and cell signaling. Mass spectrometry instrumentation with increasing sensitivity, improved protocols for PTM enrichment, and recently established pipelines for high-throughput experiments allow large-scale identification and quantification of several PTM types. This review addresses the concurrently emerging challenges for the computational analysis of the resulting data and presents PTM-centered approaches for spectra identification, statistical analysis, multivariate analysis and data interpretation. We furthermore discuss the potential of future developments that will help to gain deep insight into the PTM-ome and its biological role in cells. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. An exploratory data analysis method to reveal modular latent structures in high-throughput data

    Directory of Open Access Journals (Sweden)

    Yu Tianwei

    2010-08-01

    Full Text Available Abstract Background Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. Results We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Conclusions Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.

  2. STATISTICAL METHODS FOR THE ANALYSIS OF HIGH-THROUGHPUT METABOLOMICS DATA

    Directory of Open Access Journals (Sweden)

    Jörg Bartel

    2013-01-01

    Full Text Available Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regulation as well as environmental factors. This potential to connect genotypic to phenotypic information promises new insights and biomarkers for different research fields, including biomedical and pharmaceutical research. In the statistical analysis of metabolomics data, many techniques from other omics fields can be reused. Recently, however, a number of tools specific for metabolomics data have been developed as well. The focus of this mini review will be on recent advancements in the analysis of metabolomics data, especially by utilizing Gaussian graphical models and independent component analysis.

  3. Statistical methods for the analysis of high-throughput metabolomics data

    Directory of Open Access Journals (Sweden)

    Fabian J. Theis

    2013-01-01

    Full Text Available Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regulation as well as environmental factors. This potential to connect genotypic to phenotypic information promises new insights and biomarkers for different research fields, including biomedical and pharmaceutical research. In the statistical analysis of metabolomics data, many techniques from other omics fields can be reused. Recently, however, a number of tools specific for metabolomics data have been developed as well. The focus of this mini review will be on recent advancements in the analysis of metabolomics data, especially by utilizing Gaussian graphical models and independent component analysis.

  4. An exploratory data analysis method to reveal modular latent structures in high-throughput data.

    Science.gov (United States)

    Yu, Tianwei

    2010-08-27

    Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.

  5. Emerging flow injection mass spectrometry methods for high-throughput quantitative analysis.

    Science.gov (United States)

    Nanita, Sergio C; Kaldon, Laura G

    2016-01-01

    Where does flow injection analysis mass spectrometry (FIA-MS) stand relative to ambient mass spectrometry (MS) and chromatography-MS? Improvements in FIA-MS methods have resulted in fast-expanding uses of this technique. Key advantages of FIA-MS over chromatography-MS are fast analysis (typical run time method simplicity, and FIA-MS offers high-throughput without compromising sensitivity, precision and accuracy as much as ambient MS techniques. Consequently, FIA-MS is increasingly becoming recognized as a suitable technique for applications where quantitative screening of chemicals needs to be performed rapidly and reliably. The FIA-MS methods discussed herein have demonstrated quantitation of diverse analytes, including pharmaceuticals, pesticides, environmental contaminants, and endogenous compounds, at levels ranging from parts-per-billion (ppb) to parts-per-million (ppm) in very complex matrices (such as blood, urine, and a variety of foods of plant and animal origin), allowing successful applications of the technique in clinical diagnostics, metabolomics, environmental sciences, toxicology, and detection of adulterated/counterfeited goods. The recent boom in applications of FIA-MS for high-throughput quantitative analysis has been driven in part by (1) the continuous improvements in sensitivity and selectivity of MS instrumentation, (2) the introduction of novel sample preparation procedures compatible with standalone mass spectrometric analysis such as salting out assisted liquid-liquid extraction (SALLE) with volatile solutes and NH4(+) QuEChERS, and (3) the need to improve efficiency of laboratories to satisfy increasing analytical demand while lowering operational cost. The advantages and drawbacks of quantitative analysis by FIA-MS are discussed in comparison to chromatography-MS and ambient MS (e.g., DESI, LAESI, DART). Generally, FIA-MS sits 'in the middle' between ambient MS and chromatography-MS, offering a balance between analytical capability and

  6. Determining the diet of larvae of western rock lobster (Panulirus cygnus) using high-throughput DNA sequencing techniques.

    Directory of Open Access Journals (Sweden)

    Richard O'Rorke

    Full Text Available The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  7. Determining the diet of larvae of western rock lobster (Panulirus cygnus) using high-throughput DNA sequencing techniques.

    Science.gov (United States)

    O'Rorke, Richard; Lavery, Shane; Chow, Seinen; Takeyama, Haruko; Tsai, Peter; Beckley, Lynnath E; Thompson, Peter A; Waite, Anya M; Jeffs, Andrew G

    2012-01-01

    The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  8. High throughput protein production screening

    Science.gov (United States)

    Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA

    2009-09-08

    Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and IVT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.

  9. Genetic Bases of Bicuspid Aortic Valve: The Contribution of Traditional and High-Throughput Sequencing Approaches on Research and Diagnosis

    Directory of Open Access Journals (Sweden)

    Betti Giusti

    2017-08-01

    Full Text Available Bicuspid aortic valve (BAV) is a common (0.5–2.0% of general population) congenital heart defect with increased prevalence of aortic dilatation and dissection. BAV has an autosomal dominant inheritance with reduced penetrance and variable expressivity. BAV has been described as an isolated trait or associated with syndromic conditions [e.g., Marfan syndrome (MFS) or Loeys-Dietz syndrome (LDS)]. Identification of a syndromic condition in a BAV patient is clinically relevant to personalize aortic surgery indication. A 4-fold increase in BAV prevalence in a large cohort of unrelated MFS patients with respect to general population was reported, as well as in LDS patients (8-fold). It is also known that BAV is more frequent in patients with thoracic aortic aneurysm (TAA) related to mutations in ACTA2, FBN1, and TGFBR2 genes. Moreover, in 8 patients with BAV and thoracic aortic dilation, not fulfilling the clinical criteria for MFS, FBN1 mutations in 2/8 patients were identified, suggesting that FBN1 or other genes involved in syndromic conditions correlated to aortopathy could be involved in BAV. Beyond loci associated with syndromic disorders, studies in humans and animal models evidenced/suggested the role of further genes in non-syndromic BAV. The transcriptional regulator NOTCH1 has been associated with the development and acceleration of calcium deposition. Genome wide marker-based linkage analysis demonstrated a linkage of BAV to loci on chromosomes 18, 5, and 13q. Recently, a role for GATA4/5 in aortic valve morphogenesis and endocardial cell differentiation has been reported. BAV has also been associated with a reduced UFD1L gene expression or involvement of a locus containing AXIN1/PDIA2. Much remains to be understood about the genetics of BAV. In the last years, high-throughput sequencing technologies, allowing the analysis of large number of genes or entire exomes or genomes, progressively became available. The latter issue together with

  10. Multispot single-molecule FRET: High-throughput analysis of freely diffusing molecules.

    Directory of Open Access Journals (Sweden)

    Antonino Ingargiola

    Full Text Available We describe an 8-spot confocal setup for high-throughput smFRET assays and illustrate its performance with two characteristic experiments. First, measurements on a series of freely diffusing doubly-labeled dsDNA samples allow us to demonstrate that data acquired in multiple spots in parallel can be properly corrected and result in measured sample characteristics consistent with those obtained with a standard single-spot setup. We then take advantage of the higher throughput provided by parallel acquisition to address an outstanding question about the kinetics of the initial steps of bacterial RNA transcription. Our real-time kinetic analysis of promoter escape by bacterial RNA polymerase confirms results obtained by a more indirect route, shedding additional light on the initial steps of transcription. Finally, we discuss the advantages of our multispot setup, while pointing out potential limitations of the current single laser excitation design, as well as analysis challenges and their solutions.

  11. Hadoop and friends - first experience at CERN with a new platform for high throughput analysis steps

    Science.gov (United States)

    Duellmann, D.; Surdy, K.; Menichetti, L.; Toebbicke, R.

    2017-10-01

    The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment in particular for the first steps of skimming rapidly through hundreds of TB of low relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.

  12. The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data.

    Science.gov (United States)

    Skrzypek, Marek S; Binkley, Jonathan; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2017-01-04

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology, by curating the scientific literature in real time, and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that was used in the construction of Assembly 22. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing

    Directory of Open Access Journals (Sweden)

    Olafsdottir Gudbjorg

    2011-04-01

    Full Text Available Background: Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar) derived from across the species' North Atlantic range, was selectively amplified with a novel combination of standard PCR and pyro-sequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences). A unique combination of barcoded primers and a partitioned sequencing plate was employed to designate each sequence read to its original sample. The sequence reads were aligned according to the S. salar mitochondrial reference sequence (NC_001960.1), with the objective of identifying single nucleotide polymorphisms (SNPs). They were validated if they met with the following three stringent criteria: (i) sequence reads were produced from both DNA strands; (ii) SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii) SNPs occurred in more than one individual. Results: Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6%) were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. Conclusion: The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining
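
    The three validation criteria listed in this abstract can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Read` record, the `validate_snp` function, and the choice to apply the strand and 90% criteria per individual are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one aligned read at a candidate SNP position."""
    individual: str  # sample the read was assigned to via its barcode
    strand: str      # "+" or "-"
    base: str        # base called at the candidate position


def validate_snp(reads, alt_base, min_fraction=0.90):
    """Return True if a candidate SNP passes all three criteria.

    Assumed interpretation: within an individual, the variant base must be
    seen on both strands (i) and in at least `min_fraction` of that
    individual's reads (ii); the SNP must then occur in more than one
    individual (iii).
    """
    by_individual = {}
    for read in reads:
        by_individual.setdefault(read.individual, []).append(read)

    carriers = []
    for individual, ind_reads in by_individual.items():
        alt_reads = [r for r in ind_reads if r.base == alt_base]
        if not alt_reads:
            continue
        fraction = len(alt_reads) / len(ind_reads)
        strands = {r.strand for r in alt_reads}
        if fraction >= min_fraction and strands == {"+", "-"}:
            carriers.append(individual)

    return len(carriers) > 1
```

    A candidate supported in only one individual, or on only one strand, is rejected under these assumptions.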

  14. High-throughput FTIR-based bioprocess analysis of recombinant cyprosin production.

    Science.gov (United States)

    Sampaio, Pedro N; Sales, Kevin C; Rosa, Filipa O; Lopes, Marta B; Calado, Cecília R C

    2017-01-01

    To increase the knowledge of the recombinant cyprosin production process in Saccharomyces cerevisiae cultures, it is relevant to implement efficient bioprocess monitoring techniques. The present work focuses on the implementation of a mid-infrared (MIR) spectroscopy-based tool for monitoring the recombinant culture in a rapid, economic, and high-throughput (using a microplate system) mode. Multivariate data analysis on the MIR spectra of culture samples was conducted. Principal component analysis (PCA) enabled capturing the general metabolic status of the yeast cells, as replicated samples appear grouped together in the score plot and groups of culture samples according to the main growth phase can be clearly distinguished. The PCA-loading vectors also revealed spectral regions, and the corresponding chemical functional groups and biomolecules that mostly contributed for the cell biomolecular fingerprint associated with the culture growth phase. These data were corroborated by the analysis of the samples' second derivative spectra. Partial least square (PLS) regression models built based on the MIR spectra showed high predictive ability for estimating the bioprocess critical variables: biomass (R² = 0.99, RMSEP 2.8%); cyprosin activity (R² = 0.98, RMSEP 3.9%); glucose (R² = 0.93, RMSECV 7.2%); galactose (R² = 0.97, RMSEP 4.6%); ethanol (R² = 0.97, RMSEP 5.3%); and acetate (R² = 0.95, RMSEP 7.0%). In conclusion, high-throughput MIR spectroscopy and multivariate data analysis were effective in identifying the main growth phases and specific cyprosin production phases along the yeast culture as well as in quantifying the critical variables of the process. This knowledge will promote future process optimization and control the recombinant cyprosin bioprocess according to Quality by Design framework.
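
    The figures of merit quoted for the PLS models can be made concrete with a minimal Python sketch. The abstract does not say how R² was defined or how RMSEP was converted to a percentage, so both function names and both definitions below (coefficient of determination; RMSE as a percentage of the observed range) are assumptions, not the authors' method.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmsep_percent(observed, predicted):
    """Root mean square error of prediction as a percentage of the
    observed range (an assumed normalization)."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100 * rmse / (max(observed) - min(observed))
```

    Here `observed` would hold reference measurements for a held-out prediction set and `predicted` the corresponding model estimates.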

  15. Pathway Processor 2.0: a web resource for pathway-based analysis of high-throughput data.

    Science.gov (United States)

    Beltrame, Luca; Bianco, Luca; Fontana, Paolo; Cavalieri, Duccio

    2013-07-15

    Pathway Processor 2.0 is a web application designed to analyze high-throughput datasets, including but not limited to microarray and next-generation sequencing, using a pathway centric logic. In addition to well-established methods such as the Fisher's test and impact analysis, Pathway Processor 2.0 offers innovative methods that convert gene expression into pathway expression, leading to the identification of differentially regulated pathways in a dataset of choice. Pathway Processor 2.0 is available as a web service at http://compbiotoolbox.fmach.it/pathwayProcessor/. Sample datasets to test the functionality can be used directly from the application. Contact: duccio.cavalieri@fmach.it. Supplementary data are available at Bioinformatics online.
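
    As a rough illustration of the Fisher's test mentioned in this abstract, the following Python sketch computes a one-sided Fisher exact p-value for pathway over-representation using the hypergeometric tail. It is not Pathway Processor's implementation; the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_pathway, n_selected, n_overlap):
    """One-sided Fisher exact p-value for over-representation.

    n_total    : genes measured in the experiment
    n_pathway  : measured genes annotated to the pathway
    n_selected : genes called differentially expressed
    n_overlap  : selected genes that are also in the pathway

    Returns P(X >= n_overlap) for X drawn from the hypergeometric
    distribution implied by the fixed margins.
    """
    upper = min(n_pathway, n_selected)
    denominator = comb(n_total, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denominator
```

    In practice a p-value would be computed for every pathway and then corrected for multiple testing, a step this sketch leaves out.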

  16. The gut microbiotassay – a high-throughput real-time PCR chip combined with next generation sequencing

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Mølbak, Lars

    informative. Many methods can be used to try to define and characterize the gut microbiota. Here we designed an assay consisting of twenty-four different primer systems targeting the most common bacterial groups of the intestine on different hierarchical levels. The aim of this study was to implement and test...... this assay with the high-throughput real-time PCR chip “Access Array 48.48” from Fluidigm. The chip executes 2304 individual reactions in parallel and afterwards it is possible to harvest the amplicons for next-generation sequencing. This approach gives a taxonomical overview of the gut microbiota, hence...... the name: ‘the gut microbiotassay’. The assay was tested on fifteen different bacterial type strains each functioning as target for one or more of the primer systems. In this way the sensitivity and the specificity of the primers were assessed. Next the assay was tested on complex ecosystems by extracting...

  17. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance......-conferring SNPs in pfK13 are absent from the studied area of Guinea-Bissau, while the pfmdr1 86 N allele is found at a high prevalence. The mitochondrial barcodes are unanimous and accommodate a West African origin of the parasites. With this method, very reliable high throughput surveillance of antimalarial drug...

  18. metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

    DEFF Research Database (Denmark)

    Louvel, Guillaume; Der Sarkissian, Clio; Hanghøj, Kristian Ebbesen

    2016-01-01

    Micro-organisms account for most of the Earth's biodiversity and yet remain largely unknown. The complexity and diversity of microbial communities present in clinical and environmental samples can now be robustly investigated in record times and prices thanks to recent advances in high......-throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automatizing routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa......, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT within a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), but also from archaeological specimens. We present...

  19. Ammonium inhibition through the decoupling of acidification process and methanogenesis in anaerobic digester revealed by high throughput sequencing.

    Science.gov (United States)

    Zhang, Miao; Lin, Qiang; Rui, Junpeng; Li, Jiabao; Li, Xiangzhen

    2017-02-01

    To reveal the shifts of microbial communities along ammonium gradients, and the relationship between microbial community composition and the anaerobic digestion performance using a high throughput sequencing technique. Methane production declined with increasing ammonium concentration, and was inhibited above 4 g l⁻¹. The volatile fatty acids, especially acetate, accumulated with elevated ammonium. Prokaryotic populations showed different responses to the ammonium concentration: Clostridium, Tepidimicrobium, Sporanaerobacter, Peptostreptococcus, Sarcina and Peptoniphilus showed good tolerance to ammonium ions. However, Syntrophomonas with poor tolerance to ammonium may be inhibited during anaerobic digestion. During methanogenesis, Methanosarcina was the dominant methanogen. Excessive ammonium inhibited methane production probably by decoupling the linkage between acidification process and methanogenesis, and finally resulted in different performance in anaerobic digestion.

  20. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches

    Science.gov (United States)

    He, Guo-qing; Liu, Tong-jie; Sadiq, Faizan A.; Gu, Jing-si; Zhang, Guo-hua

    2017-01-01

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches. PMID:28378567

  1. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches.

    Science.gov (United States)

    He, Guo-Qing; Liu, Tong-Jie; Sadiq, Faizan A; Gu, Jing-Si; Zhang, Guo-Hua

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches.

  2. WormScan: a technique for high-throughput phenotypic analysis of Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Mark D Mathew

    Full Text Available BACKGROUND: There are four main phenotypes that are assessed in whole organism studies of Caenorhabditis elegans: mortality, movement, fecundity and size. Procedures have been developed that focus on the digital analysis of some, but not all of these phenotypes and may be limited by expense and limited throughput. We have developed WormScan, an automated image acquisition system that allows quantitative analysis of each of these four phenotypes on standard NGM plates seeded with E. coli. This system is very easy to implement and has the capacity to be used in high-throughput analysis. METHODOLOGY/PRINCIPAL FINDINGS: Our system employs a readily available consumer grade flatbed scanner. The method uses light stimulus from the scanner rather than physical stimulus to induce movement. With two sequential scans it is possible to quantify the induced phototactic response. To demonstrate the utility of the method, we measured the phenotypic response of C. elegans to phosphine gas exposure. We found that stimulation of movement by the light of the scanner was equivalent to physical stimulation for the determination of mortality. WormScan also provided a quantitative assessment of health for the survivors. Habituation from light stimulation of continuous scans was similar to habituation caused by physical stimulus. CONCLUSIONS/SIGNIFICANCE: There are existing systems for the automated phenotypic data collection of C. elegans. The specific advantages of our method over existing systems are high-throughput assessment of a greater range of phenotypic endpoints including determination of mortality and quantification of the mobility of survivors. Our system is also inexpensive and very easy to implement. Even though we have focused on demonstrating the usefulness of WormScan in toxicology, it can be used in a wide range of additional C. elegans studies including lifespan determination, development, pathology and behavior. Moreover, we have even adapted the

  3. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    National Research Council Canada - National Science Library

    Yang, Lin; Yang, Hui-lin; Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    .... A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla...

  4. Next generation MUT-MAP, a high-sensitivity high-throughput microfluidics chip-based mutation analysis panel.

    Directory of Open Access Journals (Sweden)

    Erica B Schleifman

    Full Text Available Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and Taqman technology. This mutation panel requires as little as 2 ng of high quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. Mutation calls, including an automated data analysis process, have been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database allowing for direct comparison of our platform to Sanger sequencing. High correlation with NGS when compared to the SuraSeq500 panel run on the Ion Torrent platform in a FFPE dilution experiment showed assay sensitivity down to 0.45%. This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in

  5. High-throughput polymorphism detection and genotyping in Brassica napus using next-generation RAD sequencing

    Directory of Open Access Journals (Sweden)

    Bus Anja

    2012-06-01

    Full Text Available Background: The complex genome of rapeseed (Brassica napus) is not well understood despite the economic importance of the species. Good knowledge of sequence variation is needed for genetics approaches and breeding purposes. We used a diversity set of B. napus representing eight different germplasm types to sequence genome-wide distributed restriction-site associated DNA (RAD) fragments for polymorphism detection and genotyping. Results: More than 113,000 RAD clusters with more than 20,000 single nucleotide polymorphisms (SNPs) and 125 insertions/deletions were detected and characterized. About one third of the RAD clusters and polymorphisms mapped to the Brassica rapa reference sequence. An even distribution of RAD clusters and polymorphisms was observed across the B. rapa chromosomes, which suggests that there might be an equal distribution over the Brassica oleracea chromosomes, too. The representation of Gene Ontology (GO) terms for unigenes with RAD clusters and polymorphisms revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category. Conclusions: Considering the decreasing costs for next-generation sequencing, the results of our study suggest that RAD sequencing is not only a simple and cost-effective method for high-density polymorphism detection but also an alternative to SNP genotyping from transcriptome sequencing or SNP arrays, even for species with complex genomes such as B. napus.

  6. Resonant waveguide grating imagers for single cell analysis and high throughput screening

    Science.gov (United States)

    Fang, Ye

    2015-08-01

    Resonant waveguide grating (RWG) systems illuminate an array of diffractive nanograting waveguide structures in a microtiter plate to establish an evanescent wave for measuring tiny changes in local refractive index arising from the dynamic mass redistribution of living cells upon stimulation. A whole-plate RWG imager enables high-throughput profiling and screening of drugs. A microfluidics RWG imager not only manifests distinct receptor signaling waves, but also differentiates long-acting agonism and antagonism. A spatially resolved RWG imager allows for single cell analysis, including receptor signaling heterogeneity and the invasion of cancer cells in a spheroidal structure through 3-dimensional extracellular matrix. A high frequency RWG imager permits real-time detection of drug-induced cardiotoxicity. The wide coverage in target, pathway, assay, and cell phenotype has made RWG systems a powerful tool in both basic research and the early drug discovery process.

  7. High-throughput protein extraction and immunoblotting analysis in Saccharomyces cerevisiae.

    Science.gov (United States)

    Lorenz, Todd C; Anand, Vikram C; Payne, Gregory S

    2008-01-01

    A variety of Saccharomyces cerevisiae strain libraries allow for systematic analysis of strains bearing gene deletions, repressible genes, overexpressed genes, or modified genes on a genome-wide scale. Here we introduce a method for culturing yeast strains in 96-well format to achieve log-phase growth and a high-throughput technique for generating whole-cell protein extracts from these cultures using sodium dodecyl sulfate and heat lysis. We subsequently describe a procedure to analyze these whole-cell extracts by immunoblotting for alkaline phosphatase and carboxypeptidase yscS to identify strains with defects in protein transport pathways or protein glycosylation. These methods should be readily adaptable to many different areas of interest.

  8. Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software.

    Science.gov (United States)

    Kamentsky, Lee; Jones, Thouis R; Fraser, Adam; Bray, Mark-Anthony; Logan, David J; Madden, Katherine L; Ljosa, Vebjorn; Rueden, Curtis; Eliceiri, Kevin W; Carpenter, Anne E

    2011-04-15

    There is a strong and growing need in the biology research community for accurate, automated image analysis. Here, we describe CellProfiler 2.0, which has been engineered to meet the needs of its growing user base. It is more robust and user friendly, with new algorithms and features to facilitate high-throughput work. ImageJ plugins can now be run within a CellProfiler pipeline. CellProfiler 2.0 is free and open source, available at http://www.cellprofiler.org under the GPL v. 2 license. It is available as a packaged application for Macintosh OS X and Microsoft Windows and can be compiled for Linux. Contact: anne@broadinstitute.org. Supplementary data are available at Bioinformatics online.

  9. Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods.

    Science.gov (United States)

    Piñol, J; Mir, G; Gomez-Polo, P; Agustí, N

    2015-07-01

    The quantification of the biological diversity in environmental samples using high-throughput DNA sequencing is hindered by the PCR bias caused by variable primer-template mismatches of the individual species. In some dietary studies, there is the added problem that samples are enriched with predator DNA, so often a predator-specific blocking oligonucleotide is used to alleviate the problem. However, specific blocking oligonucleotides could coblock nontarget species to some degree. Here, we accurately estimate the extent of the PCR biases induced by universal and blocking primers on a mock community prepared with DNA of twelve species of terrestrial arthropods. We also compare universal and blocking primer biases with those induced by variable annealing temperature and number of PCR cycles. The results show that reads of all species were recovered after PCR enrichment at our control conditions (no blocking oligonucleotide, 45 °C annealing temperature and 40 cycles) and high-throughput sequencing. They also show that the four factors considered biased the final proportions of the species to some degree. Among these factors, the number of primer-template mismatches of each species had a disproportionate effect (up to five orders of magnitude) on the amplification efficiency. In particular, the number of primer-template mismatches explained most of the variation (~3/4) in the amplification efficiency of the species. The effect of blocking oligonucleotide concentration on nontarget species relative abundance was also significant, but less important (below one order of magnitude). Considering the results reported here, the quantitative potential of the technique is limited, and only qualitative results (the species list) are reliable, at least when targeting the barcoding COI region. © 2014 John Wiley & Sons Ltd.
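
The compounding effect of amplification efficiency described in this abstract can be illustrated with the idealized textbook PCR model, in which template copies after c cycles equal N0 * (1 + E)^c. The sketch below is not the authors' analysis: the function names and the two efficiency values are hypothetical and chosen only to show how a modest per-cycle difference grows over 40 cycles.

```python
def amplicon_copies(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` PCR cycles under the idealized model N0 * (1 + E)**c."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def final_proportions(initial_copies, efficiencies, cycles):
    """Relative abundance of each template after amplification."""
    copies = [
        amplicon_copies(n0, eff, cycles)
        for n0, eff in zip(initial_copies, efficiencies)
    ]
    total = sum(copies)
    return [c / total for c in copies]


# Hypothetical two-species mock community with equal starting DNA.
# The efficiencies are illustrative values, not measurements from the study.
props = final_proportions([1000.0, 1000.0], [0.95, 0.70], cycles=40)
print([round(p, 4) for p in props])
```

Under these assumed values, the species with the lower efficiency ends up as a very small fraction of the reads even though both templates started at the same abundance, which is the qualitative pattern the abstract attributes to primer-template mismatches.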

  10. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Full Text Available Background: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation are evaluated. Results: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
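
The sequencing quantities quoted in this abstract can be put on a common scale with a little arithmetic. The sketch below uses only the figures stated above (eight genome equivalents totalling 6 Gb, and 3 million 50-base reads per accession); the function and constant names are hypothetical, and the genome size is the value implied by those figures rather than one stated in the abstract.

```python
# Figures quoted in the abstract.
TOTAL_ALIGNED_BASES = 6e9        # "6 Gb"
GENOME_EQUIVALENTS = 8           # "eight genome equivalents"
READS_PER_ACCESSION = 3_000_000  # proposed reads per accession
READ_LENGTH = 50                 # bases per read


def implied_genome_size(total_bases: float, genome_equivalents: int) -> float:
    """Bases in one genome equivalent."""
    return total_bases / genome_equivalents


def average_depth(n_reads: int, read_length: int, genome_size: float) -> float:
    """Genome-wide average depth, assuming reads were spread evenly."""
    return n_reads * read_length / genome_size


genome_size = implied_genome_size(TOTAL_ALIGNED_BASES, GENOME_EQUIVALENTS)
depth = average_depth(READS_PER_ACCESSION, READ_LENGTH, genome_size)
print(f"implied genome size: {genome_size / 1e6:.0f} Mb")  # implied genome size: 750 Mb
print(f"average depth per accession: {depth:.2f}x")        # average depth per accession: 0.20x
```

The genome-wide average is deliberately naive: a restriction-site library concentrates reads at a subset of loci, so depth at the sampled sites would be expected to exceed this average, which is presumably why a shallow total can still support genotyping.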

  11. Genomic Methods Take the Plunge: Recent Advances in High-Throughput Sequencing of Marine Mammals.

    Science.gov (United States)

    Cammen, Kristina M; Andrews, Kimberly R; Carroll, Emma L; Foote, Andrew D; Humble, Emily; Khudyakov, Jane I; Louis, Marie; McGowen, Michael R; Olsen, Morten Tange; Van Cise, Amy M

    2016-11-01

    The dramatic increase in the application of genomic techniques to non-model organisms (NMOs) over the past decade has yielded numerous valuable contributions to evolutionary biology and ecology, many of which would not have been possible with traditional genetic markers. We review this recent progression with a particular focus on genomic studies of marine mammals, a group of taxa that represent key macroevolutionary transitions from terrestrial to marine environments and for which available genomic resources have recently undergone notable rapid growth. Genomic studies of NMOs utilize an expanding range of approaches, including whole genome sequencing, restriction site-associated DNA sequencing, array-based sequencing of single nucleotide polymorphisms and target sequence probes (e.g., exomes), and transcriptome sequencing. These approaches generate different types and quantities of data, and many can be applied with limited or no prior genomic resources, thus overcoming one traditional limitation of research on NMOs. Within marine mammals, such studies have thus far yielded significant contributions to the fields of phylogenomics and comparative genomics, as well as enabled investigations of fitness, demography, and population structure. Here we review the primary options for generating genomic data, introduce several emerging techniques, and discuss the suitability of each approach for different applications in the study of NMOs. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Gene Expression Analysis of Escherichia Coli Grown in Miniaturized Bioreactor Platforms for High-Throughput Analysis of Growth and genomic Data

    DEFF Research Database (Denmark)

    Boccazzi, P.; Zanzotto, A.; Szita, Nicolas

    2005-01-01

    Combining high-throughput growth physiology and global gene expression data analysis is of significant value for integrating metabolism and genomics. We compared global gene expression using 500 ng of total RNA from Escherichia coli cultures grown in rich or defined minimal media in a miniaturized....... In general, these changes in gene expression levels were similar to those observed in 1,000-fold larger cultures. The increasing rate at which complete genomic sequences of microorganisms are becoming available offers an unprecedented opportunity for investigating these organisms. Our results from microscale...... cultures using just 500 ng of total RNA indicate that high-throughput integration of growth physiology and genomics will be possible with novel biochemical platforms and improved detection technologies....

  13. High-throughput physical map anchoring via BAC-pool sequencing

    Czech Academy of Sciences Publication Activity Database

    Cviková, Kateřina; Cattonaro, F.; Alaux, M.; Stein, N.; Mayer, K.F.X.; Doležel, Jaroslav; Bartoš, Jan

    2015-01-01

    Roč. 15, APR 11 (2015) ISSN 1471-2229 R&D Projects: GA ČR GA13-08786S; GA MŠk(CZ) LO1204 Institutional support: RVO:61389030 Keywords : Physical map * Contig anchoring * Next generation sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.631, year: 2015

  14. Profiling of Ribose Methylations in RNA by High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Birkedal, Ulf; Christensen-Dalsgaard, Mikkel; Krogh, Nicolai

    2015-01-01

    Ribose methylations are the most abundant chemical modifications of ribosomal RNA and are critical for ribosome assembly and fidelity of translation. Many aspects of ribose methylations have been difficult to study due to lack of efficient mapping methods. Here, we present a sequencing-based meth...

  15. Melon Transcriptome Characterization: Simple Sequence Repeats and Single Nucleotide Polymorphisms Discovery for High Throughput Genotyping across the Species

    Directory of Open Access Journals (Sweden)

    José Miguel Blanca

    2011-07-01

    Full Text Available Melon (Cucumis melo L.) ranks among the highest-valued fruit crops worldwide. Some genomic tools are available for this crop, including a Sanger transcriptome. We report the generation of 689,054 high-quality expressed sequence tags (ESTs) from two 454 sequencing runs, using normalized and nonnormalized complementary DNA (cDNA) libraries prepared from four genotypes belonging to the two subspecies and the main commercial types. 454 ESTs were combined with the Sanger available ESTs and de novo assembled into 53,252 unigenes. Over 63% of the unigenes were functionally annotated with Gene Ontology (GO) terms and 21% had known orthologs of Arabidopsis thaliana (L.) Heynh. Annotation distribution followed similar tendencies than that reported for , suggesting that the dataset represents a fairly complete melon transcriptome. Furthermore, we identified a set of 3298 unigenes with microsatellite motifs and 14,417 sequences with single nucleotide variants, of which 11,655 single nucleotide polymorphisms met criteria for use with high-throughput genotyping platforms, and 453 could be detected as cleaved amplified polymorphic sequence (CAPS). A set of markers were validated, 90% of them being polymorphic in a number of variable accessions. This transcriptome provides an invaluable new tool for biological research, more so when it includes transcripts not described previously. It is being used for genome annotation and has provided a large collection of markers that will allow speeding up the process of breeding new melon varieties.

  16. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.

    Science.gov (United States)

    Kao, Wei-Chun; Song, Yun S

    2011-03-01

    Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. Unfortunately, however, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naive-BayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.

  17. Exploring the environmental diversity of kinetoplastid flagellates in the high-throughput DNA sequencing era

    Directory of Open Access Journals (Sweden)

    Claudia Masini d’Avila-Levy

    2015-01-01

    Full Text Available The class Kinetoplastea encompasses both free-living and parasitic species from a wide range of hosts. Several representatives of this group are responsible for severe human diseases and for economic losses in agriculture and livestock. While this group encompasses over 30 genera, most of the available information has been derived from the vertebrate pathogenic genera Leishmania and Trypanosoma. Recent studies of the previously neglected groups of Kinetoplastea indicated that the actual diversity is much higher than previously thought. This article discusses the known segment of kinetoplastid diversity and how gene-directed Sanger sequencing and next-generation sequencing methods can help to deepen our knowledge of these interesting protists.

  18. Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey.

    Science.gov (United States)

    Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R

    2015-01-01

    Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.
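
The abstract singles out SNPs with high minor allele frequency (MAF) as the most informative for parentage analysis. The sketch below shows one simple way such a ranking could be computed from genotype calls. It is an illustration only, not the authors' selection procedure: the function names, the two-character genotype encoding, and the toy panel are all hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF of a biallelic SNP from diploid genotype strings such as "AG"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("more than two alleles observed; SNP is not biallelic")
    if len(counts) < 2:
        return 0.0  # monomorphic (or empty) sample: no minor allele observed
    return min(counts.values()) / sum(counts.values())


def rank_snps_by_maf(panel):
    """SNP names ordered from highest to lowest MAF."""
    return sorted(panel, key=lambda name: minor_allele_frequency(panel[name]), reverse=True)


# Toy genotype calls for four individuals at three hypothetical SNPs.
panel = {
    "snp_1": ["AA", "AA", "AG", "AA"],  # MAF 1/8
    "snp_2": ["AA", "AG", "GG", "AG"],  # MAF 4/8
    "snp_3": ["CC", "CC", "CC", "CC"],  # monomorphic
}
print(rank_snps_by_maf(panel))
```

A real panel design would also have to weigh neutrality, linkage, and assay conversion, as the four objectives listed in the abstract suggest; MAF ranking covers only the parentage criterion.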

  19. An integrated multiple capillary array electrophoresis system for high-throughput DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Lu, X.

    1998-03-27

    A capillary array electrophoresis system was chosen to perform DNA sequencing because of several advantages such as rapid heat dissipation, multiplexing capabilities, gel matrix filling simplicity, and the mature nature of the associated manufacturing technologies. There are two major concerns for the multiple capillary systems. One concern is inter-capillary cross-talk, and the other concern is excitation and detection efficiency. Cross-talk is eliminated through proper optical coupling, good focusing and immersing capillary array into index matching fluid. A side-entry excitation scheme with orthogonal detection was established for large capillary array. Two 100 capillary array formats were used for DNA sequencing. One format is cylindrical capillary with 150 µm o.d., 75 µm i.d., and the other format is square capillary with 300 µm out edge and 75 µm inner edge. This project is focused on the development of excitation and detection of DNA as well as performing DNA sequencing. The DNA injection schemes are discussed for the cases of single and bundled capillaries. An individual sampling device was designed. The base-calling was performed for a capillary from the capillary array with the accuracy of 98%.

  20. Utility of high-throughput DNA sequencing in the study of the human papillomaviruses.

    Science.gov (United States)

    Escobar-Escamilla, Noé; Ramírez-González, José Ernesto; Castro-Escarpulli, Graciela; Díaz-Quiñonez, José Alberto

    2017-12-27

    The Papillomaviridae family is probably the most diverse group of viruses that affect vertebrates. The study of the relationship between infection by certain types of human papillomavirus (HPV) and the development of neoplastic epithelial lesions is of particular interest because of the high prevalence of HPV-related carcinomas in populations of developing countries. To understand the mechanisms of infection and their association with different clinical manifestations, molecular tools play an important role in the description of new types of HPV, the characterization of effector properties of the viral factors, the specific diagnosis and monitoring of HPV types, and the alteration patterns at genetic level in the host. Technological advances in the field of DNA sequencing have led to the development of different next-generation sequencing systems, allowing obtaining a large amount of data and broadening the applications to study viral diseases. In this review, we summarize the main approaches and their perspectives where the use of massively parallel sequencing has been proved as a useful tool in the research of the HPV infection.

  1. Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data.

    Science.gov (United States)

    Zhu, Sha Joe; Almagro-Garcia, Jacob; McVean, Gil

    2018-01-01

    The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analyzing and interpreting such data is challenging because of the high rate of multiple infections present. We have developed a statistical method and implementation for deconvolving multiple genome sequences present in an individual with mixed infections. The software package DEploid uses haplotype structure within a reference panel of clonal isolates as a prior for haplotypes present in a given sample. It estimates the number of strains, their relative proportions and the haplotypes presented in a sample, allowing researchers to study multiple infection in malaria with an unprecedented level of detail. The open source implementation DEploid is freely available at https://github.com/mcveanlab/DEploid under the conditions of the GPLv3 license. An R version is available at https://github.com/mcveanlab/DEploid-r. Contact: joe.zhu@bdi.ox.ac.uk or gil.mcvean@bdi.ox.ac.uk. Supplementary data are available at Bioinformatics online.
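
To make the quantities named in this abstract concrete (strain proportions and per-strain haplotypes), the sketch below computes the allele frequency one would expect to observe in a mixed sample if each strain contributed reads in proportion to its abundance. This is a simplifying assumption for illustration, not the DEploid model or its API: the function name is hypothetical, haplotypes are coded 0 (reference) or 1 (alternative), and sequencing error is ignored.

```python
def expected_alt_allele_frequency(proportions, haplotypes):
    """Proportion-weighted alternative-allele frequency at each site.

    proportions: relative abundance of each strain, summing to 1.
    haplotypes:  one list of 0/1 alleles per strain, all of equal length.
    """
    if len(proportions) != len(haplotypes):
        raise ValueError("need exactly one haplotype per strain")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("strain proportions must sum to 1")
    n_sites = len(haplotypes[0])
    if any(len(hap) != n_sites for hap in haplotypes):
        raise ValueError("haplotypes must cover the same sites")
    return [
        sum(weight * hap[site] for weight, hap in zip(proportions, haplotypes))
        for site in range(n_sites)
    ]


# Hypothetical two-strain infection at three sites.
mixture = expected_alt_allele_frequency(
    proportions=[0.75, 0.25],
    haplotypes=[[0, 1, 1], [1, 1, 0]],
)
print(mixture)  # [0.25, 1.0, 0.75]
```

Deconvolution is the inverse problem: given observed allele frequencies like these, infer the proportions and haplotypes, which is where the reference panel prior described in the abstract comes in.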

  2. The High Throughput Sequence Annotation Service (HT-SAS) – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Full Text Available Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results: To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine the accuracy and usefulness of our approach. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. Conclusion: Using HT-SAS we created new, or enriched already established, annotations for over 20% of proteins from Plasmodium berghei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often also when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources and for researchers working with protein datasets, especially from poorly characterized organisms.
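
The abstract says only that overrepresented words are detected with binomial statistics, so the sketch below shows one plausible reading rather than the HT-SAS implementation: a word counts as overrepresented when the number of gathered abstracts containing it is improbably high given its background frequency across Medline. The function names, the significance threshold, and the example frequencies are hypothetical, and the tail probability is summed exactly instead of approximated.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(word_counts, n_abstracts, background_freq, alpha=0.01):
    """Words seen in more abstracts than their background frequency predicts.

    word_counts:     word -> number of gathered abstracts containing it.
    background_freq: word -> fraction of all abstracts containing it.
    Returns (word, p_value) pairs sorted by ascending p-value.
    """
    hits = []
    for word, k in word_counts.items():
        p = background_freq.get(word)
        if p is None:
            continue  # no background estimate, so the word cannot be tested
        p_value = binomial_upper_tail(k, n_abstracts, p)
        if p_value < alpha:
            hits.append((word, p_value))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical counts from 20 abstracts linked to homologs of one protein.
hits = overrepresented_words(
    word_counts={"kinase": 12, "protein": 18},
    n_abstracts=20,
    background_freq={"kinase": 0.05, "protein": 0.8},
)
print([word for word, _ in hits])
```

In this toy case the common word "protein" is not flagged despite its high count, because its background frequency is also high; that contrast is what makes the resulting keywords specific.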

  3. Quantitative insertion-site sequencing (QIseq) for high throughput phenotyping of transposon mutants.

    Science.gov (United States)

    Bronner, Iraad F; Otto, Thomas D; Zhang, Min; Udenze, Kenneth; Wang, Chengqi; Quail, Michael A; Jiang, Rays H Y; Adams, John H; Rayner, Julian C

    2016-07-01

    Genetic screening using random transposon insertions has been a powerful tool for uncovering biology in prokaryotes, where whole-genome saturating screens have been performed in multiple organisms. In eukaryotes, such screens have proven more problematic, in part because of the lack of a sensitive and robust system for identifying transposon insertion sites. We here describe quantitative insertion-site sequencing, or QIseq, which uses custom library preparation and Illumina sequencing technology and is able to identify insertion sites from both the 5' and 3' ends of the transposon, providing an inbuilt level of validation. The approach was developed using piggyBac mutants in the human malaria parasite Plasmodium falciparum but should be applicable to many other eukaryotic genomes. QIseq proved accurate, confirming known sites in >100 mutants, and sensitive, identifying and monitoring sites over a >10,000-fold dynamic range of sequence counts. Applying QIseq to uncloned parasites shortly after transfections revealed multiple insertions in mixed populations and suggests that >4000 independent mutants could be generated from relatively modest scales of transfection, providing a clear pathway to genome-scale screens in P. falciparum. QIseq was also used to monitor the growth of pools of previously cloned mutants and reproducibly differentiated between deleterious and neutral mutations in competitive growth. Among the mutants with fitness defects was a mutant with a piggyBac insertion immediately upstream of the kelch protein K13 gene associated with artemisinin resistance, implying mutants in this gene may have competitive fitness costs. QIseq has the potential to enable the scale-up of piggyBac-mediated genetics across multiple eukaryotic systems. © 2016 Bronner et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Improving transcriptome assembly through error correction of high-throughput sequence reads.

    Science.gov (United States)

    Macmanes, Matthew D; Eisen, Michael B

    2013-01-01

    The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, depend on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  5. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, depend on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  6. High-throughput sequencing reveals inbreeding depression in a natural population.

    Science.gov (United States)

    Hoffman, Joseph I; Simpson, Fraser; David, Patrice; Rijks, Jolianne M; Kuiken, Thijs; Thorne, Michael A S; Lacy, Robert C; Dasmahapatra, Kanchon K

    2014-03-11

    Proxy measures of genome-wide heterozygosity based on approximately 10 microsatellites have been used to uncover heterozygosity fitness correlations (HFCs) for a wealth of important fitness traits in natural populations. However, effect sizes are typically very small and the underlying mechanisms remain contentious, as a handful of markers usually provides little power to detect inbreeding. We therefore used restriction site associated DNA (RAD) sequencing to accurately estimate genome-wide heterozygosity, an approach transferrable to any organism. As a proof of concept, we first RAD sequenced oldfield mice (Peromyscus polionotus) from a known pedigree, finding strong concordance between the inbreeding coefficient and heterozygosity measured at 13,198 single-nucleotide polymorphisms (SNPs). When applied to a natural population of harbor seals (Phoca vitulina), a weak HFC for parasite infection based on 27 microsatellites strengthened considerably with 14,585 SNPs, the deviance explained by heterozygosity increasing almost fivefold to a remarkable 49%. These findings arguably provide the strongest evidence to date of an HFC being due to inbreeding depression in a natural population lacking a pedigree. They also suggest that under some circumstances heterozygosity may explain far more variation in fitness than previously envisaged.
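
The heterozygosity measure at the centre of this abstract can be illustrated with its simplest form: the fraction of an individual's successfully genotyped loci that are heterozygous. The abstract does not say which estimator the authors used, so the sketch below is an assumption for illustration; the function name and the genotype encoding (allele pairs, with `None` for a missing call) are hypothetical.

```python
def multilocus_heterozygosity(genotypes):
    """Fraction of called loci at which the two alleles differ.

    genotypes: iterable of (allele_a, allele_b) tuples; None marks a missing call.
    """
    called = [genotype for genotype in genotypes if genotype is not None]
    if not called:
        raise ValueError("no called genotypes for this individual")
    heterozygous = sum(1 for allele_a, allele_b in called if allele_a != allele_b)
    return heterozygous / len(called)


# Hypothetical individual typed at five SNPs, one of which failed.
individual = [("A", "G"), ("C", "C"), None, ("T", "A"), ("G", "G")]
print(multilocus_heterozygosity(individual))  # 0.5
```

With only a handful of loci this fraction is a noisy proxy for genome-wide heterozygosity, which is consistent with the abstract's report that the correlation strengthened when 27 microsatellites were replaced by 14,585 SNPs.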

  7. High-throughput sequencing-based genome-wide identification of microRNAs expressed in developing cotton seeds.

    Science.gov (United States)

    Wang, YanMei; Ding, Yan; Yu, DingWei; Xue, Wei; Liu, JinYuan

    2015-08-01

    MicroRNAs (miRNAs) have been shown to play critical regulatory roles in gene expression in cotton. Although a large number of miRNAs have been identified in cotton fibers, the functions of miRNAs in seed development remain unexplored. In this study, a small RNA library was constructed from cotton seeds sampled at 15 days post-anthesis (DPA) and was subjected to high-throughput sequencing. A total of 95 known miRNAs were detected to be expressed in cotton seeds. The expression pattern of these identified miRNAs was profiled and 48 known miRNAs were differentially expressed between cotton seeds and fibers at 15 DPA. In addition, 23 novel miRNA candidates were identified in 15-DPA seeds. Putative targets for 21 novel and 87 known miRNAs were successfully predicted and 900 expressed sequence tag (EST) sequences were proposed to be candidate target genes, which are involved in various metabolic and biological processes, suggesting a complex regulatory network in developing cotton seeds. Furthermore, miRNA-mediated cleavage of three important transcripts in vivo was validated by RLM-5' RACE. This study is the first to show the regulatory network of miRNAs that are involved in developing cotton seeds and provides a foundation for future studies on the specific functions of these miRNAs in seed development.

  8. High throughput on-chip analysis of high-energy charged particle tracks using lensfree imaging

    Energy Technology Data Exchange (ETDEWEB)

    Luo, Wei; Shabbir, Faizan; Gong, Chao; Gulec, Cagatay; Pigeon, Jeremy; Shaw, Jessica; Greenbaum, Alon; Tochitsky, Sergei; Joshi, Chandrashekhar [Electrical Engineering Department, University of California, Los Angeles, California 90095 (United States); Ozcan, Aydogan, E-mail: ozcan@ucla.edu [Electrical Engineering Department, University of California, Los Angeles, California 90095 (United States); Bioengineering Department, University of California, Los Angeles, California 90095 (United States); California NanoSystems Institute (CNSI), University of California, Los Angeles, California 90095 (United States)

    2015-04-13

    We demonstrate a high-throughput charged particle analysis platform, which is based on lensfree on-chip microscopy for rapid ion track analysis using allyl diglycol carbonate, i.e., CR-39 plastic polymer as the sensing medium. By adopting a wide-area opto-electronic image sensor together with a source-shifting based pixel super-resolution technique, a large CR-39 sample volume (i.e., 4 cm × 4 cm × 0.1 cm) can be imaged in less than 1 min using a compact lensfree on-chip microscope, which detects partially coherent in-line holograms of the ion tracks recorded within the CR-39 detector. After the image capture, using highly parallelized reconstruction and ion track analysis algorithms running on graphics processing units, we reconstruct and analyze the entire volume of a CR-39 detector within ∼1.5 min. This significant reduction in the entire imaging and ion track analysis time not only increases our throughput but also allows us to perform time-resolved analysis of the etching process to monitor and optimize the growth of ion tracks during etching. This computational lensfree imaging platform can provide a much higher throughput and more cost-effective alternative to traditional lens-based scanning optical microscopes for ion track analysis using CR-39 and other passive high energy particle detectors.

  9. Analysis of JC virus DNA replication using a quantitative and high-throughput assay.

    Science.gov (United States)

    Shin, Jong; Phelan, Paul J; Chhum, Panharith; Bashkenova, Nazym; Yim, Sung; Parker, Robert; Gagnon, David; Gjoerup, Ole; Archambault, Jacques; Bullock, Peter A

    2014-11-01

    Progressive Multifocal Leukoencephalopathy (PML) is caused by lytic replication of JC virus (JCV) in specific cells of the central nervous system. Like other polyomaviruses, JCV encodes a large T-antigen helicase needed for replication of the viral DNA. Here, we report the development of a luciferase-based, quantitative and high-throughput assay of JCV DNA replication in C33A cells, which, unlike the glial cell lines Hs 683 and U87, accumulate high levels of nuclear T-ag needed for robust replication. Using this assay, we investigated the requirement for different domains of T-ag, and for specific sequences within and flanking the viral origin, in JCV DNA replication. Beyond providing validation of the assay, these studies revealed an important stimulatory role of the transcription factor NF1 in JCV DNA replication. Finally, we show that the assay can be used for inhibitor testing, highlighting its value for the identification of antiviral drugs targeting JCV DNA replication. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Analysis of JC virus DNA replication using a quantitative and high-throughput assay

    Science.gov (United States)

    Shin, Jong; Phelan, Paul J.; Chhum, Panharith; Bashkenova, Nazym; Yim, Sung; Parker, Robert; Gagnon, David; Gjoerup, Ole; Archambault, Jacques; Bullock, Peter A.

    2015-01-01

    Progressive Multifocal Leukoencephalopathy (PML) is caused by lytic replication of JC virus (JCV) in specific cells of the central nervous system. Like other polyomaviruses, JCV encodes a large T-antigen helicase needed for replication of the viral DNA. Here, we report the development of a luciferase-based, quantitative and high-throughput assay of JCV DNA replication in C33A cells, which, unlike the glial cell lines Hs 683 and U87, accumulate high levels of nuclear T-ag needed for robust replication. Using this assay, we investigated the requirement for different domains of T-ag, and for specific sequences within and flanking the viral origin, in JCV DNA replication. Beyond providing validation of the assay, these studies revealed an important stimulatory role of the transcription factor NF1 in JCV DNA replication. Finally, we show that the assay can be used for inhibitor testing, highlighting its value for the identification of antiviral drugs targeting JCV DNA replication. PMID:25155200

  11. pep2pro: the high-throughput proteomics data processing, analysis and visualization tool

    Directory of Open Access Journals (Sweden)

    Matthias Hirsch-Hoffmann

    2012-06-01

    The pep2pro database was built to support effective high-throughput proteome data analysis. Its database schema allows the coherent integration of search results from different database-dependent search algorithms and filtering of the data including control for unambiguous assignment of peptides to proteins. The capacity of the pep2pro database has been exploited in data analysis of various Arabidopsis proteome datasets. The diversity of the datasets and the associated scientific questions required thorough querying of the data. This was supported by the relational format structure of the data that links all information on the sample, spectrum, search database and algorithm to peptide and protein identifications and their post-translational modifications. After publication of datasets they are made available on the pep2pro website at www.pep2pro.ethz.ch. Further, the pep2pro data analysis pipeline also handles data export to the PRIDE database (http://www.ebi.ac.uk/pride) and data retrieval by the MASCP Gator (http://gator.masc-proteomics.org/). The utility of pep2pro will continue to be used for analysis of additional datasets and as a data warehouse. The capacity of the pep2pro database for proteome data analysis has now also been made publicly available through the release of pep2pro4all, which consists of a database schema and a script that will populate the database with mass spectrometry data provided in mzIdentML format.

  12. pep2pro: the high-throughput proteomics data processing, analysis, and visualization tool.

    Science.gov (United States)

    Hirsch-Hoffmann, Matthias; Gruissem, Wilhelm; Baerenfaller, Katja

    2012-01-01

    The pep2pro database was built to support effective high-throughput proteome data analysis. Its database schema allows the coherent integration of search results from different database-dependent search algorithms and filtering of the data including control for unambiguous assignment of peptides to proteins. The capacity of the pep2pro database has been exploited in data analysis of various Arabidopsis proteome datasets. The diversity of the datasets and the associated scientific questions required thorough querying of the data. This was supported by the relational format structure of the data that links all information on the sample, spectrum, search database, and algorithm to peptide and protein identifications and their post-translational modifications. After publication of datasets they are made available on the pep2pro website at www.pep2pro.ethz.ch. Further, the pep2pro data analysis pipeline also handles data export to the PRIDE database (http://www.ebi.ac.uk/pride) and data retrieval by the MASCP Gator (http://gator.masc-proteomics.org/). The utility of pep2pro will continue to be used for analysis of additional datasets and as a data warehouse. The capacity of the pep2pro database for proteome data analysis has now also been made publicly available through the release of pep2pro4all, which consists of a database schema and a script that will populate the database with mass spectrometry data provided in mzIdentML format.

  13. Characterization of bacteria in biopsies of colon and stools by high throughput sequencing of the V2 region of bacterial 16S rRNA gene in human.

    Directory of Open Access Journals (Sweden)

    Yukihide Momozawa

    BACKGROUND: The characterization of the human intestinal microflora and their interactions with the host have been identified as key components in the study of intestinal disorders such as inflammatory bowel diseases. High-throughput sequencing has enabled culture-independent studies to deeply analyze bacteria in the gut. It is possible with this technology to systematically analyze links between microbes and the genetic constitution of the host, such as DNA polymorphisms and methylation, and gene expression. METHODS AND FINDINGS: In this study the V2 region of the bacterial 16S ribosomal RNA (rRNA) gene from seven anatomic regions of human colon and two types of stool specimens was analyzed using 454 pyrosequencing. The study examined the number of reads needed to ascertain differences between samples, the effect of DNA extraction procedures and PCR reproducibility, and differences between biopsies and stools in order to design a large scale systematic analysis of gut microbes. It was shown (1) that sequence coverage lower than 1,000 reads influenced quantitative and qualitative differences between samples measured by UniFrac distances. Distances between samples became stable after 1,000 reads. (2) Differences in extracted bacteria were observed between the two DNA extraction methods. In particular, Firmicutes Bacilli were not extracted well by one method. (3) Quantitative and qualitative differences in bacteria from ileum to rectum colon were not observed, but there was a significant positive trend between distances within colon and quantitative differences. Between sample types, biopsies or stools, quantitative and qualitative differences were observed. CONCLUSIONS: Results of human colonic bacteria analyzed using high-throughput sequencing were highly dependent on the experimental design, especially the number of sequence reads, DNA extraction method, and sample type.
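
    The read-depth finding can be explored by subsampling every sample to a common depth before distances are computed. The sketch below shows one way to rarefy a table of per-taxon read counts without replacement. The function name, the dictionary layout and the example counts are assumptions for illustration, not the procedure used in the study.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    counts: dict mapping taxon name to read count (hypothetical layout).
    Returns a Counter with the subsampled per-taxon counts.
    """
    # Expand the count table into one entry per read, then draw from it.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Hypothetical sample rarefied to the 1,000-read depth mentioned in the abstract.
sample = {"Firmicutes": 1500, "Bacteroidetes": 800, "Proteobacteria": 40}
print(rarefy(sample, 1000))
```

    Repeating the draw with different seeds and depths, and recomputing between-sample distances each time, is one way to see at what depth the distances stop changing.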

  14. Microdissection of lampbrush chromosomes as an approach for generation of locus-specific FISH-probes and samples for high-throughput sequencing.

    Science.gov (United States)

    Zlotina, Anna; Kulikova, Tatiana; Kosyakova, Nadezda; Liehr, Thomas; Krasikova, Alla

    2016-02-20

    Over the past two decades, chromosome microdissection has been widely used in diagnostics and research, enabling analysis of chromosomes and their regions through probe generation and the establishment of chromosome- and chromosome region-specific DNA libraries. However, the relatively small physical size of mitotic chromosomes has limited the use of conventional chromosome microdissection for investigation of tiny chromosomal regions. In the present study, we developed a workflow for mechanical microdissection of giant transcriptionally active lampbrush chromosomes followed by the preparation of whole-chromosome and locus-specific fluorescent in situ hybridization (FISH)-probes and high-throughput sequencing. In particular, chicken (Gallus g. domesticus) lampbrush chromosome regions as small as single chromomeres, individual lateral loops and marker structures were successfully microdissected. The dissected fragments were mapped with high resolution to target regions of the corresponding lampbrush chromosomes. For investigation of RNA-content of lampbrush chromosome structures, samples retrieved by microdissection were subjected to reverse transcription. Using high-throughput sequencing, the isolated regions were successfully assigned to chicken genome coordinates. As a result, we defined precisely the loci for marker structures formation on chicken lampbrush chromosomes 2 and 3. Additionally, our data suggest that large DAPI-positive chromomeres of chicken lampbrush chromosome arms are characterized by low gene density and high repeat content. The developed technical approach makes it possible to obtain DNA and RNA samples from particular lampbrush chromosome loci and to define precisely the genomic position, extent and sequence content of the dissected regions. The data obtained demonstrate that lampbrush chromosome microdissection provides a unique opportunity to correlate a particular transcriptional domain or a cytological structure with a known DNA sequence. This approach offers

  15. DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data.

    Science.gov (United States)

    Nettling, Martin; Thieme, Nils; Both, Andreas; Grosse, Ivo

    2014-02-04

    New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads, calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphisms) increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions also require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource saving requests, and (iii) running on a single standard computer. Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing related single lookups as range requests, which are needed permanently for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as test environment. DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets up to several billion

  16. High throughput quantitative phenotyping of plant resistance using chlorophyll fluorescence image analysis.

    Science.gov (United States)

    Rousseau, Céline; Belin, Etienne; Bove, Edouard; Rousseau, David; Fabre, Frédéric; Berruyer, Romain; Guillaumès, Jacky; Manceau, Charles; Jacques, Marie-Agnès; Boureau, Tristan

    2013-06-13

    In order to select for quantitative plant resistance to pathogens, high throughput approaches that can precisely quantify disease severity are needed. Automation and use of calibrated image analysis should provide more accurate, objective and faster analyses than visual assessments. In contrast to conventional visible imaging, chlorophyll fluorescence imaging is not sensitive to environmental light variations and provides single-channel images amenable to segmentation by simple thresholding approaches. Among the various parameters used in chlorophyll fluorescence imaging, the maximum quantum yield of photosystem II photochemistry (Fv/Fm) is well adapted to phenotyping disease severity. Fv/Fm is an indicator of plant stress that displays a robust contrast between infected and healthy tissues. In the present paper, we aimed at the segmentation of Fv/Fm images to quantify disease severity. Based on the Fv/Fm values of each pixel of the image, a thresholding approach was developed to delimit diseased areas. A first step consisted in setting up thresholds to reproduce visual observations by trained raters of symptoms caused by Xanthomonas fuscans subsp. fuscans (Xff) CFBP4834-R on Phaseolus vulgaris cv. Flavert. In order to develop a thresholding approach applicable to any cultivar or species, a second step was based on modeling pixel-wise Fv/Fm-distributions as mixtures of Gaussian distributions. Such a modeling may discriminate various stages of the symptom development but over-weights artifacts that can occur on mock-inoculated samples. Therefore, we developed a thresholding approach based on the probability of misclassification of a healthy pixel. Then, a clustering step is performed on the diseased areas to discriminate between various stages of alteration of plant tissues. Notably, the use of chlorophyll fluorescence imaging could detect pre-symptomatic areas. The interest of this image analysis procedure for assessing the levels of quantitative resistance
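
    The idea of a threshold based on the probability of misclassifying a healthy pixel can be sketched as follows, assuming that Fv/Fm values of healthy tissue are roughly normally distributed. The function names, the value of alpha and the sample pixel values are all hypothetical; the abstract does not give the authors' actual model or parameters.

```python
from statistics import NormalDist


def healthy_threshold(healthy_values, alpha=0.01):
    """Fv/Fm value below which a healthy pixel falls with probability alpha.

    Assumes healthy-tissue Fv/Fm is approximately Gaussian (an assumption of
    this sketch); alpha is the accepted misclassification rate.
    """
    dist = NormalDist.from_samples(healthy_values)
    return dist.inv_cdf(alpha)


def diseased_fraction(pixels, threshold):
    """Share of pixels whose Fv/Fm lies below the threshold."""
    return sum(1 for p in pixels if p < threshold) / len(pixels)


# Hypothetical Fv/Fm readings from mock-inoculated (healthy) tissue.
healthy = [0.78, 0.80, 0.79, 0.81, 0.77, 0.80]
threshold = healthy_threshold(healthy, alpha=0.01)

# Hypothetical leaf with two clearly stressed pixels.
leaf = [0.80, 0.79, 0.40, 0.55]
print(threshold, diseased_fraction(leaf, threshold))
```

    Fixing alpha rather than a raw Fv/Fm cutoff is what lets the same rule carry over to cultivars whose healthy tissue has a different baseline.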

  17. A bioinformatics approach for determining sample identity from different lanes of high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Rachel L Goldfeder

    The ability to generate whole genome data is rapidly becoming commoditized. For example, a mammalian sized genome (∼3 Gb) can now be sequenced using approximately ten lanes on an Illumina HiSeq 2000. Since lanes from different runs are often combined, verifying that each lane in a genome's build is from the same sample is an important quality control. We sought to address this issue in a post hoc bioinformatic manner, instead of using upstream sample or "barcode" modifications. We rely on the inherent small differences between any two individuals to show that genotype concordance rates can be effectively used to test if any two lanes of HiSeq 2000 data are from the same sample. As proof of principle, we use recent data from three different human samples generated on this platform. We show that the distributions of concordance rates are non-overlapping when comparing lanes from the same sample versus lanes from different samples. Our method proves to be robust even when different numbers of reads are analyzed. Finally, we provide a straightforward method for determining the gender of any given sample. Our results suggest that examining the concordance of detected genotypes from lanes purported to be from the same sample is a relatively simple approach for confirming that combined lanes of data are of the same identity and quality.
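
    A genotype concordance rate between two lanes can be sketched as the fraction of sites called in both lanes that carry the same genotype. The dictionary layout, the genotype strings and the example calls below are assumptions for illustration; the abstract does not describe the authors' file formats or filtering.

```python
def concordance(calls_a, calls_b):
    """Fraction of sites called in both lanes that have identical genotypes.

    calls_a, calls_b: dicts mapping (chromosome, position) to a genotype
    string such as "A/G" (a hypothetical encoding for this sketch).
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two lanes share no called sites")
    same = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return same / len(shared)


# Hypothetical genotype calls from two lanes.
lane1 = {("chr1", 100): "A/G", ("chr1", 250): "C/C", ("chr2", 40): "T/T", ("chr2", 90): "G/G"}
lane2 = {("chr1", 100): "A/G", ("chr1", 250): "C/T", ("chr2", 40): "T/T", ("chr3", 7): "A/A"}
print(concordance(lane1, lane2))
```

    Only sites called in both lanes enter the ratio, so lanes with different read counts can still be compared, at the cost of a noisier estimate when few sites overlap.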

  18. A high-throughput sequencing ecotoxicology study of freshwater bacterial communities and their responses to tebuconazole.

    Science.gov (United States)

    Pascault, Noémie; Roux, Simon; Artigas, Joan; Pesce, Stéphane; Leloup, Julie; Tadonleke, Rémy D; Debroas, Didier; Bouchez, Agnès; Humbert, Jean-François

    2014-12-01

    The pollution of lakes and rivers by pesticides is a growing problem worldwide. However, the impacts of these substances on microbial communities are still poorly understood, partly because next-generation sequencing (NGS) has rarely been used in an ecotoxicology context to study bacterial communities despite its interest for accessing rare taxa. Microcosm experiments were carried out to evaluate the effects of tebuconazole (TBZ) on the structure and composition of bacterial communities from two types of freshwater ecosystem (lakes and rivers) with differing histories of pollutant contamination (pristine vs. previously exposed sites). Pyrosequencing revealed that bacterial diversity was higher in the river than in the lakes and in previously exposed sites than in pristine sites. Lakes and river stations shared very few OTUs, and differences at the phylum level were identified between these ecosystems (i.e. the relative importance of Actinobacteria and Gammaproteobacteria). Despite differences between these ecosystems and their contamination history, no significant effect of TBZ on bacterial community structure or composition was observed. Compared to functional parameters that displayed variable responses, we demonstrated that a combination of classical methods and NGS is necessary to investigate the ecotoxicological responses of microbial communities to pollutants. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  19. A functional analysis of the CREB signaling pathway using HaloCHIP-chip and high throughput reporter assays

    Directory of Open Access Journals (Sweden)

    Aldred Shelley F

    2009-10-01

    Background: Regulation of gene expression is essential for normal development and cellular growth. Transcriptional events are tightly controlled both spatially and temporally by specific DNA-protein interactions. In this study we finely map the genome-wide targets of the CREB protein across all known and predicted human promoters, and characterize the functional consequences of a subset of these binding events using high-throughput reporter assays. To measure CREB binding, we used HaloCHIP, an antibody-free alternative to the ChIP method that utilizes the HaloTag fusion protein, and also high-throughput promoter-luciferase reporter assays, which provide rapid and quantitative screening of promoters for transcriptional activation or repression in living cells. Results: In analysis of CREB genome-wide binding events using a comprehensive DNA microarray of human promoters, we observe for the first time that CREB has a strong preference for binding at bidirectional promoters and, unlike at unidirectional promoters, these binding events often occur downstream of transcription start sites. Comparison between HaloCHIP-chip and ChIP-chip data reveals this to be true for both methodologies, indicating it is not a bias of the technology chosen. Transcriptional data obtained from promoter-luciferase reporter arrays also show an unprecedented, high level of activation of CREB-bound promoters in the presence of the co-activator protein TORC1. Conclusion: These data suggest for the first time that TORC1 provides directional information when CREB is bound at bidirectional promoters and possible pausing of the CREB protein after initial transcriptional activation. Also, this combined approach demonstrates the ability to more broadly characterize CREB protein-DNA interactions wherein not only DNA binding sites are discovered, but also the potential of the promoter sequence to respond to CREB is evaluated.

  20. Next generation sequencing-based multigene panel for high throughput detection of food-borne pathogens.

    Science.gov (United States)

    Ferrario, Chiara; Lugli, Gabriele Andrea; Ossiprandi, Maria Cristina; Turroni, Francesca; Milani, Christian; Duranti, Sabrina; Mancabelli, Leonardo; Mangifesta, Marta; Alessandri, Giulia; van Sinderen, Douwe; Ventura, Marco

    2017-09-01

    Contamination of food by chemicals or pathogenic bacteria may cause particular illnesses that are linked to food consumption, commonly referred to as foodborne diseases. Bacteria are present in/on various food products, such as fruits, vegetables and ready-to-eat products. Bacteria that cause foodborne diseases are known as foodborne pathogens (FBPs). Accurate detection methods that are able to reveal the presence of FBPs in food matrices are in constant demand, in order to ensure safe foods with a minimal risk of causing foodborne diseases. Here, a multiplex PCR-based Illumina sequencing method for FBP detection in food matrices was developed. Starting from 25 bacterial targets and 49 selected PCR primer pairs, a primer collection called foodborne pathogen - panel (FPP) consisting of 12 oligonucleotide pairs was developed. The FPP allows a more rapid and reliable identification of FBPs compared to classical cultivation methods. Furthermore, FPP permits sensitive and specific FBP detection in about two days from food sample acquisition to bioinformatics-based identification. The FPP is able to simultaneously identify eight different bacterial pathogens, i.e. Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Salmonella enterica subsp. enterica serovar enteritidis, Escherichia coli, Shigella sonnei, Staphylococcus aureus and Yersinia enterocolitica, in a given food matrix at a threshold contamination level of 10^1 cell/g. Moreover, this novel detection method may represent an alternative and/or a complementary approach to PCR-based techniques, which are routinely used for FBP detection, and could be implemented in (parts of) the food chain as a quality check. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Neuropeptidergic Signaling in the American Lobster Homarus americanus: New Insights from High-Throughput Nucleotide Sequencing.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan; Lameyer, Tess J; Pascual, Micah G; Shea, Devlin N; Stanhope, Meredith E; Schulz, David J; Dickinson, Patsy S

    2015-01-01

    Peptides are the largest and most diverse class of molecules used for neurochemical communication, playing key roles in the control of essentially all aspects of physiology and behavior. The American lobster, Homarus americanus, is a crustacean of commercial and biomedical importance; lobster growth and reproduction are under neuropeptidergic control, and portions of the lobster nervous system serve as models for understanding the general principles underlying rhythmic motor behavior (including peptidergic neuromodulation). While a number of neuropeptides have been identified from H. americanus, and the effects of some have been investigated at the cellular/systems levels, little is currently known about the molecular components of neuropeptidergic signaling in the lobster. Here, a H. americanus neural transcriptome was generated and mined for sequences encoding putative peptide precursors and receptors; 35 precursor- and 41 receptor-encoding transcripts were identified. We predicted 194 distinct neuropeptides from the deduced precursor proteins, including members of the adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin C, bursicon, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone (CHH), CHH precursor-related peptide, diuretic hormone 31, diuretic hormone 44, eclosion hormone, FLRFamide, GSEFLamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, proctolin, pyrokinin, SIFamide, sulfakinin and tachykinin-related peptide families. While some of the predicted peptides are known H. americanus isoforms, most are novel identifications, more than doubling the extant lobster neuropeptidome. The deduced receptor proteins are the first descriptions of H. americanus neuropeptide receptors, and include ones for most of the peptide groups mentioned earlier, as well as those for ecdysis-triggering hormone, red pigment concentrating hormone

  2. High-throughput sequencing of ancient plant and mammal DNA preserved in herbivore middens

    DEFF Research Database (Denmark)

    Murray, Dáithí C.; Pearson, Stuart G.; Fullagar, Richard

    2012-01-01

    The study of arid palaeoenvironments is often frustrated by the poor or non-existent preservation of plant and animal material, yet these environments are of considerable environmental importance. The analysis of pollen and macrofossils isolated from herbivore middens has been an invaluable sourc...

  3. Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates.

    Science.gov (United States)

    Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A

    2016-09-01

    Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated primer biases occurred during amplicon metabarcoding with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.
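
    The Spearman rank correlation between read counts and abundance mentioned above can be computed as a Pearson correlation on ranks. The sketch below uses only the standard library and assigns average ranks to ties. The function names and the per-taxon numbers are hypothetical, and the coefficient is undefined if either input is constant.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-taxon sequence reads and morphological counts in one sample.
reads = [120, 3400, 15, 800]
abundance = [4, 60, 1, 25]
print(spearman(reads, abundance))
```

    Because only ranks matter, a taxon that attracts a disproportionate share of reads distorts this statistic less than it would a linear correlation.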

  4. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari Nyyssönen

    2013-09-01

    Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain as hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26 %) cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes, multiple Proteobacteria sub-phyla in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically-confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.
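
    Tetranucleotide frequency binning starts from a 256-element composition profile for each sequence. The sketch below computes such a profile for a single sequence. The function name and example sequence are assumptions, and the sketch omits steps that binning tools commonly add (for example merging reverse-complement k-mers and clustering the profiles), so it illustrates the input to binning rather than the authors' pipeline.

```python
from collections import Counter
from itertools import product


def tetranucleotide_freqs(seq):
    """Relative frequency of each of the 256 possible 4-mers in `seq`.

    Windows containing characters other than A, C, G, T are ignored.
    Returns a dict mapping each 4-mer to its share of the valid windows.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid 4-mers")
    return {k: counts[k] / total for k in kmers}


# Hypothetical short sequence; real fosmid inserts are tens of kilobases long.
profile = tetranucleotide_freqs("ACGTACGT")
print(profile["ACGT"], len(profile))
```

    Sequences from the same genome tend to have similar profiles, which is what allows fosmids carrying different enzyme activities to be grouped together.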

  5. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

    Science.gov (United States)

    Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin

    2017-12-01

    Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

  6. Utility of lab-on-a-chip technology for high-throughput nucleic acid and protein analysis

    DEFF Research Database (Denmark)

    Hawtin, Paul; Hardern, Ian; Wittig, Rainer

    2005-01-01

    samples is used to stratify gene sets for disease discovery. Finally, the applicability of a high-throughput LoaC system for assessing protein purification is demonstrated. The improvements in workflow processes, speed of analysis, data accuracy and reproducibility, and automated data analysis...

  7. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes.

    Science.gov (United States)

    Ewing, Adam D; Kazazian, Haig H

    2010-09-01

    Using high-throughput sequencing, we devised a technique to determine the insertion sites of virtually all members of the human-specific L1 retrotransposon family in any human genome. Using diagnostic nucleotides, we were able to locate the approximately 800 L1Hs copies corresponding specifically to the pre-Ta, Ta-0, and Ta-1 L1Hs subfamilies, with over 90% of sequenced reads corresponding to human-specific elements. We find that any two individual genomes differ at an average of 285 sites with respect to L1 insertion presence or absence. In total, we assayed 25 individuals, 15 of which are unrelated, at 1139 sites, including 772 shared with the reference genome and 367 nonreference L1 insertions. We show that L1Hs profiles recapitulate genetic ancestry, and determine the chromosomal distribution of these elements. Using these data, we estimate that the rate of L1 retrotransposition in humans is between 1/95 and 1/270 births, and the number of dimorphic L1 elements in the human population with gene frequencies greater than 0.05 is between 3000 and 10,000.

  8. High throughput sequencing reveals the diversity of TRB-CDR3 repertoire in patients with psoriasis vulgaris.

    Science.gov (United States)

    Cao, Xiaofang; Wa, Qingbiao; Wang, Qidi; Li, Lin; Liu, Xin; An, Lisha; Cai, Ruikun; Du, Meng; Qiu, Yue; Han, Jian; Wang, Chunlin; Wang, Xingyu; Guo, Changlong; Lu, Yonghong; Ma, Xu

    2016-11-01

    Psoriasis is a T cell-mediated chronic inflammatory skin disease with inflammatory cell infiltrates in the dermis and epidermis. Previous studies suggested that there are some expanded T-cell receptor (TCR) clones in psoriatic skin. However, the effect of psoriasis on the immunological characteristics of TCR in circulating blood has not been reported. To address this, we performed high-throughput sequencing to reveal the immunological characteristics of TCR beta chain (TRB) in both psoriasis patients and healthy controls. Our results revealed that the TRB-CDR3 region of psoriasis patients had distinctive immunological characteristics compared with that of healthy controls, including V gene usage, nt of N addition. In addition, three types of TRB-CDR3 peptides were found highly relevant to psoriasis. Our findings show the comprehensive characteristics of psoriasis on the TRB-CDR3 repertoire of circulating blood at sequence-level resolution. These findings may contribute to a better understanding of the pathogenesis of psoriasis and open opportunities to explore potential therapeutic targets. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. High-Throughput Sequencing of Viable Microbial Communities in Raw Pork Subjected to a Fast Cooling Process.

    Science.gov (United States)

    Yang, Chao; Che, You; Qi, Yan; Liang, Peixin; Song, Cunjiang

    2017-01-01

    This study aimed to investigate the effect of the fast cooling process on the microbiological community in chilled fresh pork during storage. We established a culture-independent method to study viable microbes in raw pork. Tray-packaged fresh pork and chilled fresh pork were completely spoiled after 18 and 49 d in aseptic bags at 4 °C, respectively. 16S/18S ribosomal RNAs were reverse transcribed to cDNA to characterize the activity of viable bacteria/fungi in the 2 types of pork. Both cDNA and total DNA were analyzed by high-throughput sequencing, which revealed that viable Bacteroides sp. were the most active genus in rotten pork, although viable Myroides sp. and Pseudomonas sp. were also active. Moreover, viable fungi were only detected in chilled fresh pork. The sequencing results revealed that the fast cooling process could suppress the growth of microbes present initially in the raw meat to extend its shelf life. Our results also suggested that fungi associated with pork spoilage could not grow well in aseptic tray-packaged conditions. © 2016 Institute of Food Technologists®.

  10. Discovery of J Chain in African Lungfish (Protopterus dolloi, Sarcopterygii) Using High Throughput Transcriptome Sequencing: Implications in Mucosal Immunity

    Science.gov (United States)

    Tacchi, Luca; Larragoite, Erin; Salinas, Irene

    2013-01-01

    J chain is a small polypeptide responsible for immunoglobulin (Ig) polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi) by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM+ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective. PMID:23967082

  11. High-Throughput Sequencing and Mutagenesis to Accelerate the Domestication of Microlaena stipoides as a New Food Crop

    Science.gov (United States)

    Shapter, Frances M.; Cross, Michael; Ablett, Gary; Malory, Sylvia; Chivers, Ian H.; King, Graham J.; Henry, Robert J.

    2013-01-01

    Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD97) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops. PMID:24367532

  12. Fungi Sailing the Arctic Ocean: Speciose Communities in North Atlantic Driftwood as Revealed by High-Throughput Amplicon Sequencing.

    Science.gov (United States)

    Rämä, Teppo; Davey, Marie L; Nordén, Jenni; Halvorsen, Rune; Blaalid, Rakel; Mathiassen, Geir H; Alsos, Inger G; Kauserud, Håvard

    2016-08-01

    High amounts of driftwood sail across the oceans and provide habitat for organisms tolerating the rough and saline environment. Fungi have adapted to the extremely cold and saline conditions which driftwood faces in the high north. For the first time, we applied high-throughput sequencing to fungi residing in driftwood to reveal their taxonomic richness, community composition, and ecology in the North Atlantic. Using pyrosequencing of ITS2 amplicons obtained from 49 marine logs, we found 807 fungal operational taxonomic units (OTUs) based on clustering at 97 % sequence similarity cut-off level. The phylum Ascomycota comprised 74 % of the OTUs and 20 % belonged to Basidiomycota. The richness of basidiomycetes decreased with prolonged submersion in the sea, supporting the general view of ascomycetes being more extremotolerant. However, more than one fourth of the fungal OTUs remained unassigned to any fungal class, emphasising the need for better DNA reference data from the marine habitat. Different fungal communities were detected in coniferous and deciduous logs. Our results highlight that driftwood hosts a considerably higher fungal diversity than currently known. The driftwood fungal community is not a terrestrial relic but a speciose assemblage of fungi adapted to the stressful marine environment and different kinds of wooden substrates found in it.
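
    The 97 % similarity cut-off mentioned in the abstract above can be illustrated with a minimal sketch of greedy centroid clustering. This is not the pipeline the authors used: it assumes the reads are already aligned to equal length, and the function names `identity` and `greedy_otu_clusters` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clusters(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold identity.

    A sequence that matches no existing centroid becomes the centroid of a new OTU.
    Returns a list of clusters; the first member of each cluster is its centroid.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No centroid was similar enough: start a new OTU.
            centroids.append(s)
            clusters.append([s])
    return clusters
```

    Because the assignment is greedy, the result depends on input order; production tools usually sort reads by abundance or length first.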

  13. High-throughput sequencing and mutagenesis to accelerate the domestication of Microlaena stipoides as a new food crop.

    Directory of Open Access Journals (Sweden)

    Frances M Shapter

    Full Text Available Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD₉₇) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops.

  14. High-throughput microfluidic device for single cell analysis using multiple integrated soft lithographic pumps.

    Science.gov (United States)

    Patabadige, Damith E W; Mickleburgh, Tom; Ferris, Lorin; Brummer, Gage; Culbertson, Anne H; Culbertson, Christopher T

    2016-05-01

    The ability to accurately control fluid transport in microfluidic devices is key for developing high-throughput methods for single cell analysis. However, making small, reproducible changes to flow rates to optimize lysis and injection using pumps external to the microfluidic device is challenging and time-consuming. To improve the throughput and increase the number of cells analyzed, we have integrated previously reported micropumps into a microfluidic device that can increase the cell analysis rate to ∼1000 cells/h and operate for over an hour continuously. In order to increase the flow rates sufficiently to handle cells at a higher throughput, three sets of pumps were multiplexed. These pumps are simple, low-cost, durable, easy to fabricate, and biocompatible. They provide precise control of the flow rate up to 9.2 nL/s. These devices were used to automatically transport, lyse, and electrophoretically separate T-Lymphocyte cells loaded with Oregon green and 6-carboxyfluorescein. Peak overlap statistics predicted the number of fully resolved single-cell electropherograms seen. In addition, there was no change in the average fluorescent dye peak areas indicating that the cells remained intact and the dyes did not leak out of the cells over the 1 h analysis time. The cell lysate peak area distribution followed that expected of an asynchronous steady-state population of immortalized cells. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. High-throughput analysis of lectin-oligosaccharide interactions by automated frontal affinity chromatography.

    Science.gov (United States)

    Nakamura-Tsuruta, Sachiko; Uchiyama, Noboru; Hirabayashi, Jun

    2006-01-01

    Frontal affinity chromatography (FAC) is a quantitative method that enables sensitive and reproducible measurements of interactions between lectins and oligosaccharides. The method is suitable even for the measurement of low-affinity interactions and is based on a simple procedure and a clear principle. To achieve high-throughput and efficient analysis, an automated FAC system was developed. The system designated FAC-1 consists of two isocratic pumps, an autosampler, and a couple of miniature columns (bed volume, 31.4 μL) connected in parallel to either a fluorescence or an ultraviolet detector. By use of this parallel-column system, the time required for each analysis was reduced substantially. Under the established conditions, fewer than 10 hrs are required for 100 interaction analyses, consuming as little as 1 pmol pyridylaminated oligosaccharide for each analysis. This strategy for FAC should contribute to the construction of a lectin-oligosaccharide interaction database essential for future glycomics. Overall features and practical protocols for interaction analyses using FAC-1 are described.

  16. PhenStat: A Tool Kit for Standardized Analysis of High Throughput Phenotypic Data.

    Directory of Open Access Journals (Sweden)

    Natalja Kurbatova

    Full Text Available The lack of reproducibility with animal phenotyping experiments is a growing concern among the biomedical community. One contributing factor is the inadequate description of statistical analysis methods that prevents researchers from replicating results even when the original data are provided. Here we present PhenStat, a freely available R package that provides a variety of statistical methods for the identification of phenotypic associations. The methods have been developed for high throughput phenotyping pipelines implemented across various experimental designs with an emphasis on managing temporal variation. PhenStat is targeted to two user groups: small-scale users who wish to interact and test data from large resources and large-scale users who require an automated statistical analysis pipeline. The software provides guidance to the user for selecting appropriate analysis methods based on the dataset and is designed to allow for additions and modifications as needed. The package was tested on mouse and rat data and is used by the International Mouse Phenotyping Consortium (IMPC). By providing raw data and the version of PhenStat used, resources like the IMPC give users the ability to replicate and explore results within their own computing environment.

  17. Diversity and functions of bacterial community in drinking water biofilms revealed by high-throughput sequencing

    Science.gov (United States)

    Chao, Yuanqing; Mao, Yanping; Wang, Zhiping; Zhang, Tong

    2015-06-01

    The development of biofilms in drinking water (DW) systems may cause various problems to water quality. To investigate the community structure of biofilms on different pipe materials and the global/specific metabolic functions of DW biofilms, PCR-based 454 pyrosequencing data for 16S rRNA genes and Illumina metagenomic data were generated and analysed. Considerable differences in bacterial diversity and taxonomic structure were identified between biofilms formed on stainless steel and biofilms formed on plastics, indicating that the metallic materials facilitate the formation of higher diversity biofilms. Moreover, variations in several dominant genera were observed during biofilm formation. Based on PCA analysis, the global functions in the DW biofilms were similar to other DW metagenomes. Beyond the global functions, the occurrences and abundances of specific protective genes involved in the glutathione metabolism, the SoxRS system, the OxyR system, RpoS regulated genes, and the production/degradation of extracellular polymeric substances were also evaluated. A near-complete and low-contamination draft genome was constructed from the metagenome of the DW biofilm, based on the coverage and tetranucleotide frequencies, and identified as a Bradyrhizobiaceae-like bacterium according to a phylogenetic analysis. Our findings provide new insight into DW biofilms, especially in terms of their metabolic functions.
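
    Binning by tetranucleotide frequencies, as mentioned in the abstract above, rests on turning each contig into a 256-dimensional composition vector and comparing vectors. The following is a minimal sketch of that first step only, not the authors' pipeline; `tetra_freq` and `euclidean` are hypothetical helper names, and coverage information and reverse-complement merging are deliberately left out.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the alphabet A, C, G, T, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq: str):
    """Return the relative frequency of each tetranucleotide in seq (length-256 list).

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean(u, v) -> float:
    """Euclidean distance between two frequency vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

    Contigs whose vectors lie close together (and, in the study above, have similar coverage) would be candidates for the same bin.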

  18. High-throughput analysis of drug dissociation from serum proteins using affinity silica monoliths.

    Science.gov (United States)

    Yoo, Michelle J; Hage, David S

    2011-08-01

    A noncompetitive peak decay method was used with 1 mm × 4.6 mm i.d. silica monoliths to measure the dissociation rate constants (kd) for various drugs with human serum albumin (HSA) and α1-acid glycoprotein (AGP). Flow rates up to 9 mL/min were used in these experiments, resulting in analysis times of only 20-30 s. Using a silica monolith containing immobilized HSA, dissociation rate constants were measured for amitriptyline, carboplatin, cisplatin, chloramphenicol, nortriptyline, quinidine, and verapamil, giving values that ranged from 0.37 to 0.78 s⁻¹. Similar work with an immobilized AGP silica monolith gave kd values for amitriptyline, nortriptyline, and lidocaine of 0.39-0.73 s⁻¹. These kd values showed good agreement with values determined for drugs with similar structures and/or affinities for HSA or AGP. It was found that a kd of up to roughly 0.80 s⁻¹ could be measured by this approach. This information made it possible to obtain a better understanding of the advantages and possible limitations of the noncompetitive peak decay method and in the use of affinity silica monoliths for the high-throughput analysis of drug-protein dissociation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
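
    As a rough illustration of how a dissociation rate constant can be read from a decay curve, the sketch below assumes simple first-order dissociation, so that the logarithm of the detector signal falls linearly with time and the slope equals -kd. The abstract does not give the fitting procedure; `fit_kd` is a hypothetical helper and the data in the usage lines are synthetic.

```python
import math


def fit_kd(times, signals):
    """Estimate a first-order rate constant from a decaying signal.

    Fits ln(signal) = intercept - kd * t by ordinary least squares and
    returns kd (in reciprocal units of `times`). Signals must be positive.
    """
    if len(times) != len(signals) or len(times) < 2:
        raise ValueError("need at least two (time, signal) pairs")
    ys = [math.log(s) for s in signals]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    return -sxy / sxx


# Synthetic decay with kd = 0.5 s^-1, for illustration only.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
signals = [math.exp(-0.5 * t) for t in times]
kd_estimate = fit_kd(times, signals)
```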

  19. Combinatorial approach toward high-throughput analysis of direct methanol fuel cells.

    Science.gov (United States)

    Jiang, Rongzhong; Rong, Charles; Chu, Deryn

    2005-01-01

    A 40-member array of direct methanol fuel cells (with stationary fuel and convective air supplies) was generated by electrically connecting the fuel cells in series. High-throughput analysis of these fuel cells was realized by fast screening of voltages between the two terminals of a fuel cell at constant current discharge. A large number of voltage-current curves (200) were obtained by screening the voltages through multiple small-current steps. Gaussian distribution was used to statistically analyze the large number of experimental data. The standard deviation (σ) of voltages of these fuel cells increased linearly with discharge current. The voltage-current curves at various fuel concentrations were simulated with an empirical equation of voltage versus current and a linear equation of σ versus current. The simulated voltage-current curves fitted the experimental data well. With increasing methanol concentration from 0.5 to 4.0 M, the Tafel slope of the voltage-current curves (at σ = 0.0) changed from 28 to 91 mV·dec⁻¹, the cell resistance from 2.91 to 0.18 Ω, and the power output from 3 to 18 mW·cm⁻².

  20. A Concept for a Sensitive Micro Total Analysis System for High Throughput Fluorescence Imaging

    Directory of Open Access Journals (Sweden)

    Yosi Shacham

    2006-04-01

    Full Text Available This paper discusses possible methods for on-chip fluorescent imaging for integrated bio-sensors. The integration of optical and electro-optical accessories, according to suggested methods, can improve the performance of fluorescence imaging. It can boost the signal to background ratio by a few orders of magnitude in comparison to conventional discrete setups. The methods that are presented in this paper are oriented towards building reproducible arrays for high-throughput micro total analysis systems (μTAS). The first method relates to side illumination of the fluorescent material placed into micro-compartments of the lab-on-chip. Its significance is in high utilization of excitation energy for low concentration of fluorescent material. The utilization of a transparent μLED chip, for the second method, allows the placement of the excitation light sources on the same optical axis with emission detector, such that the excitation and emission rays are directed in opposite directions. The third method presents a spatial filtering of the excitation background.

  1. Microscopic description of oxide perovskites and automated high-throughput analysis of their energy landscape

    Science.gov (United States)

    Pizzi, Giovanni; Cepellotti, Andrea; Kozinsky, Boris; Marzari, Nicola

    Even if ferroelectric materials like BaTiO3 or KNbO3 have been used for decades in a broad range of technological applications, there is still significant debate in the literature concerning their microscopic behavior. For instance, many perovskite materials display a high-temperature cubic phase with zero net polarization, but its microscopic nature is still unclear, with some materials displaying a very complex energy landscape with multiple local minima. In order to investigate and clarify the microscopic nature of oxide perovskites, we perform a study on a set of about 50 representative ABO3 systems. We use spacegroup techniques to systematically analyze all possible local displacement patterns that are compatible with a net paraelectric phase, but can provide local non-zero ferroelectric moments. The energetics and the stability of these patterns are then assessed by combining the spacegroup analysis with DFT calculations. All calculations are managed and analyzed using our high-throughput platform AiiDA (www.aiida.net). Using this technique, we are able to describe the different classes of microscopic models underlying the perovskite systems.

  2. Multiplex mRNA assay using electrophoretic tags for high-throughput gene expression analysis.

    Science.gov (United States)

    Tian, Huan; Cao, Liching; Tan, Yuping; Williams, Stephen; Chen, Lili; Matray, Tracy; Chenna, Ahmed; Moore, Sean; Hernandez, Vincent; Xiao, Vivian; Tang, Mengxiang; Singh, Sharat

    2004-09-08

    We describe a novel multiplexing technology using a library of small fluorescent molecules, termed eTag molecules, to code and quantify mRNA targets. eTag molecules, which have the same fluorometric property, but distinct charge-to-mass ratios possess pre-defined electrophoretic characteristics and can be resolved using capillary electrophoresis. Coupled with primary Invader mRNA assay, eTag molecules were applied to simultaneously quantify up to 44 mRNA targets. This multiplexing approach was validated by examining a panel of inflammation responsive genes in human umbilical vein endothelial cells stimulated with inflammatory cytokine interleukin 1beta. The laser-induced fluorescence detection and electrokinetic sample injection process in capillary electrophoresis allows sensitive quantification of thousands of copies of mRNA molecules in a reaction. The assay is precise, as evaluated by measuring qualified Z' factor, a dimensionless and simple characteristic for applications in high-throughput screening using mRNA assays. Our data demonstrate the synergy between the multiplexing capability of eTag molecules by sensitive capillary electrophoresis detection and the isothermal linear amplification characteristics of the Invader assay. eTag multiplex mRNA assay presents a unique platform for sensitive, high sample throughput and multiplex gene expression analysis.
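
    The Z' factor referred to in the abstract above is commonly defined as Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|, computed from positive- and negative-control wells. A minimal sketch under that standard definition follows; how the authors "qualified" the factor is not stated, and `z_prime` is a hypothetical helper name.

```python
from statistics import mean, stdev


def z_prime(positive, negative) -> float:
    """Z' factor from replicate positive- and negative-control readings.

    Uses sample standard deviations; each control needs at least two replicates.
    Values near 1 indicate well-separated controls with little scatter.
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation
```

    By the usual rule of thumb, an assay with Z' above roughly 0.5 is regarded as suitable for screening.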

  3. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    Science.gov (United States)

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
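
    One way to picture "partitioning a spatial index" is to map each 3-d block coordinate to a single integer along a space-filling curve and give each cluster node a contiguous range of those integers. The sketch below uses a Morton (Z-order) code for this; the abstract does not say which index is used, so the curve, the `bits` parameter and the function names are assumptions made for illustration.

```python
def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def node_for_block(x: int, y: int, z: int, num_nodes: int, bits: int = 10) -> int:
    """Assign a block to a node by splitting the Morton range into equal contiguous parts."""
    total_codes = 1 << (3 * bits)
    return morton3(x, y, z, bits) * num_nodes // total_codes
```

    Blocks that are close in space tend to receive nearby codes, so a spatially compact query usually touches few partitions.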

  4. MultiSense: A Multimodal Sensor Tool Enabling the High-Throughput Analysis of Respiration.

    Science.gov (United States)

    Keil, Peter; Liebsch, Gregor; Borisjuk, Ljudmilla; Rolletschek, Hardy

    2017-01-01

    The high-throughput analysis of respiratory activity has become an important component of many biological investigations. Here, a technological platform, denoted the "MultiSense tool," is described. The tool enables the parallel monitoring of respiration in 100 samples over an extended time period, by dynamically tracking the concentrations of oxygen (O2) and/or carbon dioxide (CO2) and/or pH within an airtight vial. Its flexible design supports the quantification of respiration based on either oxygen consumption or carbon dioxide release, thereby allowing for the determination of the physiologically significant respiratory quotient (the ratio between the quantities of CO2 released and the O2 consumed). It requires an LED light source to be mounted above the sample, together with a CCD camera system, adjusted to enable the capture of analyte-specific wavelengths, and fluorescent sensor spots inserted into the sample vial. Here, a demonstration is given of the use of the MultiSense tool to quantify respiration in imbibing plant seeds, for which an appropriate step-by-step protocol is provided. The technology can be easily adapted for a wide range of applications, including the monitoring of gas exchange in any kind of liquid culture system (algae, embryo and tissue culture, cell suspensions, microbial cultures).
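
    The respiratory quotient defined in the abstract above (CO2 released divided by O2 consumed) reduces to simple arithmetic on the start and end gas readings of a sealed vial. A minimal sketch, assuming both gases are expressed in the same molar units; `respiratory_quotient` is a hypothetical helper, not part of the MultiSense tool.

```python
def respiratory_quotient(o2_start: float, o2_end: float,
                         co2_start: float, co2_end: float) -> float:
    """Return RQ = CO2 released / O2 consumed over one measurement interval.

    All four readings must use the same units (e.g. micromoles in the vial).
    """
    o2_consumed = o2_start - o2_end
    co2_released = co2_end - co2_start
    if o2_consumed <= 0:
        raise ValueError("no net oxygen consumption; RQ is undefined")
    return co2_released / o2_consumed
```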

  5. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

    Science.gov (United States)

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992

  6. The pig gut microbial diversity: Understanding the pig gut microbial ecology through the next generation high throughput sequencing.

    Science.gov (United States)

    Kim, Hyeun Bum; Isaacson, Richard E

    2015-06-12

    The importance of the gut microbiota of animals is widely acknowledged because of its pivotal roles in the health and well being of animals. The genetic diversity of the gut microbiota contributes to the overall development and metabolic needs of the animal, and provides the host with many beneficial functions including production of volatile fatty acids, re-cycling of bile salts, production of vitamin K, cellulose digestion, and development of immune system. Thus the intestinal microbiota of animals has been the subject of study for many decades. Although most of the older studies have used culture dependent methods, the recent advent of high throughput sequencing of 16S rRNA genes has facilitated in depth studies exploring microbial populations and their dynamics in the animal gut. These culture independent DNA based studies generate large amounts of data and as a result contribute to a more detailed understanding of the microbiota dynamics in the gut and the ecology of the microbial populations. Of equal importance is the ability to identify and quantify microbes that are difficult to grow or that have not been grown in the laboratory. Interpreting the data obtained from this type of study requires using basic principles of microbial diversity to understand the importance of the composition of microbial populations. In this review, we summarize the literature on culture independent studies of the pig gut microbiota with an emphasis on its succession and alterations caused by diverse factors. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    2016-01-01

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960–1219 A.D.) and Qing dynasties (1636–1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities’ diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution. PMID:27658256

  8. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D.) and Qing dynasties (1636-1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  9. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Qiang Li

    Full Text Available The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D.) and Qing dynasties (1636-1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  10. Selection of DNA Aptamers for Ovarian Cancer Biomarker CA125 Using One-Pot SELEX and High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Delia J. Scoville

    2017-01-01

    Full Text Available CA125 is a mucin glycoprotein whose concentration in serum correlates with a woman’s risk of developing ovarian cancer and also indicates response to therapy in diagnosed patients. Accurate detection of this large, complex protein in patient samples is of great clinical relevance. We suggest that powerful new diagnostic tools may be enabled by the development of nucleic acid aptamers with affinity for CA125. Here, we report on our use of One-Pot SELEX to isolate single-stranded DNA aptamers with affinity for CA125, followed by high-throughput sequencing of the selected oligonucleotides. This data-rich approach, combined with bioinformatics tools, enabled the entire selection process to be characterized. Using fluorescence anisotropy and affinity probe capillary electrophoresis, the binding affinities of four aptamer candidates were evaluated. Two aptamers, CA125_1 and CA125_12, both without primers, were found to bind to clinically relevant concentrations of the protein target. Binding was differently influenced by the presence of Mg2+ ions, being required for binding of CA125_1 and abrogating binding of CA125_12. In conclusion, One-Pot SELEX was found to be a promising selection method that yielded DNA aptamers to a clinically important protein target.

  11. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Science.gov (United States)

    Rebijith, K B; Asokan, R; Hande, H Ranjitha; Krishna Kumar, N K

    Thrips palmi Karny (Thysanoptera: Thripidae) is the sole vector of Watermelon bud necrosis tospovirus, where the crop loss has been estimated to be around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to its thigmokinetic behaviour and the development of high levels of resistance to insecticides. There is an urgent need for an effective, forward-looking management strategy, and small RNAs, especially microRNAs, hold great promise as key players in growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classification of the targets of these miRNAs revealed that the majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated a few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also serve as an invaluable tool for the management of thysanopteran insects in the future.

  12. Apple ring rot-responsive putative microRNAs revealed by high-throughput sequencing in Malus × domestica Borkh.

    Science.gov (United States)

    Yu, Xin-Yi; Du, Bei-Bei; Gao, Zhi-Hong; Zhang, Shi-Jie; Tu, Xu-Tong; Chen, Xiao-Yun; Zhang, Zhen; Qu, Shen-Chun

    2014-08-01

    MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by silencing target mRNAs via cleavage or translational inhibition. MiRNAs act as important regulators of plant development and stress response. To understand the role of miRNAs responsive to apple ring rot stress, we identified disease-responsive miRNAs using high-throughput sequencing in Malus × domestica Borkh. Four small RNA libraries were constructed from two control strains in M. domestica, crabapple (CKHu) and Fuji Naga-fu No. 6 (CKFu), and two disease stress strains, crabapple (DSHu) and Fuji Naga-fu No. 6 (DSFu). A total of 59 miRNA families were identified, and five miRNAs that might be responsive to apple ring rot infection were validated via qRT-PCR. Furthermore, we predicted 76 target genes that are potentially regulated by conserved miRNAs. Our study demonstrated that miRNAs were responsive to apple ring rot infection and may have important implications for apple disease resistance.

  13. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    K B Rebijith

    Full Text Available Thrips palmi Karny (Thysanoptera: Thripidae) is the sole vector of Watermelon bud necrosis tospovirus, where the crop loss has been estimated to be around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to its thigmokinetic behaviour and the development of high levels of resistance to insecticides. There is an urgent need for an effective, forward-looking management strategy, and small RNAs, especially microRNAs, hold great promise as key players in growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classification of the targets of these miRNAs revealed that the majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated a few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also serve as an invaluable tool for the management of thysanopteran insects in the future.

  14. High throughput sequencing identifies an imprinted gene, Grb10, associated with the pluripotency state in nuclear transfer embryonic stem cells.

    Science.gov (United States)

    Li, Hui; Gao, Shuai; Huang, Hua; Liu, Wenqiang; Huang, Huanwei; Liu, Xiaoyu; Gao, Yawei; Le, Rongrong; Kou, Xiaochen; Zhao, Yanhong; Kou, Zhaohui; Li, Jia; Wang, Hong; Zhang, Yu; Wang, Hailin; Cai, Tao; Sun, Qingyuan; Gao, Shaorong; Han, Zhiming

    2017-07-18

    Somatic cell nuclear transfer and transcription factor mediated reprogramming are two widely used techniques for somatic cell reprogramming. Both fully reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells hold potential for regenerative medicine, and evaluation of the stem cell pluripotency state is crucial for these applications. Previous reports have shown that the Dlk1-Dio3 region is associated with pluripotency in induced pluripotent stem cells and that incomplete somatic cell reprogramming causes abnormally elevated levels of genomic 5-methylcytosine in induced pluripotent stem cells compared to nuclear transfer embryonic stem cells and embryonic stem cells. In this study, we compared the pluripotency-associated genes Rian and Gtl2 in the Dlk1-Dio3 region in exactly syngeneic nuclear transfer embryonic stem cells and induced pluripotent stem cells with the same genomic insertion. We also assessed 5-methylcytosine and 5-hydroxymethylcytosine levels and performed high-throughput sequencing in these cells. Our results showed that Rian and Gtl2 in the Dlk1-Dio3 region, which are related to pluripotency in induced pluripotent stem cells, did not correlate with pluripotency in nuclear transfer embryonic stem cells, and no significant differences in 5-methylcytosine and 5-hydroxymethylcytosine levels were observed between fully and partially reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells. Through syngeneic comparison, our study identifies for the first time that Grb10 is associated with the pluripotency state in nuclear transfer embryonic stem cells.

  15. High-throughput mutational analysis of TOR1A in primary dystonia.

    Science.gov (United States)

    Xiao, Jianfeng; Bastian, Robert W; Perlmutter, Joel S; Racette, Brad A; Tabbal, Samer D; Karimi, Morvarid; Paniello, Randal C; Blitzer, Andrew; Batish, Sat Dev; Wszolek, Zbigniew K; Uitti, Ryan J; Hedera, Peter; Simon, David K; Tarsy, Daniel; Truong, Daniel D; Frei, Karen P; Pfeiffer, Ronald F; Gong, Suzhen; Zhao, Yu; LeDoux, Mark S

    2009-03-11

    Although the c.904_906delGAG mutation in Exon 5 of TOR1A typically manifests as early-onset generalized dystonia, DYT1 dystonia is genetically and clinically heterogeneous. Recently, another Exon 5 mutation (c.863G>A) has been associated with early-onset generalized dystonia and some DeltaGAG mutation carriers present with late-onset focal dystonia. The aim of this study was to identify TOR1A Exon 5 mutations in a large cohort of subjects with mainly non-generalized primary dystonia. High resolution melting (HRM) was used to examine the entire TOR1A Exon 5 coding sequence in 1014 subjects with primary dystonia (422 spasmodic dysphonia, 285 cervical dystonia, 67 blepharospasm, 41 writer's cramp, 16 oromandibular dystonia, 38 other primary focal dystonia, 112 segmental dystonia, 16 multifocal dystonia, and 17 generalized dystonia) and 250 controls (150 neurologically normal and 100 with other movement disorders). Diagnostic sensitivity and specificity were evaluated in an additional 8 subjects with known DeltaGAG DYT1 dystonia and 88 subjects with DeltaGAG-negative dystonia. HRM of TOR1A Exon 5 showed high (100%) diagnostic sensitivity and specificity. HRM was rapid and economical. HRM reliably differentiated the TOR1A DeltaGAG and c.863G>A mutations. Melting curves were normal in 250/250 controls and 1012/1014 subjects with primary dystonia. The two subjects with shifted melting curves were found to harbor the classic DeltaGAG deletion: 1) a non-Jewish Caucasian female with childhood-onset multifocal dystonia and 2) an Ashkenazi Jewish female with adolescent-onset spasmodic dysphonia. First, HRM is an inexpensive, diagnostically sensitive and specific, high-throughput method for mutation discovery. Second, Exon 5 mutations in TOR1A are rarely associated with non-generalized primary dystonia.
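
    The diagnostic figures quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming that the reported 100% sensitivity and specificity mean that all 8 known DeltaGAG carriers were flagged and none of the 88 DeltaGAG-negative subjects were; the function names are illustrative only and do not come from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of known mutation carriers that the assay flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of known non-carriers that the assay leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Validation set from the abstract: 8 known carriers, 88 known non-carriers.
# Assumption: no misclassifications, as implied by the reported 100% values.
sens = sensitivity(true_pos=8, false_neg=0)
spec = specificity(true_neg=88, false_pos=0)

# Screening cohort from the abstract: 2 shifted melting curves in 1014 subjects.
carrier_fraction = 2 / 1014

if __name__ == "__main__":
    print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
    print(f"Exon 5 mutation carriers in cohort = {carrier_fraction:.2%}")
```

    With only 8 positive validation samples, the point estimate of sensitivity carries wide uncertainty, which is worth keeping in mind when reading the 100% figure.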

  16. High-throughput mutational analysis of TOR1A in primary dystonia

    Directory of Open Access Journals (Sweden)

    Truong Daniel D

    2009-03-01

    Full Text Available Abstract. Background: Although the c.904_906delGAG mutation in Exon 5 of TOR1A typically manifests as early-onset generalized dystonia, DYT1 dystonia is genetically and clinically heterogeneous. Recently, another Exon 5 mutation (c.863G>A) has been associated with early-onset generalized dystonia and some ΔGAG mutation carriers present with late-onset focal dystonia. The aim of this study was to identify TOR1A Exon 5 mutations in a large cohort of subjects with mainly non-generalized primary dystonia. Methods: High resolution melting (HRM) was used to examine the entire TOR1A Exon 5 coding sequence in 1014 subjects with primary dystonia (422 spasmodic dysphonia, 285 cervical dystonia, 67 blepharospasm, 41 writer's cramp, 16 oromandibular dystonia, 38 other primary focal dystonia, 112 segmental dystonia, 16 multifocal dystonia, and 17 generalized dystonia) and 250 controls (150 neurologically normal and 100 with other movement disorders). Diagnostic sensitivity and specificity were evaluated in an additional 8 subjects with known ΔGAG DYT1 dystonia and 88 subjects with ΔGAG-negative dystonia. Results: HRM of TOR1A Exon 5 showed high (100%) diagnostic sensitivity and specificity. HRM was rapid and economical. HRM reliably differentiated the TOR1A ΔGAG and c.863G>A mutations. Melting curves were normal in 250/250 controls and 1012/1014 subjects with primary dystonia. The two subjects with shifted melting curves were found to harbor the classic ΔGAG deletion: 1) a non-Jewish Caucasian female with childhood-onset multifocal dystonia and 2) an Ashkenazi Jewish female with adolescent-onset spasmodic dysphonia. Conclusion: First, HRM is an inexpensive, diagnostically sensitive and specific, high-throughput method for mutation discovery. Second, Exon 5 mutations in TOR1A are rarely associated with non-generalized primary dystonia.

  17. High-throughput transcriptome analysis of barley (Hordeum vulgare) exposed to excessive boron.

    Science.gov (United States)

    Tombuloglu, Guzin; Tombuloglu, Huseyin; Sakcali, M Serdal; Unver, Turgay

    2015-02-15

    Boron (B) is an essential micronutrient for optimum plant growth. However, above a certain threshold, B is toxic and causes yield loss in agricultural lands. While a number of studies have been conducted to understand B tolerance mechanisms, a transcriptome-wide approach for B-tolerant barley is performed here for the first time. A high-throughput RNA-Seq (cDNA) sequencing technology (Illumina) was used with barley (Hordeum vulgare), yielding 208 million clean reads. In total, 256,874 unigenes were generated and assigned to known peptide databases: Gene Ontology (GO) (99,043), Swiss-Prot (38,266), Clusters of Orthologous Groups (COG) (26,250), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (36,860), as determined by BLASTx search. According to the digital gene expression (DGE) analyses, 16% and 17% of the transcripts were found to be differentially regulated in root and leaf tissues, respectively. Most of them were involved in cell wall, stress response, membrane, protein kinase and transporter mechanisms. Some of the genes detected as highly expressed in root tissue are phospholipases, predicted divalent heavy-metal cation transporters, formin-like proteins and calmodulin/Ca(2+)-binding proteins. In addition, chitin-binding lectin precursor, ubiquitin carboxyl-terminal hydrolase, and serine/threonine-protein kinase AFC2 genes were indicated to be highly regulated in leaf tissue upon excess B treatment. Some pathways, such as the Ca(2+)-calmodulin system, are activated in response to B toxicity. The differential regulation of 10 transcripts was confirmed by qRT-PCR, revealing the tissue-specific responses against B toxicity and their putative function in B-tolerance mechanisms. Copyright © 2014. Published by Elsevier B.V.

  18. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    Directory of Open Access Journals (Sweden)

    John K Davies

    Full Text Available Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  19. Identification and characterization of microRNAs in Humulus lupulus using high-throughput sequencing and their response to Citrus bark cracking viroid (CBCVd) infection

    Czech Academy of Sciences Publication Activity Database

    Mishra, Ajay Kumar; Duraisamy, Ganesh Selvaraj; Matoušek, Jaroslav; Radišek, S.; Javornik, B.; Jakše, J.

    2016-01-01

    Vol. 17, No. 919 (2016). ISSN 1471-2164. R&D Projects: GA MŠk(CZ) LH14255. Institutional support: RVO:60077344. Keywords: Humulus lupulus; High-throughput sequencing; Citrus bark cracking viroid. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 3.729 (year: 2016)

  20. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms.

    Science.gov (United States)

    Teodoro, George; Pan, Tony; Kurc, Tahsin M; Kong, Jun; Cooper, Lee A D; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H

    2013-05-01

    Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.

  1. Diagnostic single gene analyses beyond Sanger. Economic high-throughput sequencing of small genes involved in congenital coagulation and platelet disorders.

    Science.gov (United States)

    Najm, Juliane; Rath, Matthias; Schröder, Winnie; Felbor, Ute

    2017-07-17

    Molecular testing of congenital coagulation and platelet disorders offers confirmation of clinical diagnoses, supports genetic counselling, and enables predictive and prenatal diagnosis. In some cases, genotype-phenotype correlations are important for predicting the clinical course of the disease and adaptation of individualized therapy. Until recently, genotyping has been mainly performed by Sanger sequencing. While next generation sequencing (NGS) enables the parallel analysis of multiple genes, the cost-value ratio of custom-made panels can be unfavorable for analyses of specific small genes. The aim of this study was to transfer genotyping of small genes involved in congenital coagulation and platelet disorders from Sanger sequencing to an NGS-based method. A LR-PCR approach for target enrichment of the entire genomic regions of the genes F7, F10, F11, F12, GATA1, MYH9, TUBB1 and WAS was combined with high-throughput sequencing on a MiSeq platform. NGS detected all variants that had previously been identified by Sanger sequencing. Our results demonstrate that this approach is an accurate and flexible tool for molecular genetic diagnostics of single small genes.

  2. High-throughput analysis reveals novel maternal germline RNAs crucial for primordial germ cell preservation and proper migration.

    Science.gov (United States)

    Owens, Dawn A; Butler, Amanda M; Aguero, Tristan H; Newman, Karen M; Van Booven, Derek; King, Mary Lou

    2017-01-15

    During oogenesis, hundreds of maternal RNAs are selectively localized to the animal or vegetal pole, including determinants of somatic and germline fates. Although microarray analysis has identified localized determinants, it is not comprehensive and is limited to known transcripts. Here, we utilized high-throughput RNA-sequencing analysis to comprehensively interrogate animal and vegetal pole RNAs in the fully grown Xenopus laevis oocyte. We identified 411 (198 annotated) and 27 (15 annotated) enriched mRNAs at the vegetal and animal pole, respectively. Ninety were novel mRNAs over 4-fold enriched at the vegetal pole and six were over 10-fold enriched at the animal pole. Unlike mRNAs, microRNAs were not asymmetrically distributed. Whole-mount in situ hybridization confirmed that all 17 selected mRNAs were localized. Biological function and network analysis of vegetally enriched transcripts identified protein-modifying enzymes, receptors, ligands, RNA-binding proteins, transcription factors and co-factors with five defining hubs linking 47 genes in a network. Initial functional studies of maternal vegetally localized mRNAs show that sox7 plays a novel and important role in primordial germ cell (PGC) development and that ephrinB1 (efnb1) is required for proper PGC migration. We propose potential pathways operating at the vegetal pole that highlight where future investigations might be most fruitful. © 2017. Published by The Company of Biologists Ltd.

  3. High-throughput analysis of ammonia oxidiser community composition via a novel, amoA-based functional gene array.

    Directory of Open Access Journals (Sweden)

    Guy C J Abell

    Full Text Available Advances in microbial ecology research are more often than not limited by the capabilities of available methodologies. Aerobic autotrophic nitrification is one of the most important and well studied microbiological processes in terrestrial and aquatic ecosystems. We have developed and validated a microbial diagnostic microarray based on the ammonia-monooxygenase subunit A (amoA) gene, enabling the in-depth analysis of the community structure of bacterial and archaeal ammonia oxidisers. The amoA microarray has been successfully applied to analyse nitrifier diversity in marine, estuarine, soil and wastewater treatment plant environments. The microarray has moderate costs for labour and consumables and enables the analysis of hundreds of environmental DNA or RNA samples per week per person. The array has been thoroughly validated with a range of individual and complex targets (amoA clones and environmental samples, respectively), combined with parallel analysis using traditional sequencing methods. The moderate cost and high throughput of the microarray make it possible to adequately address broader questions of the ecology of microbial ammonia oxidation requiring high sample numbers and high resolution of the community composition.

  4. Understanding microalgal species composition and contributions in Antarctic glacial melt water through rbcL high throughput sequencing

    Science.gov (United States)

    Barretto, K. M.; Kalmbach, A. J.; de la Torre, J. R.; Falcón, L. I.; Carpenter, E. J.

    2016-02-01

    The McMurdo Dry Valleys (MDV) in Antarctica present unique research opportunities, both because of the understudied biogeochemical impact of their microbial communities, and their sensitivity to climate change. Despite harsh desiccation, pH, and salinity stress, summer glacial melt water supports life in the MDV in the form of algal mats. These mat communities are complex in structure, with a network of dominant cyanobacteria interspersed with heterotrophic diazotrophs, smaller photoautotrophs, and thick extracellular polymeric substances. Due to their complexity, standard microscopy yields a limited understanding of community assemblages. Our previous high throughput sequencing (HTS) approaches focusing on 16S rRNA have profiled communities with understudied photosynthetic phyla such as Acidobacteria, Gemmatimonadetes, and Chloroflexi. To characterize these phototrophic communities, we are interested in (1) understanding their temporal dynamics and how the dominant cyanobacterial species influence community composition, (2) modeling how pH, nutrients, soil wetness, and temperature act as multivariate drivers of community composition, and (3) establishing a pipeline for HTS of the rbcL gene - which encodes the large subunit of the ubiquitous photosynthetic protein RuBisCO. Our initial screening of community DNA from MDV algal mats has shown the presence of Form IA, IB, and IC cbbL (an rbcL ortholog), and Form ID rbcL - indicating a relatively high degree of photoautotrophic diversity. Soil wetness drives anoxic conditions and we see that it shifts overall microbial composition - we expect photoautotrophs to respond similarly. We also expect photoautotrophic assemblages to shift with pH and soil nutrients. Our deep sequencing efforts suggest an inconsistency between indexing primers and algal DNA that could underestimate cyanobacterial and overestimate eukaryotic abundance. Resolving these issues with new approaches will allow us to more fully understand the

  5. Short-term assessment of BCR repertoires of SLE patients after high dose glucocorticoid therapy with high-throughput sequencing.

    Science.gov (United States)

    Shi, Bin; Yu, Jiang; Ma, Long; Ma, Qingqing; Liu, Chunmei; Sun, Suhong; Ma, Rui; Yao, Xinsheng

    2016-01-01

    We analyzed and assessed BCR repertoires of SLE patients before and after high-dose glucocorticoid therapy to address two fundamental questions: (1) How does the BCR repertoire of an SLE patient change at the clonal level after treatment? (2) How can a putative autoantibody clone set be screened from the BCR repertoires of SLE patients? PBMCs from two SLE patients (P1 and P2) were collected at different time points, and DNA was extracted from these samples. High-throughput sequencing was used to profile the BCR repertoire, and the sequence data were analysed with bioinformatic methods. Both patients lost usage of some IGHV3 family genes after treatment. IGHV-IGHJ gene pairing showed no significant change in either patient. In addition, the overall amino acid (AA) composition of H-CDR3 was very similar at the three time points in each patient, and AA usage at the same positions in H-CDR3 sequences of the same length (14 AA) was also similar. Antinuclear antibody tests showed that the levels of some antinuclear antibodies decreased after treatment; however, there was no sign that the percentage of autoantibody clones in the BCR repertoires decreased. Short-term high-dose glucocorticoid treatment therefore appears to have little impact on the composition of the BCR repertoire of SLE patients. Treatment can reduce the amount of autoantibody at the protein level, but may not reduce the percentage of autoantibody clones in the BCR repertoire at the clonal level.

  6. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    Directory of Open Access Journals (Sweden)

    Seokhwi Kim

    Full Text Available In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  7. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.)

    Directory of Open Access Journals (Sweden)

    Gu Minghong

    2010-11-01

    Full Text Available Background: Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs), for the analysis of complex traits in plant molecular genetics. Results: In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. Conclusions: The present results demonstrated that high

  8. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.).

    Science.gov (United States)

    Xu, Jianjun; Zhao, Qiang; Du, Peina; Xu, Chenwu; Wang, Baohe; Feng, Qi; Liu, Qiaoquan; Tang, Shuzhu; Gu, Minghong; Han, Bin; Liang, Guohua

    2010-11-24

    Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs), for the analysis of complex traits in plant molecular genetics. In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. The present results demonstrated that high throughput genotyped CSSLs combine the advantages of an ultrahigh

  9. ZebraZoom: an automated program for high-throughput behavioral analysis and categorization

    Directory of Open Access Journals (Sweden)

    Olivier Mirat

    2013-06-01

    Full Text Available The zebrafish larva stands out as an emergent model organism for translational studies involving gene or drug screening thanks to its size, genetics, and permeability. At the larval stage, locomotion occurs in short episodes punctuated by periods of rest. Although phenotyping behavior is a key component of large-scale screens, it has not yet been automated in this model system. We developed ZebraZoom, a program to automatically track larvae and identify maneuvers for many animals performing discrete movements. Our program detects each episodic movement and extracts large-scale statistics on motor patterns to produce a quantification of the locomotor repertoire. We used ZebraZoom to identify motor defects induced by a glycinergic receptor antagonist. The analysis of the blind mutant atoh7 (lak) revealed small locomotor defects associated with the mutation. Using multiclass supervised machine learning, ZebraZoom categorizes all episodes of movement for each larva into one of three possible maneuvers: slow forward swim, routine turn, and escape. ZebraZoom reached 91% accuracy for categorization of stereotypical maneuvers that four independent experimenters unanimously identified. For all maneuvers in the data set, ZebraZoom agreed with four independent experimenters in 73.2-82.5% of cases. We modeled the series of maneuvers performed by larvae as Markov chains and observed that larvae often repeated the same maneuvers within a group. When analyzing subsequent maneuvers performed by different larvae, we found that larva-larva interactions occurred as series of escapes. Overall, ZebraZoom reaches the level of precision found in manual analysis but accomplishes tasks in a high-throughput format necessary for large screens.
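
    The Markov-chain modeling of maneuver series mentioned in this abstract can be illustrated with a minimal sketch. It assumes each larva's behavior is already available as a list of maneuver labels; the label strings and the function name `transition_matrix` are hypothetical and are not taken from ZebraZoom itself.

```python
from collections import Counter

# Hypothetical label names for the three maneuvers named in the abstract.
MANEUVERS = ("slow_forward_swim", "routine_turn", "escape")


def transition_matrix(sequences, states=MANEUVERS):
    """Estimate first-order Markov transition probabilities.

    sequences: iterable of lists of maneuver labels, one list per larva.
    Returns a nested dict: matrix[current][next] = estimated probability.
    Rows for states that never occur as a 'current' state are all zeros.
    """
    counts = Counter()
    for seq in sequences:
        # Count every consecutive (current, next) pair in the sequence.
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1

    matrix = {}
    for current in states:
        row_total = sum(counts[(current, nxt)] for nxt in states)
        matrix[current] = {
            nxt: (counts[(current, nxt)] / row_total if row_total else 0.0)
            for nxt in states
        }
    return matrix
```

    A large diagonal entry such as `matrix["escape"]["escape"]` would correspond to the observation that larvae often repeat the same maneuver.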

  10. WormSizer: high-throughput analysis of nematode size and shape.

    Directory of Open Access Journals (Sweden)

    Brad T Moore

    Full Text Available The fundamental phenotypes of growth rate, size and morphology are the result of complex interactions between genotype and environment. We developed a high-throughput software application, WormSizer, which computes size and shape of nematodes from brightfield images. Existing methods for estimating volume either coarsely model the nematode as a cylinder or assume the worm shape or opacity is invariant. Our estimate is more robust to changes in morphology or optical density as it only assumes radial symmetry. This open source software is written as a plugin for the well-known image-processing framework Fiji/ImageJ. It may therefore be extended easily. We evaluated the technical performance of this framework, and we used it to analyze growth and shape of several canonical Caenorhabditis elegans mutants in a developmental time series. We confirm quantitatively that a Dumpy (Dpy) mutant is short and fat and that a Long (Lon) mutant is long and thin. We show that daf-2 insulin-like receptor mutants are larger than wild-type upon hatching but grow slowly, and WormSizer can distinguish dauer larvae from normal larvae. We also show that a Small (Sma) mutant is actually smaller than wild-type at all stages of larval development. WormSizer works with Uncoordinated (Unc) and Roller (Rol) mutants as well, indicating that it can be used with mutants despite behavioral phenotypes. We used our complete data set to perform a power analysis, giving users a sense of how many images are needed to detect different effect sizes. Our analysis confirms and extends existing phenotypic characterization of well-characterized mutants, demonstrating the utility and robustness of WormSizer.
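
    To illustrate what a volume estimate that only assumes radial symmetry can look like, the sketch below sums conical frusta along the body axis. This is one possible formulation and is not claimed to be WormSizer's actual algorithm; the function name, the evenly spaced width samples, and the units are assumptions made for the example.

```python
import math


def volume_from_widths(widths, segment_length):
    """Estimate the volume of a radially symmetric body.

    widths: body widths (diameters) sampled at evenly spaced points
            along the midline, head to tail.
    segment_length: distance along the midline between adjacent samples.

    Each pair of adjacent samples is treated as a conical frustum with
    volume pi * h / 3 * (r1^2 + r1*r2 + r2^2).
    """
    radii = [w / 2.0 for w in widths]
    volume = 0.0
    for r1, r2 in zip(radii, radii[1:]):
        volume += math.pi * segment_length / 3.0 * (r1 * r1 + r1 * r2 + r2 * r2)
    return volume
```

    With constant widths this reduces to the cylinder formula, so the cylinder model mentioned in the abstract is a special case of this sketch.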

  11. ToxCast Workflow: High-throughput screening assay data processing, analysis and management (SOT)

    Science.gov (United States)

    US EPA’s ToxCast program is generating data in high-throughput screening (HTS) and high-content screening (HCS) assays for thousands of environmental chemicals, for use in developing predictive toxicity models. Currently the ToxCast screening program includes over 1800 unique c...

  12. Application of high-throughput technologies to a structural proteomics-type analysis of Bacillus anthracis

    NARCIS (Netherlands)

    Au, K.; Folkers, G.E.; Kaptein, R.

    2006-01-01

    A collaborative project between two Structural Proteomics In Europe (SPINE) partner laboratories, York and Oxford, aimed at high-throughput (HTP) structure determination of proteins from Bacillus anthracis, the aetiological agent of anthrax and a biomedically important target, is described. Based

  13. Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Full Text Available Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
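
    The first step described in this abstract, dividing tags into two groups by abundance, can be sketched as follows. This is only an illustration of the splitting idea: the function name and the `min_count` threshold are hypothetical, and the clustering stages themselves (hierarchical clustering, heuristic clustering, Needleman-Wunsch distances) are not reproduced here.

```python
from collections import Counter


def split_by_abundance(tags, min_count=2):
    """Dereplicate tags and split them into abundant and rare groups.

    tags: iterable of sequence strings (one entry per read).
    min_count: hypothetical abundance cut-off; a unique sequence seen at
               least this many times goes to the abundant group.

    Returns (abundant, rare), each a dict mapping sequence -> read count.
    """
    counts = Counter(tags)
    abundant = {seq: n for seq, n in counts.items() if n >= min_count}
    rare = {seq: n for seq, n in counts.items() if n < min_count}
    return abundant, rare
```

    In a two-stage design the smaller abundant group can then be passed to an accurate but costly method, while the larger rare group goes to a cheaper heuristic.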

  14. High-throughput identification and screening of novel Methylobacterium species using whole-cell MALDI-TOF/MS analysis.

    Science.gov (United States)

    Tani, Akio; Sahin, Nurettin; Matsuyama, Yumiko; Enomoto, Takashi; Nishimura, Naoki; Yokota, Akira; Kimbara, Kazuhide

    2012-01-01

    Methylobacterium species are ubiquitous α-proteobacteria that reside in the phyllosphere and are fed by methanol that is emitted from plants. In this study, we applied whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis (WC-MS) to evaluate the diversity of Methylobacterium species collected from a variety of plants. The WC-MS spectrum was reproducible through two weeks of cultivation on different media. WC-MS spectrum peaks of M. extorquens strain AM1 cells were attributed to ribosomal proteins, but peaks not attributable to ribosomal proteins were also found. We developed a simple method for rapid identification based on spectra similarity. Using all available type strains of Methylobacterium species, the method provided a certain threshold similarity value for species-level discrimination, although the genus contains some type strains that could not be easily discriminated solely by 16S rRNA gene sequence similarity. Next, we evaluated the WC-MS data of approximately 200 methylotrophs isolated from various plants with MALDI Biotyper software (Bruker Daltonics). Isolates representing each cluster were further identified by 16S rRNA gene sequencing. In most cases, the identification by WC-MS matched that by sequencing, and isolates with unique spectra represented possible novel species. The strains belonging to M. extorquens, M. adhaesivum, M. marchantiae, M. komagatae, M. brachiatum, M. radiotolerans, and novel lineages close to M. adhaesivum, many of which were isolated from bryophytes, were found to be the most frequent phyllospheric colonizers. The WC-MS technique provides emerging high-throughputness in the identification of known/novel species of bacteria, enabling the selection of novel species in a library and identification without 16S rRNA gene sequencing.

  15. High-throughput identification and screening of novel Methylobacterium species using whole-cell MALDI-TOF/MS analysis.

    Directory of Open Access Journals (Sweden)

    Akio Tani

    Full Text Available Methylobacterium species are ubiquitous α-proteobacteria that reside in the phyllosphere and are fed by methanol that is emitted from plants. In this study, we applied whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis (WC-MS) to evaluate the diversity of Methylobacterium species collected from a variety of plants. The WC-MS spectrum was reproducible through two weeks of cultivation on different media. WC-MS spectrum peaks of M. extorquens strain AM1 cells were attributed to ribosomal proteins, but peaks not attributable to ribosomal proteins were also found. We developed a simple method for rapid identification based on spectra similarity. Using all available type strains of Methylobacterium species, the method provided a certain threshold similarity value for species-level discrimination, although the genus contains some type strains that could not be easily discriminated solely by 16S rRNA gene sequence similarity. Next, we evaluated the WC-MS data of approximately 200 methylotrophs isolated from various plants with MALDI Biotyper software (Bruker Daltonics). Isolates representing each cluster were further identified by 16S rRNA gene sequencing. In most cases, the identification by WC-MS matched that by sequencing, and isolates with unique spectra represented possible novel species. The strains belonging to M. extorquens, M. adhaesivum, M. marchantiae, M. komagatae, M. brachiatum, M. radiotolerans, and novel lineages close to M. adhaesivum, many of which were isolated from bryophytes, were found to be the most frequent phyllospheric colonizers. The WC-MS technique provides emerging high-throughputness in the identification of known/novel species of bacteria, enabling the selection of novel species in a library and identification without 16S rRNA gene sequencing.

  16. High Throughput Method for Analysis of Repeat Number for 28 Phase Variable Loci of Campylobacter jejuni Strain NCTC11168.

    Directory of Open Access Journals (Sweden)

    Lea Lango-Scholey

    Full Text Available Mutations in simple sequence repeat tracts are a major mechanism of phase variation in several bacterial species including Campylobacter jejuni. Changes in repeat number of tracts located within the reading frame can produce a high frequency of reversible switches in gene expression between ON and OFF states. The genome of C. jejuni strain NCTC11168 contains 29 loci with polyG/polyC tracts of seven or more repeats. This protocol outlines a method, the 28-locus-CJ11168 PV-analysis assay, for rapidly determining ON/OFF states of 28 of these phase-variable loci in a large number of individual colonies from C. jejuni strain NCTC11168. The method combines a series of multiplex PCR assays with a fragment analysis assay and automated extraction of fragment length, repeat number and expression state. This high throughput, multiplex assay has utility for detecting shifts in phase variation states within and between populations over time and for exploring the effects of phase variation on adaptation to differing selective pressures. Application of this method to analysis of the 28 polyG/polyC tracts in 90 C. jejuni colonies detected a 2.5-fold increase in slippage products as tracts lengthened from G8 to G11 but no difference between tracts of similar length, indicating that flanking sequence does not influence slippage rates. Comparison of this observed slippage to previously measured mutation rates for G8 and G11 tracts in C. jejuni indicates that PCR amplification of a DNA sample will over-estimate phase variation frequencies by 20-35-fold. An important output of the 28-locus-CJ11168 PV-analysis assay is combinatorial expression states that cannot be determined by other methods. This method can be adapted to analysis of phase variation in other C. jejuni strains and in a diverse range of bacterial species.
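
    The step from fragment length to repeat number to expression state can be sketched as below. This is a simplified illustration, not the published assay's extraction script. It assumes a mononucleotide (polyG/polyC) tract, so one repeat equals one base pair, a reference fragment of known length and repeat number, and a known in-frame ON repeat number; every parameter and function name is hypothetical.

```python
def repeat_number(fragment_length, ref_length, ref_repeats):
    """Infer the tract repeat number from a measured fragment length.

    Assumes a mononucleotide tract (one repeat = 1 bp) and that the only
    length difference from the reference fragment lies in the tract.
    Measured lengths are rounded to the nearest whole base pair.
    """
    return ref_repeats + int(round(fragment_length - ref_length))


def expression_state(repeats, on_repeats):
    """Classify a tract as ON or OFF.

    Assumes the gene is in frame (ON) whenever the repeat number differs
    from a known ON repeat number by a multiple of three.
    """
    return "ON" if (repeats - on_repeats) % 3 == 0 else "OFF"
```

    For example, with a hypothetical reference of 200 bp carrying a G9 tract that is ON, a 201 bp fragment would be called G10 and OFF.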

  17. High-throughput sequencing of fecal DNA to identify insects consumed by wild Weddell's saddleback tamarins (Saguinus weddelli, Cebidae, Primates) in Bolivia.

    Science.gov (United States)

    Mallott, E K; Malhi, R S; Garber, P A

    2015-03-01

    The genus Saguinus represents a successful radiation of over 20 species of small-bodied New World monkeys. Studies of the tamarin diet indicate that insects and small vertebrates account for ∼16-45% of total feeding and foraging time, and represent an important source of lipids, protein, and metabolizable energy. Although tamarins are reported to commonly consume large-bodied insects such as grasshoppers and walking sticks (Orthoptera), little is known concerning the degree to which smaller or less easily identifiable arthropod prey comprise an important component of their diet. To better understand tamarin arthropod feeding behavior, fecal samples from 20 wild Bolivian saddleback tamarins (members of five groups) were collected over a 3 week period in June 2012, and analyzed for the presence of arthropod DNA. DNA was extracted using a Qiagen stool extraction kit, and universal insect primers were created and used to amplify a ∼280 bp section of the COI mitochondrial gene. Amplicons were sequenced on the Roche 454 sequencing platform using high-throughput sequencing techniques. An analysis of these samples indicated the presence of 43 taxa of arthropods including 10 orders, 15 families, and 12 identified genera. Many of these taxa had not been previously identified in the tamarin diet. These results highlight molecular analysis of fecal DNA as an important research tool for identifying arthropod feeding patterns in primates, and reveal broad diversity in the taxa, foraging microhabitats, and size of arthropods consumed by tamarin monkeys. © 2014 Wiley Periodicals, Inc.

  18. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    Directory of Open Access Journals (Sweden)

    Martínez-Zapater José M

    2007-11-01

    decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion: These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provides a powerful tool for grapevine genetic analysis.

  19. High-throughput sequencing technology to reveal the composition and function of cecal microbiota in Dagu chicken.

    Science.gov (United States)

    Xu, Yunhe; Yang, Huixin; Zhang, Lili; Su, Yuhong; Shi, Donghui; Xiao, Haidi; Tian, Yumin

    2016-11-04

    The chicken gut microbiota is an important and complicated ecosystem for the host. It plays an important role in converting food into nutrients and energy. The coding capacity of the microbiome vastly surpasses that of the host's genome, encoding biochemical pathways that the host has not developed. An optimal gut microbiota can increase agricultural productivity. This study aims to explore the composition and function of cecal microbiota in Dagu chicken under two feeding modes, free-range (outdoor, OD) and cage (indoor, ID) raising. Cecal samples were collected from 24 chickens across 4 groups (12-w OD, 12-w ID, 18-w OD, and 18-w ID). We performed high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene to characterize the cecal microbiota of Dagu chicken and to compare the cecal microbiota of free-range and cage-raised chickens. We found 34 operational taxonomic units (OTUs) specific to the OD groups and 4 OTUs specific to the ID groups. 24 phyla were shared by the 24 samples. Bacteroidetes was the most abundant phylum, followed by Firmicutes and Proteobacteria. The OD groups showed a higher proportion of Bacteroidetes (>50 %) in cecum, but a lower Firmicutes/Bacteroidetes ratio in both 12-w old (0.42, 0.62) and 18-w old groups (0.37, 0.49) compared with the ID groups. Cecal microbiota in the OD groups have a higher abundance of functions involved in amino acid and glycan metabolic pathways. The composition and function of cecal microbiota in Dagu chicken differ between the two feeding modes, free-range and cage raising. The cage raising mode showed a lower proportion of Bacteroidetes in cecum, but a higher Firmicutes/Bacteroidetes ratio compared with free-range mode. Cecal microbiota in free-range mode have a higher abundance of functions involved in amino acid and glycan metabolic pathways.
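
    The Firmicutes/Bacteroidetes ratio reported in this abstract is a simple quotient of two phylum-level abundances. A minimal sketch is given below; the function name and the input numbers in the usage note are illustrative only and are not data from the study.

```python
def firmicutes_bacteroidetes_ratio(firmicutes, bacteroidetes):
    """Return the Firmicutes/Bacteroidetes ratio for one sample or group.

    Both arguments must be on the same scale, for example read counts
    or relative abundances in percent.
    """
    if bacteroidetes <= 0:
        raise ValueError("Bacteroidetes abundance must be positive")
    return firmicutes / bacteroidetes
```

    For instance, a hypothetical group with 21% Firmicutes and 50% Bacteroidetes would give a ratio of 0.42, on the same scale as the values quoted above.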

  20. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS) Metabarcoding during Flax Dew-Retting

    Directory of Open Access Journals (Sweden)

    Christophe Djemiel

    2017-10-01

    Full Text Available Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS) DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq) on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria) and Ascomycota, Basidiomycota, and Zygomycota (fungi) all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria), and Chytridiomycota (fungi). No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group) using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming

  1. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS) Metabarcoding during Flax Dew-Retting.

    Science.gov (United States)

    Djemiel, Christophe; Grec, Sébastien; Hawkins, Simon

    2017-01-01

    Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS) DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq) on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria) and Ascomycota, Basidiomycota, and Zygomycota (fungi) all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria), and Chytridiomycota (fungi). No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group) using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming to evaluate

  2. Next-generation phage display: integrating and comparing available molecular tools to enable cost-effective high-throughput analysis.

    Directory of Open Access Journals (Sweden)

    Emmanuel Dias-Neto

    Full Text Available BACKGROUND: Combinatorial phage display has been used in the last 20 years in the identification of protein-ligands and protein-protein interactions, uncovering relevant molecular recognition events. Rate-limiting steps of combinatorial phage display library selection are (i) the counting of transducing units and (ii) the sequencing of the encoded displayed ligands. Here, we adapted emerging genomic technologies to minimize such challenges. METHODOLOGY/PRINCIPAL FINDINGS: We gained efficiency by applying in tandem real-time PCR for rapid quantification to enable bacteria-free phage display library screening, and added phage DNA next-generation sequencing for large-scale ligand analysis, reporting a fully integrated set of high-throughput quantitative and analytical tools. The approach is far less labor-intensive and allows rigorous quantification; for medical applications, including selections in patients, it also represents an advance for quantitative distribution analysis and ligand identification of hundreds of thousands of targeted particles from patient-derived biopsy or autopsy in a longer timeframe post library administration. Additional advantages over current methods include increased sensitivity, less variability, enhanced linearity, scalability, and accuracy at much lower cost. Sequences obtained by qPhage plus pyrosequencing were similar to a dataset produced from conventional Sanger-sequenced transducing-units (TU), with no biases due to GC content, codon usage, and amino acid or peptide frequency. These tools allow phage display selection and ligand analysis at >1,000-fold faster rate, and reduce costs approximately 250-fold for generating 10^6 ligand sequences. CONCLUSIONS/SIGNIFICANCE: Our analyses demonstrate that whereas this approach correlates with the traditional colony-counting, it is also capable of a much larger sampling, allowing a faster, less expensive, more accurate and consistent analysis of phage enrichment. Overall

  3. Micro-scaled high-throughput digestion of plant tissue samples for multi-elemental analysis

    Directory of Open Access Journals (Sweden)

    Husted Søren

    2009-09-01

    Full Text Available Background: Quantitative multi-elemental analysis by inductively coupled plasma (ICP) spectrometry depends on a complete digestion of solid samples. However, fast and thorough sample digestion is a challenging analytical task which constitutes a bottleneck in modern multi-elemental analysis. Additional obstacles may be that sample quantities are limited and elemental concentrations low. In such cases, digestion in small volumes with minimum dilution and contamination is required in order to obtain high accuracy data. Results: We have developed a micro-scaled microwave digestion procedure and optimized it for accurate elemental profiling of plant materials (1-20 mg dry weight). A commercially available 64-position rotor with 5 ml disposable glass vials, originally designed for microwave-based parallel organic synthesis, was used as a platform for the digestion. The novel micro-scaled method was successfully validated by the use of various certified reference materials (CRM) with matrices rich in starch, lipid or protein. When the micro-scaled digestion procedure was applied on single rice grains or small batches of Arabidopsis seeds (1 mg, corresponding to approximately 50 seeds), the obtained elemental profiles closely matched those obtained by conventional analysis using digestion in large volume vessels. Accumulated elemental contents derived from separate analyses of rice grain fractions (aleurone, embryo and endosperm) closely matched the total content obtained by analysis of the whole rice grain. Conclusion: A high-throughput micro-scaled method has been developed which enables digestion of small quantities of plant samples for subsequent elemental profiling by ICP-spectrometry. The method constitutes a valuable tool for screening of mutants and transformants. In addition, the method facilitates studies of the distribution of essential trace elements between and within plant organs which is relevant for, e.g., breeding programmes aiming at

  4. Micro-scaled high-throughput digestion of plant tissue samples for multi-elemental analysis.

    Science.gov (United States)

    Hansen, Thomas H; Laursen, Kristian H; Persson, Daniel P; Pedas, Pai; Husted, Søren; Schjoerring, Jan K

    2009-09-26

    Quantitative multi-elemental analysis by inductively coupled plasma (ICP) spectrometry depends on a complete digestion of solid samples. However, fast and thorough sample digestion is a challenging analytical task which constitutes a bottleneck in modern multi-elemental analysis. Additional obstacles may be that sample quantities are limited and elemental concentrations low. In such cases, digestion in small volumes with minimum dilution and contamination is required in order to obtain high accuracy data. We have developed a micro-scaled microwave digestion procedure and optimized it for accurate elemental profiling of plant materials (1-20 mg dry weight). A commercially available 64-position rotor with 5 ml disposable glass vials, originally designed for microwave-based parallel organic synthesis, was used as a platform for the digestion. The novel micro-scaled method was successfully validated by the use of various certified reference materials (CRM) with matrices rich in starch, lipid or protein. When the micro-scal