WorldWideScience

Sample records for high-throughput sequencing approach

  1. 76 FR 28990 - Ultra High Throughput Sequencing for Clinical Diagnostic Applications-Approaches To Assess...

    Science.gov (United States)

    2011-05-19

    ... Clinical Diagnostic Applications--Approaches To Assess Analytical Validity.'' The purpose of the public... approaches to assess analytical validity of ultra high throughput sequencing for clinical diagnostic... HUMAN SERVICES Food and Drug Administration Ultra High Throughput Sequencing for Clinical Diagnostic...

  2. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

    Science.gov (United States)

    D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

    2008-01-01

    High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes charaterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...

  3. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    the development and testing of innovative molecular approaches aiming at improving the amount of informative HTS data one can recover from ancient DNA extracts. We have characterized important ligation and amplification biases in the sequencing library building and enrichment steps, which can impede further...... been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing...

  4. TeloPCR-seq: a high-throughput sequencing approach for telomeres

    Science.gov (United States)

    Bennett, Henrietta W.; Liu, Na; Hu, Yan; King, Megan C.

    2017-01-01

    We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that “like” repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. PMID:27714790

  5. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    Directory of Open Access Journals (Sweden)

    Elena Marmesat

    Full Text Available The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95, yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43, and revealed more alleles at a population level (13 vs 12. Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.

  6. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data.

    Science.gov (United States)

    Gloor, Gregory B; Reid, Gregor

    2016-08-01

    A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.

  7. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach.

    Directory of Open Access Journals (Sweden)

    Shota Nakamura

    Full Text Available With the severe acute respiratory syndrome epidemic of 2003 and renewed attention on avian influenza viral pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a "next-generation" parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1-0.25 ml of nasopharyngeal aspirates (N = 3 and fecal specimens (N = 5, and more than 10 microg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298-32,335 (average 24,738 reads in a single 7.5 h run. In nasopharyngeal samples, although whole genome analysis was not available because the majority (>90% of reads were host genome-derived, 20-460 Flu-reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, resulting in gain of 484-15,260 reads of norovirus sequence (78-98% of the whole genome was covered, except for one specimen that was under-detectable by RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without advance genetic information. Although its cost and technological availability make it unlikely that this system will very soon be the diagnostic standard worldwide, this system could be useful for the earlier discovery of novel emerging viruses and bioterrorism, which are difficult to detect with conventional procedures.

  8. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches*

    Science.gov (United States)

    He, Guo-qing; Liu, Tong-jie; Sadiq, Faizan A.; Gu, Jing-si; Zhang, Guo-hua

    2017-01-01

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches. PMID:28378567

  9. Insights into the microbial diversity and community dynamics of Chinese traditional fermented foods from using high-throughput sequencing approaches.

    Science.gov (United States)

    He, Guo-Qing; Liu, Tong-Jie; Sadiq, Faizan A; Gu, Jing-Si; Zhang, Guo-Hua

    Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches.

  10. A novel approach for transcription factor analysis using SELEX with high-throughput sequencing (TFAST.

    Directory of Open Access Journals (Sweden)

    Daniel J Reiss

    Full Text Available BACKGROUND: In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. METHODOLOGY: Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. PRINCIPAL FINDINGS: Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5 × 10(12, 2.9 × 10(-46, and 1.2 × 10(-73 and informational content (11.0, 11.9, and 12.5 bits over 15 bp of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp. CONCLUSIONS/SIGNIFICANCE: TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.

  11. A bioinformatics approach for determining sample identity from different lanes of high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Rachel L Goldfeder

    Full Text Available The ability to generate whole genome data is rapidly becoming commoditized. For example, a mammalian sized genome (∼3Gb can now be sequenced using approximately ten lanes on an Illumina HiSeq 2000. Since lanes from different runs are often combined, verifying that each lane in a genome's build is from the same sample is an important quality control. We sought to address this issue in a post hoc bioinformatic manner, instead of using upstream sample or "barcode" modifications. We rely on the inherent small differences between any two individuals to show that genotype concordance rates can be effectively used to test if any two lanes of HiSeq 2000 data are from the same sample. As proof of principle, we use recent data from three different human samples generated on this platform. We show that the distributions of concordance rates are non-overlapping when comparing lanes from the same sample versus lanes from different samples. Our method proves to be robust even when different numbers of reads are analyzed. Finally, we provide a straightforward method for determining the gender of any given sample. Our results suggest that examining the concordance of detected genotypes from lanes purported to be from the same sample is a relatively simple approach for confirming that combined lanes of data are of the same identity and quality.

  12. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  13. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  14. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2010-03-01

    Full Text Available Abstract Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

  15. A High-Throughput DNA-Sequencing Approach for Determining Sources of Fecal Bacteria in a Lake Superior Estuary.

    Science.gov (United States)

    Brown, Clairessa M; Staley, Christopher; Wang, Ping; Dalzell, Brent; Chun, Chan Lan; Sadowsky, Michael J

    2017-08-01

    Current microbial source-tracking (MST) methods, employed to determine sources of fecal contamination in waterways, use molecular markers targeting host-associated bacteria in animal or human feces. However, there is a lack of knowledge about fecal microbiome composition in several animals and imperfect marker specificity and sensitivity. To overcome these issues, a community-based MST method has been developed. Here, we describe a study done in the Lake Superior-Saint Louis River estuary using SourceTracker, a program that calculates the source contribution to an environment. High-throughput DNA sequencing of microbiota from a diverse collection of fecal samples obtained from 11 types of animals (wild, agricultural, and domesticated) and treated effluent (n = 233) was used to generate a fecal library to perform community-based MST. Analysis of 319 fecal and environmental samples revealed that the community compositions in water and fecal samples were significantly different, allowing for the determination of the presence of fecal inputs and identification of specific sources. SourceTracker results indicated that fecal bacterial inputs into the Lake Superior estuary were primarily attributed to wastewater effluent and, to a lesser extent, geese and gull wastes. These results suggest that a community-based MST method may be another useful tool for determining sources of aquatic fecal bacteria.

  16. Metabolomic and high-throughput sequencing analysis – modern approach for the assessment of biodeterioration of materials from historic buildings

    Directory of Open Access Journals (Sweden)

    Beata eGutarowska

    2015-09-01

    Full Text Available Preservation of cultural heritage is of paramount importance worldwide. Microbial colonization of construction materials, such as wood, brick, mortar and stone in historic buildings can lead to severe deterioration. The aim of the present study was to give modern insight into the phylogenetic diversity and activated metabolic pathways of microbial communities colonized historic objects located in the former Auschwitz II-Birkenau concentration and extermination camp in Oświęcim, Poland. For this purpose we combined molecular, microscopic and chemical methods. Selected specimens were examined using Field Emission Scanning Electron Microscopy (FESEM, metabolomic analysis and high-throughput Illumina sequencing. FESEM imaging revealed the presence of complex microbial communities comprising diatoms, fungi and bacteria, mainly cyanobacteria and actinobacteria, on sample surfaces. Microbial diversity of brick specimens appeared higher than that of the wood and was dominated by algae and cyanobacteria, while wood was mainly colonized by fungi. DNA sequences documented the presence of 15 bacterial phyla representing 99 genera including Halomonas, Halorhodospira, Salinisphaera, Salinibacterium, Rubrobacter, Streptomyces, Arthrobacter and 9 fungal classes represented by 113 genera including Cladosporium, Acremonium, Alternaria, Engyodontium, Penicillium, Rhizopus and Aureobasidium. Most of the identified sequences were characteristic of organisms implicated in deterioration of wood and brick. Metabolomic data indicated the activation of numerous metabolic pathways, including those regulating the production of primary and secondary metabolites, for example, metabolites associated with the production of antibiotics, organic acids and deterioration of organic compounds. The study demonstrated that a combination of electron microscopy imaging with metabolomic and genomic techniques allows to link the phylogenetic information and metabolic profiles of

  17. Yeast diversity during the fermentation of Andean chicha: A comparison of high-throughput sequencing and culture-dependent approaches.

    Science.gov (United States)

    Mendoza, Lucía M; Neef, Alexander; Vignolo, Graciela; Belloch, Carmela

    2017-10-01

    Diversity and dynamics of yeasts associated with the fermentation of Argentinian maize-based beverage chicha was investigated. Samples taken at different stages from two chicha productions were analyzed by culture-dependent and culture-independent methods. Five hundred and ninety six yeasts were isolated by classical microbiological methods and 16 species identified by RFLPs and sequencing of D1/D2 26S rRNA gene. Genetic typing of isolates from the dominant species, Saccharomyces cerevisiae, by PCR of delta elements revealed up to 42 different patterns. High-throughput sequencing (HTS) of D1/D2 26S rRNA gene amplicons from chicha samples detected more than one hundred yeast species and almost fifty filamentous fungi taxa. Analysis of the data revealed that yeasts dominated the fermentation, although, a significant percentage of filamentous fungi appeared in the first step of the process. Statistical analysis of results showed that very few taxa were represented by more than 1% of the reads per sample at any step of the process. S. cerevisiae represented more than 90% of the reads in the fermentative samples. Other yeast species dominated the pre-fermentative steps and abounded in fermented samples when S. cerevisiae was in percentages below 90%. Most yeasts species detected by pyrosequencing were not recovered by cultivation. In contrast, the cultivation-based methodology detected very few yeast taxa, and most of them corresponded with very few reads in the pyrosequencing analysis. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Genetic Bases of Bicuspid Aortic Valve: The Contribution of Traditional and High-Throughput Sequencing Approaches on Research and Diagnosis

    Directory of Open Access Journals (Sweden)

    Betti Giusti

    2017-08-01

    Full Text Available Bicuspid aortic valve (BAV is a common (0.5–2.0% of general population congenital heart defect with increased prevalence of aortic dilatation and dissection. BAV has an autosomal dominant inheritance with reduced penetrance and variable expressivity. BAV has been described as an isolated trait or associated with syndromic conditions [e.g., Marfan Marfan syndrome or Loeys-Dietz syndrome (MFS, LDS]. Identification of a syndromic condition in a BAV patient is clinically relevant to personalize aortic surgery indication. A 4-fold increase in BAV prevalence in a large cohort of unrelated MFS patients with respect to general population was reported, as well as in LDS patients (8-fold. It is also known that BAV is more frequent in patients with thoracic aortic aneurysm (TAA related to mutations in ACTA2, FBN1, and TGFBR2 genes. Moreover, in 8 patients with BAV and thoracic aortic dilation, not fulfilling the clinical criteria for MFS, FBN1 mutations in 2/8 patients were identified suggesting that FBN1 or other genes involved in syndromic conditions correlated to aortopathy could be involved in BAV. Beyond loci associated to syndromic disorders, studies in humans and animal models evidenced/suggested the role of further genes in non-syndromic BAV. The transcriptional regulator NOTCH1 has been associated with the development and acceleration of calcium deposition. Genome wide marker-based linkage analysis demonstrated a linkage of BAV to loci on chromosomes 18, 5, and 13q. Recently, a role for GATA4/5 in aortic valve morphogenesis and endocardial cell differentiation has been reported. BAV has also been associated with a reduced UFD1L gene expression or involvement of a locus containing AXIN1/PDIA2. Much remains to be understood about the genetics of BAV. In the last years, high-throughput sequencing technologies, allowing the analysis of large number of genes or entire exomes or genomes, progressively became available. The latter issue together with

  19. Applications of High Throughput Sequencing for Immunology and Clinical Diagnostics

    OpenAIRE

    Kim, Hyunsung John

    2014-01-01

    High throughput sequencing methods have fundamentally shifted the manner in which biological experiments are performed. In this dissertation, conventional and novel high throughput sequencing and bioinformatics methods are applied to immunology and diagnostics. In order to study rare subsets of cells, an RNA sequencing method was first optimized for use with minimal levels of RNA and cellular input. The optimized RNA sequencing method was then applied to study the transcriptional differences ...

  20. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Directory of Open Access Journals (Sweden)

    Gilbert Greub

    Full Text Available BACKGROUND: With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. METHODS/PRINCIPAL FINDINGS: We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. CONCLUSIONS/SIGNIFICANCE: This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  1. Microdissection of lampbrush chromosomes as an approach for generation of locus-specific FISH-probes and samples for high-throughput sequencing.

    Science.gov (United States)

    Zlotina, Anna; Kulikova, Tatiana; Kosyakova, Nadezda; Liehr, Thomas; Krasikova, Alla

    2016-02-20

    Over the past two decades, chromosome microdissection has been widely used in diagnostics and research enabling analysis of chromosomes and their regions through probe generation and establishing of chromosome- and chromosome region-specific DNA libraries. However, relatively small physical size of mitotic chromosomes limited the use of the conventional chromosome microdissection for investigation of tiny chromosomal regions. In the present study, we developed a workflow for mechanical microdissection of giant transcriptionally active lampbrush chromosomes followed by the preparation of whole-chromosome and locus-specific fluorescent in situ hybridization (FISH)-probes and high-throughput sequencing. In particular, chicken (Gallus g. domesticus) lampbrush chromosome regions as small as single chromomeres, individual lateral loops and marker structures were successfully microdissected. The dissected fragments were mapped with high resolution to target regions of the corresponding lampbrush chromosomes. For investigation of RNA-content of lampbrush chromosome structures, samples retrieved by microdissection were subjected to reverse transcription. Using high-throughput sequencing, the isolated regions were successfully assigned to chicken genome coordinates. As a result, we defined precisely the loci for marker structures formation on chicken lampbrush chromosomes 2 and 3. Additionally, our data suggest that large DAPI-positive chromomeres of chicken lampbrush chromosome arms are characterized by low gene density and high repeat content. The developed technical approach allows to obtain DNA and RNA samples from particular lampbrush chromosome loci, to define precisely the genomic position, extent and sequence content of the dissected regions. The data obtained demonstrate that lampbrush chromosome microdissection provides a unique opportunity to correlate a particular transcriptional domain or a cytological structure with a known DNA sequence. This approach offers

  2. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    Culture-independent studies using next generation sequencing have revolutionizedmicrobial ecology, however, oomycete ecology in soils is severely lagging behind. The aimof this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete...... agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from 26 soil sample showed that 95...... the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete communities using ITS1 as the barcode sequence with well-known primers for oomycete DNA amplification....

  3. The Gut Microbiotassay: a high-throughput qPCR approach combinable with next generation sequencing to study gut microbial diversity

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Stockmarr, Anders

    2013-01-01

    Background The intestinal microbiota is a complex and diverse ecosystem that plays a significant role in maintaining the health and well-being of the mammalian host. During the last decade focus has increased on the importance of intestinal bacteria. Several molecular methods can be applied...... to describe the composition of the microbiota. This study used a new approach, the Gut Microbiotassay: an assembly of 24 primer sets targeting the main phyla and taxonomically related subgroups of the intestinal microbiota, to be used with the high-throughput qPCR chip ‘Access Array 48.48′, AA48.48, (Fluidigm...... with and without diarrhoea. The PCR amplicons from the 2304 reaction chambers were harvested from the AA48.48, purified, and sequenced using 454-technology. Results The Gut Microbiotassay was able to detect significant differences in the quantity and composition of the microbiota according to gut sections...

  4. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L. by combining a re-sequencing approach and SNPlex technology

    Directory of Open Access Journals (Sweden)

    Martínez-Zapater José M

    2007-11-01

    decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provide a powerful tool for grapevine genetic analysis.

  5. [Biological ingredient analysis of traditional Chinese medicines utilizing metagenomic approach based on high-throughput-sequencing and big-data-mining].

    Science.gov (United States)

    Bai, Hong; Ning, Kang; Wang, Chang-yun

    2015-03-01

    The quality of traditional Chinese medicines (TCMs) has been mainly evaluated based on chemical ingredients, yet recently more attentions have been paid on biological ingredients, especially for pill-based preparations. It is a key approach to establish a fast, accurate and systematic method of biological ingredient analysis for realization of modernization, industrialization and internationalization of TCMs. The biological ingredient analysis of TCM preparations could be abstracted as the identification of multiple species from a biological mixture. The metagenomic approach based on high-throughput-sequencing (HTS) and big-data-mining has been considered as one of the most effective methods for multiple species analysis of a biological mixture, which would also be helpful for the analysis of biological ingredients in TCMs. Simultaneous identification of diverse species, including the prescribed species, adulterants, toxic species, protected species and even the biological impurities introduced through production process, could be achieved by selecting appropriate DNA biomarkers, as well as applying large-scale sequence comparison and data mining. By this approach, it is prospective to offer an evaluation basis for the effectiveness, safety and legality of TCM preparations.

  6. High-throughput sequencing in mitochondrial DNA research.

    Science.gov (United States)

    Ye, Fei; Samuels, David C; Clark, Travis; Guo, Yan

    2014-07-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers' ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control. Copyright © 2014 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  7. High-throughput sequencing and vaccine design

    National Research Council Canada - National Science Library

    Luciani, F

    2016-01-01

    .... This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions...

  8. High-throughput sequencing as an effective approach in profiling small RNAs derived from a hairpin RNA expression vector in woody plants.

    Science.gov (United States)

    Zhao, Dongyan; Song, Guo-Qing

    2014-11-01

    Hairpin RNA (hpRNA)-mediated gene silencing has proved to be an efficient approach to develop virus-resistant transgenic plants. To characterize small RNA molecules (sRNAs) derived from an hpRNA expression vector in transgenic cherry rootstock plants, we conducted small RNA sequencing of (1) a transgenic rootstock containing an inverted repeat of the partial coat protein of Prunus necrotic ring spot virus (PNRSV-hpRNA); (2) a nontransgenic rootstock; and (3) a PNRSV-infected sweet cherry plant. Analysis of the PNRSV sRNA pools indicated that 24-nt (nucleotide) small interfering RNAs (siRNAs) were the most prevalent sRNAs in the transgenic rootstock whereas the most abundant sRNAs in the PNRSV-infected nontransgenic rootstock were 21-nt siRNAs. In addition, the 24-nt siRNAs of the PNRSV-hpRNA were more abundant on the sense strand than those on the antisense strand in the transgenic rootstock. In contrast, preference in generating PNRSV sRNAs, ranging from 19-nt to 30-nt for sense and antisense strands, was not distinct in the PNRSV-infected nontransgenic sweet cherry. Taken together, this is the first report on profiling hpRNA-derived sRNAs in woody plants using high-throughput sequencing technology, which is an efficient way to verify the presence/absence, the abundance, and the sequence features of certain sRNAs. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  9. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...

  10. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  11. High-throughput sequence alignment using Graphics Processing Units.

    Science.gov (United States)

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  12. Validation of high throughput sequencing and microbial forensics applications

    OpenAIRE

    Budowle, Bruce; Connell, Nancy D.; Bielecka-Oder, Anna; Rita R Colwell; Corbett, Cindi R.; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A.; Murch, Randall S; Sajantila, Antti; Schemes, Sarah E; Ternus, Krista L; Turner, Stephen D

    2014-01-01

    Abstract High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results a...

  13. Genetic characterisation of Malawian pneumococci prior to the roll-out of the PCV13 vaccine using a high-throughput whole genome sequencing approach.

    Directory of Open Access Journals (Sweden)

    Dean B Everett

    Full Text Available Malawi commenced the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13 into the routine infant immunisation schedule in November 2011. Here we have tested the utility of high throughput whole genome sequencing to provide a high-resolution view of pre-vaccine pneumococcal epidemiology and population evolutionary trends to predict potential future change in population structure post introduction.One hundred and twenty seven (127 archived pneumococcal isolates from randomly selected adults and children presenting to the Queen Elizabeth Central Hospital, Blantyre, Malawi underwent whole genome sequencing.The pneumococcal population was dominated by serotype 1 (20.5% of invasive isolates prior to vaccine introduction. PCV13 is likely to protect against 62.9% of all circulating invasive pneumococci (78.3% in under-5-year-olds. Several Pneumococcal Molecular Epidemiology Network (PMEN clones are now in circulation in Malawi which were previously undetected but the pandemic multidrug resistant PMEN1 lineage was not identified. Genome analysis identified a number of novel sequence types and serotype switching.High throughput genome sequencing is now feasible and has the capacity to simultaneously elucidate serotype, sequence type and as well as detailed genetic information. It enables population level characterization, providing a detailed picture of population structure and genome evolution relevant to disease control. Post-vaccine introduction surveillance supported by genome sequencing is essential to providing a comprehensive picture of the impact of PCV13 on pneumococcal population structure and informing future public health interventions.

  14. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  15. High throughput sequencing of microRNAs in chicken somites.

    Science.gov (United States)

    Rathjen, Tina; Pais, Helio; Sweetman, Dylan; Moulton, Vincent; Munsterberg, Andrea; Dalmay, Tamas

    2009-05-06

    High throughput Solexa sequencing technology was applied to identify microRNAs in somites of developing chicken embryos. We obtained 651,273 reads, from which 340,415 were mapped to the chicken genome representing 1701 distinct sequences. Eighty-five of these were known microRNAs and 42 novel miRNA candidates were identified. Accumulation of 18 of 42 sequences was confirmed by Northern blot analysis. Ten of the 18 sequences are new variants of known miRNAs and eight short RNAs are novel miRNAs. Six of these eight have not been reported by other deep sequencing projects. One of the six new miRNAs is highly enriched in somite tissue suggesting that deep sequencing of other specific tissues has the potential to identify novel tissue specific miRNAs.

  16. High-throughput sequencing: a roadmap toward community ecology.

    Science.gov (United States)

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-04-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines.

  17. Fusion genes and their discovery using high throughput sequencing.

    Science.gov (United States)

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  18. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......). For the second flavor, DNA-seq, a study presenting genome wide profiling of transcription factor CEBP/A in liver cells undergoing regeneration after partial hepatectomy (article IV) is included....

  19. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data.

    NARCIS (Netherlands)

    Zomer, A.L.; Burghout, P.J.; Bootsma, H.J.; Hermans, P.W.M.; Hijum, S.A.F.T. van

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon

  20. Savant: genome browser for high-throughput sequencing data.

    Science.gov (United States)

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  1. Validation of high throughput sequencing and microbial forensics applications.

    Science.gov (United States)

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  2. Large scale library generation for high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Erik Borgström

    Full Text Available BACKGROUND: Large efforts have recently been made to automate the sample preparation protocols for massively parallel sequencing in order to match the increasing instrument throughput. Still, the size selection through agarose gel electrophoresis separation is a labor-intensive bottleneck of these protocols. METHODOLOGY/PRINCIPAL FINDINGS: In this study a method for automatic library preparation and size selection on a liquid handling robot is presented. The method utilizes selective precipitation of certain sizes of DNA molecules on to paramagnetic beads for cleanup and selection after standard enzymatic reactions. CONCLUSIONS/SIGNIFICANCE: The method is used to generate libraries for de novo and re-sequencing on the Illumina HiSeq 2000 instrument with a throughput of 12 samples per instrument in approximately 4 hours. The resulting output data show quality scores and pass filter rates comparable to manually prepared samples. The sample size distribution can be adjusted for each application, and are suitable for all high throughput DNA processing protocols seeking to control size intervals.

  3. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Full Text Available Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequences analysis.

  4. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants...

  5. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  6. COMPUTER APPROACHES TO WHEAT HIGH-THROUGHPUT PHENOTYPING

    Directory of Open Access Journals (Sweden)

    Afonnikov D.

    2012-08-01

    Full Text Available The growing need for rapid and accurate approaches for large-scale assessment of phenotypic characters in plants becomes more and more obvious in the studies looking into relationships between genotype and phenotype. This need is due to the advent of high throughput methods for analysis of genomes. Nowadays, any genetic experiment involves data on thousands and dozens of thousands of plants. Traditional ways of assessing most phenotypic characteristics (those with reliance on the eye, the touch, the ruler are little effective on samples of such sizes. Modern approaches seek to take advantage of automated phenotyping, which warrants a much more rapid data acquisition, higher accuracy of the assessment of phenotypic features, measurement of new parameters of these features and exclusion of human subjectivity from the process. Additionally, automation allows measurement data to be rapidly loaded into computer databases, which reduces data processing time.In this work, we present the WheatPGE information system designed to solve the problem of integration of genotypic and phenotypic data and parameters of the environment, as well as to analyze the relationships between the genotype and phenotype in wheat. The system is used to consolidate miscellaneous data on a plant for storing and processing various morphological traits and genotypes of wheat plants as well as data on various environmental factors. The system is available at www.wheatdb.org. Its potential in genetic experiments has been demonstrated in high-throughput phenotyping of wheat leaf pubescence.

  7. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  8. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  9. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing

    Science.gov (United States)

    Lou, Dianne I.; Hussmann, Jeffrey A.; McBee, Ross M.; Acevedo, Ashley; Andino, Raul; Press, William H.; Sawyer, Sara L.

    2013-01-01

    A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ∼0.1–1 × 10−2 per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, “circle sequencing,” which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10−6 per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors. PMID:24243955

  10. A reporter system coupled with high-throughput sequencing unveils key bacterial transcription and translation determinants.

    Science.gov (United States)

    Yus, Eva; Yang, Jae-Seong; Sogues, Adrià; Serrano, Luis

    2017-08-28

    Quantitative analysis of the sequence determinants of transcription and translation regulation is relevant for systems and synthetic biology. To identify these determinants, researchers have developed different methods of screening random libraries using fluorescent reporters or antibiotic resistance genes. Here, we have implemented a generic approach called ELM-seq (expression level monitoring by DNA methylation) that overcomes the technical limitations of such classic reporters. ELM-seq uses DamID (Escherichia coli DNA adenine methylase as a reporter coupled with methylation-sensitive restriction enzyme digestion and high-throughput sequencing) to enable in vivo quantitative analyses of upstream regulatory sequences. Using the genome-reduced bacterium Mycoplasma pneumoniae, we show that ELM-seq has a large dynamic range and causes minimal toxicity. We use ELM-seq to determine key sequences (known and putatively novel) of promoter and untranslated regions that influence transcription and translation efficiency. Applying ELM-seq to other organisms will help us to further understand gene expression and guide synthetic biology.Quantitative analysis of how DNA sequence determines transcription and translation regulation is of interest to systems and synthetic biologists. Here the authors present ELM-seq, which uses Dam activity as reporter for high-throughput analysis of promoter and 5'-UTR regions.

  11. Targeted next-generation sequencing in chronic lymphocytic leukemia: A high-throughput yet tailored approach will facilitate implementation in a clinical setting

    NARCIS (Netherlands)

    L.-A. Sutton (L.); V. Ljungström (Viktor); A. Mansouri (Ahmed); E. Young (Emma); D. Cortese (D.); V. Navrkalova (Veronika); J. Malčíková (J.); A.F. Muggen; M. Trbusek (Martin); P. Panagiotidis (P.); F. Davi (Frédéric); C. Belessi (C.); A.W. Langerak (Anton); P. Ghia (Paolo); D. Pospisilova (Dagmar); K. Stamatopoulos (Kostas); R. Rosenquist (R.)

    2015-01-01

    textabstractNext-generation sequencing has revealed novel recurrent mutations in chronic lymphocytic leukemia, particularly in patients with aggressive disease. Here, we explored targeted re-sequencing as a novel strategy to assess the mutation status of genes with prognostic potential. To this end,

  12. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    Full Text Available ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  13. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia.

    Science.gov (United States)

    Iqbal, Zafar; Rydning, Siri L; Wedding, Iselin M; Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H; Henriksen, Sandra P; Tallaksen, Chantal M E; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process.

  14. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA

    National Research Council Canada - National Science Library

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-01-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA...

  15. Unraveling long non-coding RNAs through analysis of high-throughput RNA-sequencing data

    Directory of Open Access Journals (Sweden)

    Rashmi Tripathi

    2017-06-01

    Full Text Available Extensive genome-wide transcriptome study mediated by high throughput sequencing technique has revolutionized the study of genetics and epigenetic at unprecedented resolution. The research has revealed that besides protein-coding RNAs, large proportions of mammalian transcriptome includes a heap of regulatory non protein-coding RNAs, the number encoded within human genome is enigmatic. Many taboos developed in the past categorized these non-coding RNAs as ‘‘dark matter” and “junks”. Breaking the myth, RNA-seq-- a recently developed experimental technique is widely being used for studying non-coding RNAs which has acquired the limelight due to their physiological and pathological significance. The longest member of the ncRNA family-- long non-coding RNAs, acts as stable and functional part of a genome, guiding towards the important clues about the varied biological events like cellular-, structural- processes governing the complexity of an organism. Here, we review the most recent and influential computational approach developed to identify and quantify the long non-coding RNAs serving as an assistant for the users to choose appropriate tools for their specific research. Keywords: Transcriptome, High throughput sequencing, Genetic and epigenetic, Long non-coding RNA, RNA-sequencing, RNA-seq

  16. Discovery of microRNAs and transcript targets related to witches' broom disease in Paulownia fortunei by high-throughput sequencing and degradome approach.

    Science.gov (United States)

    Niu, Suyan; Fan, Guoqiang; Deng, Minjie; Zhao, Zhenli; Xu, Enkai; Cao, Lin

    2016-02-01

    Paulownia witches' broom (PaWB) caused by the phytoplasma is a devastating disease of Paulownia trees. It has caused heavy yield losses to Paulownia production worldwide. However, knowledge of the transcriptional and post-transcriptional regulation of gene expression by microRNAs (miRNAs), especially miRNAs responsive to PaWB disease stress, is still rudimentary. In this study, to identify miRNAs and their transcript targets that are responsive to PaWB disease stress, six sequencing libraries were constructed from healthy (PF), PaWB-infected (PFI), and PaWB-infected, 20 mg L(-1) methyl methane sulfonate-treated (PFI20) P. fortunei seedlings. As a result, 95 conserved miRNAs belonging to 18 miRNA families, as well as 122 potential novel miRNAs, were identified. Most of them were found to be a response to PaWB disease-induced stress, and the expression levels of these miRNAs were validated by quantitative real-time PCR analysis. The study simultaneously identified 109 target genes from the P. fortunei for 14 conserved miRNA families and 24 novel miRNAs by degradome sequencing. Furthermore, the functions of the miRNA targets were annotated based on Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. The results presented here provide the groundwork for further analysis of miRNAs and target genes responsive to the PaWB disease stress, and could be also useful for addressing new questions to better understand the mechanisms of plant infection by phytoplasma in the future.

  17. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing

    DEFF Research Database (Denmark)

    Gamba, Cristina; Hanghøj, Kristian Ebbesen; Gaunitz, Charleen

    2016-01-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs...... of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning...... a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules...

  18. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri eMichaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner, a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier, a program for identifying legitimate and artifact insertions and/or deletions (indels. Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  19. Development and characterization of multiplex panels of microsatellite markers for Syphacia obvelata, a parasite of the house mouse (Mus musculus), using a high throughput DNA sequencing approach.

    Science.gov (United States)

    Wasimuddin; Čížková, Dagmar; Ribas, Alexis; Piálek, Jaroslav; de Bellocq, Joëlle Goüy; Bryja, Josef

    2012-10-01

    Syphacia obvelata is a common gastro-intestinal parasitic nematode of the house mouse (Mus musculus), a prime model rodent species. Investigations of the genetic structure, variability of parasite populations and other biological aspects of this host-parasite system are limited due to the lack of genetic resources for S. obvelata. To fill this gap, we developed a set of microsatellite markers for S. obvelata, using a 454 pyrosequencing approach. We designed three multiplex panels allowing genotyping of 10 polymorphic loci and scrutinized them on 42 samples from two different regions inhabited by two different house mouse subspecies (Mus musculus musculus and M. m. domesticus). The numbers of alleles ranged from 2 to 6 with mean observed heterozygosities 0.1476 and 0.2095 for domesticus and musculus worms, respectively. The described markers will facilitate further studies on population biology and co-evolution of this host-parasite system. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Functional approach to high-throughput plant growth analysis

    Science.gov (United States)

    2013-01-01

    Method Taking advantage of the current rapid development in imaging systems and computer vision algorithms, we present HPGA, a high-throughput phenotyping platform for plant growth modeling and functional analysis, which produces better understanding of energy distribution in regards of the balance between growth and defense. HPGA has two components, PAE (Plant Area Estimation) and GMA (Growth Modeling and Analysis). In PAE, by taking the complex leaf overlap problem into consideration, the area of every plant is measured from top-view images in four steps. Given the abundant measurements obtained with PAE, in the second module GMA, a nonlinear growth model is applied to generate growth curves, followed by functional data analysis. Results Experimental results on model plant Arabidopsis thaliana show that, compared to an existing approach, HPGA reduces the error rate of measuring plant area by half. The application of HPGA on the cfq mutant plants under fluctuating light reveals the correlation between low photosynthetic rates and small plant area (compared to wild type), which raises a hypothesis that knocking out cfq changes the sensitivity of the energy distribution under fluctuating light conditions to repress leaf growth. Availability HPGA is available at http://www.msu.edu/~jinchen/HPGA. PMID:24565437

  1. Recent progress using high-throughput sequencing technologies in plant molecular breeding.

    Science.gov (United States)

    Gao, Qiang; Yue, Guidong; Li, Wenqi; Wang, Junyi; Xu, Jiaohui; Yin, Ye

    2012-04-01

    High-throughput sequencing is a revolutionary technological innovation in DNA sequencing. This technology has an ultra-low cost per base of sequencing and an overwhelmingly high data output. High-throughput sequencing has brought novel research methods and solutions to the research fields of genomics and post-genomics. Furthermore, this technology is leading to a new molecular breeding revolution that has landmark significance for scientific research and enables us to launch multi-level, multi-faceted, and multi-extent studies in the fields of crop genetics, genomics, and crop breeding. In this paper, we review progress in the application of high-throughput sequencing technologies to plant molecular breeding studies. © 2012 Institute of Botany, Chinese Academy of Sciences.

  2. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the gnomic library, is a powerful method to identify genomic aptamers.

  3. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Science.gov (United States)

    Zimmermann, Bob; Gesell, Tanja; Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-02-11

    SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the gnomic library, is a powerful method to identify genomic aptamers.

  4. A general approach for discriminative de novo motif discovery from high-throughput data.

    Science.gov (United States)

    Grau, Jan; Posch, Stefan; Grosse, Ivo; Keilwagen, Jens

    2013-11-01

    De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.

  5. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and comp......In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental...... with known priming sites....

  6. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    OpenAIRE

    Iqbal, Zafar; Rydning, Siri L.; Wedding, Iselin M.; Koht, Jeanette; Pihlstr?m, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%)...

  7. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    Directory of Open Access Journals (Sweden)

    Tu Jing

    2012-01-01

    Full Text Available Abstract Background The multiplexing becomes the major limitation of the next-generation sequencing (NGS in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneously analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS, an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc., 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. Conclusions By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand.

  8. High Throughput Sequencing of Extracellular RNA from Human Plasma.

    Directory of Open Access Journals (Sweden)

    Kirsty M Danielson

    Full Text Available The presence and relative stability of extracellular RNAs (exRNAs in biofluids has led to an emerging recognition of their promise as 'liquid biopsies' for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species. Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms.

  9. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Full Text Available Abstract Background Black pepper (Piper nigrum L. is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  10. High-throughput sequencing of black pepper root transcriptome.

    Science.gov (United States)

    Gordo, Sheila M C; Pinheiro, Daniel G; Moreira, Edith C O; Rodrigues, Simone M; Poltronieri, Marli C; de Lemos, Oriel F; da Silva, Israel Tojal; Ramos, Rommel T J; Silva, Artur; Schneider, Horacio; Silva, Wilson A; Sampaio, Iracilda; Darnet, Sylvain

    2012-09-17

    Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host's root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant's root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  11. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  12. Novel Sequencing-based Strategies for High-Throughput Discovery of Genetic Mutations Underlying Inherited Antibody Deficiency Disorders

    OpenAIRE

    Wang, Hong-Ying; Jain, Ashish

    2011-01-01

    Human inherited antibody deficiency disorders are generally caused by mutations in genes involved in the pathways regulating B-cell class switch recombination; DNA damage repair; and B-cell development, differentiation, and survival. Sequencing a large set of candidate genes involved in these pathways appears to be a highly efficient way to identify novel mutations. Herein we review several high-throughput sequencing approaches as well as recent improvements in target gene enrichment technolo...

  13. The Microsoft Biology Foundation Applications for High-Throughput Sequencing

    Science.gov (United States)

    Mercer, S.

    2010-01-01

    w9-2 The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage betafrom mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.

  14. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  15. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment. High-throughput sequencing (HTS revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  16. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data.

    Science.gov (United States)

    Thiel, William H

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs. Copyright © 2016 Official journal of the American Society of Gene & Cell Therapy. Published by Elsevier Inc. All rights reserved.

  17. ESSENTIALS: Software for Rapid Analysis of High Throughput Transposon Insertion Sequencing Data

    Science.gov (United States)

    Zomer, Aldert; Burghout, Peter; Bootsma, Hester J.; Hermans, Peter W. M.; van Hijum, Sacha A. F. T.

    2012-01-01

    High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of DNA flanking the transposon insertions site, assigning a measure of essentiality based on the number of reads per insertion site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy to use and automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open source, web-based software tool for researchers in the genomics field utilizing transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets and we show that the various pre- and post-processing steps on the sequence reads and count data with ESSENTIALS considerably improve the sensitivity and specificity of predicted gene essentiality. PMID:22900082

  18. Barcoded sequencing workflow for high throughput digitization of hybridoma antibody variable domain sequences.

    Science.gov (United States)

    Chen, Yongmei; Kim, Si Hyun; Shang, Yonglei; Guillory, Joseph; Stinson, Jeremy; Zhang, Qing; Hötzel, Isidro; Hoi, Kam Hon

    2018-01-20

    Since the invention of Hybridoma technology by Milstein and Köhler in 1975, its application has greatly advanced the antibody discovery process. The technology enables both functional screening and long-term archival of the immortalized monoclonal antibody producing B cells. Despite the dependable cryopreservation technology for hybridoma cells, practicality of long-term storage has been outpaced by recent progress in robotics and automations, which enables routine identification of thousands of antigen specific hybridoma clones. Such throughput increase imposes two nascent challenges in the antibody discovery process, namely limited cryopreservation storage space and limited throughput in conventional antibody sequencing. We herein provide a barcoded sequencing workflow that utilizes next generation sequencing to expand the conventional sequencing capacity. Accompanied with the bioinformatics tools we describe, the barcoded sequencing workflow robustly reports unambiguous antibody sequences as confirmed with Sanger sequencing controls. In complement with the commonly accessible recombinant DNA technology, the barcoded sequencing workflow allows for high throughput digitization of the antibody sequences and provides an effective solution to the limitations imposed by physical storage and sequencing capacity. Copyright © 2018 Genentech, Inc. Published by Elsevier B.V. All rights reserved.

  19. Molecular identification of soil eukaryotes and focused approaches targeting protist and faunal groups using high-throughput metabarcoding

    NARCIS (Netherlands)

    Groot, de G.A.; Laros, I.; Geisen, S.

    2015-01-01

    While until recently the application of high-throughput sequencing approaches has mostly been restricted to bacteria and fungi, these methods have now also become available to less often studied (eukaryotic) groups, such as fauna and protists. Such approaches allow routine diversity screening for

  20. Deep Mutational Scanning: Library Construction, Functional Selection, and High-Throughput Sequencing.

    Science.gov (United States)

    Starita, Lea M; Fields, Stanley

    2015-08-03

    Deep mutational scanning is a highly parallel method that uses high-throughput sequencing to track changes in >10(5) protein variants before and after selection to measure the effects of mutations on protein function. Here we outline the stages of a deep mutational scanning experiment, focusing on the construction of libraries of protein sequence variants and the preparation of Illumina sequencing libraries. © 2015 Cold Spring Harbor Laboratory Press.

  1. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing.

    Science.gov (United States)

    Gamba, Cristina; Hanghøj, Kristian; Gaunitz, Charleen; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Bradley, Daniel G; Orlando, Ludovic

    2016-03-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms. © 2015 John Wiley & Sons Ltd.

  2. In silico prediction and screening of modular crystal structures via a high-throughput genomic approach

    Science.gov (United States)

    Li, Yi; Li, Xu; Liu, Jiancong; Duan, Fangzheng; Yu, Jihong

    2015-01-01

    High-throughput computational methods capable of predicting, evaluating and identifying promising synthetic candidates with desired properties are highly appealing to today's scientists. Despite some successes, in silico design of crystalline materials with complex three-dimensionally extended structures remains challenging. Here we demonstrate the application of a new genomic approach to ABC-6 zeolites, a family of industrially important catalysts whose structures are built from the stacking of modular six-ring layers. The sequences of layer stacking, which we deem the genes of this family, determine the structures and the properties of ABC-6 zeolites. By enumerating these gene-like stacking sequences, we have identified 1,127 most realizable new ABC-6 structures out of 78 groups of 84,292 theoretical ones, and experimentally realized 2 of them. Our genomic approach can extract crucial structural information directly from these gene-like stacking sequences, enabling high-throughput identification of synthetic targets with desired properties among a large number of candidate structures. PMID:26395233

  3. The promise and challenge of high-throughput sequencing of the antibody repertoire

    Science.gov (United States)

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  4. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data

    OpenAIRE

    Althammer, Sonja Daniela; González-Vallinas Rostes, Juan, 1983-; Ballaré, Cecilia Julia; Beato, Miguel; Eyras Jiménez, Eduardo

    2011-01-01

    Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein?DNA and protein?RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or b...

  5. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Directory of Open Access Journals (Sweden)

    Kathy N Lam

    Full Text Available High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  6. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    Science.gov (United States)

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  7. High-throughput sequencing of three Lemnoideae (duckweeds chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  8. The application of the high throughput sequencing technology in the transposable elements.

    Science.gov (United States)

    Liu, Zhen; Xu, Jian-hong

    2015-09-01

    High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.

  9. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequenc......Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high......-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential...... display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds....

  10. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles, and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL or of a whole genome shotgun library (WGSL, or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  11. Comprehensive molecular diagnosis of Bardet-Biedl syndrome by high-throughput targeted exome sequencing.

    Directory of Open Access Journals (Sweden)

    Dong-Jun Xing

    Full Text Available Bardet-Biedl syndrome (BBS is an autosomal recessive disorder with significant genetic heterogeneity. BBS is linked to mutations in 17 genes, which contain more than 200 coding exons. Currently, BBS is diagnosed by direct DNA sequencing for mutations in these genes, which because of the large genomic screening region is both time-consuming and expensive. In order to develop a practical method for the clinic diagnosis of BBS, we have developed a high-throughput targeted exome sequencing (TES for genetic diagnosis. Five typical BBS patients were recruited and screened for mutations in a total of 144 known genes responsible for inherited retinal diseases, a hallmark symptom of BBS. The genomic DNA of these patients and their families were subjected to high-throughput DNA re-sequencing. Deep bioinformatics analysis was carried out to filter the massive sequencing data, which were further confirmed through co-segregation analysis. TES successfully revealed mutations in BBS genes in each patient and family member. Six pathological mutations, including five novel mutations, were revealed in the genes BBS2, MKKS, ARL6, MKS1. This study represents the first report of targeted exome sequencing in BBS patients and demonstrates that high-throughput TES is an accurate and rapid method for the genetic diagnosis of BBS.

  12. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Full Text Available Viral infections cause many different diseases stemming both from well-characterized viral pathogens but also from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Despite that less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.

  13. High-throughput massively parallel sequencing for fetal aneuploidy detection from maternal plasma.

    Directory of Open Access Journals (Sweden)

    Taylor J Jensen

    Full Text Available Circulating cell-free (ccf fetal DNA comprises 3-20% of all the cell-free DNA present in maternal plasma. Numerous research and clinical studies have described the analysis of ccf DNA using next generation sequencing for the detection of fetal aneuploidies with high sensitivity and specificity. We sought to extend the utility of this approach by assessing semi-automated library preparation, higher sample multiplexing during sequencing, and improved bioinformatic tools to enable a higher throughput, more efficient assay while maintaining or improving clinical performance.Whole blood (10mL was collected from pregnant female donors and plasma separated using centrifugation. Ccf DNA was extracted using column-based methods. Libraries were prepared using an optimized semi-automated library preparation method and sequenced on an Illumina HiSeq2000 sequencer in a 12-plex format. Z-scores were calculated for affected chromosomes using a robust method after normalization and genomic segment filtering. Classification was based upon a standard normal transformed cutoff value of z = 3 for chromosome 21 and z = 3.95 for chromosomes 18 and 13.Two parallel assay development studies using a total of more than 1900 ccf DNA samples were performed to evaluate the technical feasibility of automating library preparation and increasing the sample multiplexing level. These processes were subsequently combined and a study of 1587 samples was completed to verify the stability of the process-optimized assay. Finally, an unblinded clinical evaluation of 1269 euploid and aneuploid samples utilizing this high-throughput assay coupled to improved bioinformatic procedures was performed. We were able to correctly detect all aneuploid cases with extremely low false positive rates of 0.09%, <0.01%, and 0.08% for trisomies 21, 18, and 13, respectively.These data suggest that the developed laboratory methods in concert with improved bioinformatic approaches enable higher sample

  14. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases that achieved by traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  15. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Pei-Xiang Ni

    2015-01-01

    Full Text Available Background: The dilemma of pathogens identification in patients with unidentified clinical symptoms such as fever of unknown origin exists, which not only poses a challenge to both the diagnostic and therapeutic process by itself, but also to expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived five cases from the intensive care unit. This methodological approach detects viruses and corrects the incidence of false positive detection rates of pathogens in a much shorter period. Through our method is followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus, and in another case, we could identify infection with Streptococcus viridians based on the culture, which was false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  16. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991

  17. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Kyle Logue

    2016-03-01

    Full Text Available Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG. While human (52.9%, dog (15.8% and pig (29.2% were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9% of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources.

  18. High-Throughput Approaches to Pinpoint Function within the Noncoding Genome.

    Science.gov (United States)

    Montalbano, Antonino; Canver, Matthew C; Sanjana, Neville E

    2017-10-05

    The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas nuclease system is a powerful tool for genome editing, and its simple programmability has enabled high-throughput genetic and epigenetic studies. These high-throughput approaches offer investigators a toolkit for functional interrogation of not only protein-coding genes but also noncoding DNA. Historically, noncoding DNA has lacked the detailed characterization that has been applied to protein-coding genes in large part because there has not been a robust set of methodologies for perturbing these regions. Although the majority of high-throughput CRISPR screens have focused on the coding genome to date, an increasing number of CRISPR screens targeting noncoding genomic regions continue to emerge. Here, we review high-throughput CRISPR-based approaches to uncover and understand functional elements within the noncoding genome and discuss practical aspects of noncoding library design and screen analysis. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Ying-Hong Gu

    Full Text Available BACKGROUND: Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, high sensitive method for virus detection and diagnosis. METHODOLOGY/PRINCIPAL FINDINGS: We report the polyadenylated RNA virome of three sweet potato cultivars using a high throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE and reverse transcription polymerase chain reaction (RT-PCR analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggest that these tissues should be optimum samples for virus detection. CONCLUSIONS/SIGNIFICANCE: A total of 15 viruses were presumed to present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  20. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing.

    Science.gov (United States)

    Gu, Ying-Hong; Tao, Xiang; Lai, Xian-Jun; Wang, Hai-Yan; Zhang, Yi-Zheng

    2014-01-01

    Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, high sensitive method for virus detection and diagnosis. We report the polyadenylated RNA virome of three sweet potato cultivars using a high throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggest that these tissues should be optimum samples for virus detection. A total of 15 viruses were presumed to present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.

  1. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards...... with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates...... automation of DNA sequencing....

  2. Multidimensional NMR approaches towards highly resolved, sensitive and high-throughput quantitative metabolomics.

    Science.gov (United States)

    Marchand, Jérémy; Martineau, Estelle; Guitton, Yann; Dervilly-Pinel, Gaud; Giraudeau, Patrick

    2017-02-01

    Multi-dimensional NMR is an appealing approach for dealing with the challenging complexity of biological samples in metabolomics. This article describes how spectroscopists have recently challenged their imagination in order to make 2D NMR a powerful tool for quantitative metabolomics, based on innovative pulse sequences combined with meticulous analytical chemistry approaches. Clever time-saving strategies have also been explored to make 2D NMR a high-throughput tool for metabolomics, relying on alternative data acquisition schemes such as ultrafast NMR. Currently, much work is aimed at drastically boosting the NMR sensitivity thanks to hyperpolarisation techniques, which have been used in combination with fast acquisition methods and could greatly expand the application potential of NMR metabolomics. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Recent advances in high-throughput approaches to dissect enhancer function [version 1; referees: 3 approved

    Directory of Open Access Journals (Sweden)

    David Santiago-Algarra

    2017-06-01

    Full Text Available The regulation of gene transcription in higher eukaryotes is accomplished through the involvement of transcription start site (TSS-proximal (promoters and -distal (enhancers regulatory elements. It is now well acknowledged that enhancer elements play an essential role during development and cell differentiation, while genetic alterations in these elements are a major cause of human disease. Many strategies have been developed to identify and characterize enhancers. Here, we discuss recent advances in high-throughput approaches to assess enhancer activity, from the well-established massively parallel reporter assays to the recent clustered regularly interspaced short palindromic repeats (CRISPR/Cas9-based technologies. We highlight how these approaches contribute toward a better understanding of enhancer function, eventually leading to the discovery of new types of regulatory sequences, and how the alteration of enhancers can affect transcriptional regulation.

  4. Improved detection of artifactual viral minority variants in high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Matthijs Rudolf Albert Welkers

    2015-01-01

    Full Text Available High-throughput sequencing (HTS of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification is limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina Hiseq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1 virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR amplification and HTS in the same sequence run. Results showed that after ‘best practice’ quality control (QC, within the plasmid pool, 1 minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to 3 clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants, increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs.

  5. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Full Text Available Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  6. Annotation of primate miRNAs by high throughput sequencing of small RNA libraries

    Directory of Open Access Journals (Sweden)

    Dannemann Michael

    2012-03-01

    Full Text Available Abstract Background In addition to genome sequencing, accurate functional annotation of genomes is required in order to carry out comparative and evolutionary analyses between species. Among primates, the human genome is the most extensively annotated. Human miRNA gene annotation is based on multiple lines of evidence including evidence for expression as well as prediction of the characteristic hairpin structure. In contrast, most miRNA genes in non-human primates are annotated based on homology without any expression evidence. We have sequenced small-RNA libraries from chimpanzee, gorilla, orangutan and rhesus macaque from multiple individuals and tissues. Using patterns of miRNA expression in conjunction with a model of miRNA biogenesis we used these high-throughput sequencing data to identify novel miRNAs in non-human primates. Results We predicted 47 new miRNAs in chimpanzee, 240 in gorilla, 55 in orangutan and 47 in rhesus macaque. The algorithm we used was able to predict 64% of the previously known miRNAs in chimpanzee, 94% in gorilla, 61% in orangutan and 71% in rhesus macaque. We therefore added evidence for expression in between one and five tissues to miRNAs that were previously annotated based only on homology to human miRNAs. We increased from 60 to 175 the number miRNAs that are located in orthologous regions in humans and the four non-human primate species studied here. Conclusions In this study we provide expression evidence for homology-based annotated miRNAs and predict de novo miRNAs in four non-human primate species. We increased the number of annotated miRNA genes and provided evidence for their expression in four non-human primates. Similar approaches using different individuals and tissues would improve annotation in non-human primates and allow for further comparative studies in the future.

  7. Finding sRNA generative locales from high-throughput sequencing data with NiBLS

    Directory of Open Access Journals (Sweden)

    Moulton Vincent

    2010-02-01

    Full Text Available Abstract Background Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales. Results We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph the clustering coefficient, a measure of the interconnectedness of the subnetwork, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets and that the locales it generates are robust to differences in the choice of parameters. Conclusions NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA.

  8. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  9. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    Full Text Available The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.

  10. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    Science.gov (United States)

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.

  11. Semi-automated library preparation for high-throughput DNA sequencing platforms.

    Science.gov (United States)

    Farias-Hesson, Eveline; Erikson, Jonathan; Atkins, Alexander; Shen, Peidong; Davis, Ronald W; Scharfe, Curt; Pourmand, Nader

    2010-01-01

    Next-generation sequencing platforms are powerful technologies, providing gigabases of genetic information in a single run. An important prerequisite for high-throughput DNA sequencing is the development of robust and cost-effective preprocessing protocols for DNA sample library construction. Here we report the development of a semi-automated sample preparation protocol to produce adaptor-ligated fragment libraries. Using a liquid-handling robot in conjunction with Carboxy Terminated Magnetic Beads, we labeled each library sample using a unique 6 bp DNA barcode, which allowed multiplex sample processing and sequencing of 32 libraries in a single run using Applied Biosystems' SOLiD sequencer. We applied our semi-automated pipeline to targeted medical resequencing of nuclear candidate genes in individuals affected by mitochondrial disorders. This novel method is capable of preparing as much as 32 DNA libraries in 2.01 days (8-hour workday) for emulsion PCR/high throughput DNA sequencing, increasing sample preparation production by 8-fold.

  12. Allelome.PRO, a pipeline to define allele-specific genomic features from high-throughput sequencing data.

    Science.gov (United States)

    Andergassen, Daniel; Dotter, Christoph P; Kulinski, Tomasz M; Guenzl, Philipp M; Bammer, Philipp C; Barlow, Denise P; Pauler, Florian M; Hudson, Quanah J

    2015-12-02

    Detecting allelic biases from high-throughput sequencing data requires an approach that maximises sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the 'allelome' by automatically categorising the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers, or transcription factor binding sites and therefore provides a powerful tool to identify candidate cis regulatory elements genome wide. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    Science.gov (United States)

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  14. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    Science.gov (United States)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  15. PCR primers to study the diversity of expressed fungal genes encoding lignocellulolytic enzymes in soils using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Florian Barbi

    Full Text Available Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5 and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2, active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly design primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may

  16. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs

    OpenAIRE

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H.; Gregory, Brian D.; Wang, Li-San

    2013-01-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (

  17. Efficient strategy for the molecular diagnosis of intellectual disability using targeted high-throughput sequencing.

    Science.gov (United States)

    Redin, Claire; Gérard, Bénédicte; Lauer, Julia; Herenger, Yvan; Muller, Jean; Quartier, Angélique; Masurel-Paulet, Alice; Willems, Marjolaine; Lesca, Gaétan; El-Chehadeh, Salima; Le Gras, Stéphanie; Vicaire, Serge; Philipps, Muriel; Dumas, Michaël; Geoffroy, Véronique; Feger, Claire; Haumesser, Nicolas; Alembik, Yves; Barth, Magalie; Bonneau, Dominique; Colin, Estelle; Dollfus, Hélène; Doray, Bérénice; Delrue, Marie-Ange; Drouin-Garraud, Valérie; Flori, Elisabeth; Fradin, Mélanie; Francannet, Christine; Goldenberg, Alice; Lumbroso, Serge; Mathieu-Dramard, Michèle; Martin-Coignard, Dominique; Lacombe, Didier; Morin, Gilles; Polge, Anne; Sukno, Sylvie; Thauvin-Robinet, Christel; Thevenon, Julien; Doco-Fenzy, Martine; Genevieve, David; Sarda, Pierre; Edery, Patrick; Isidor, Bertrand; Jost, Bernard; Olivier-Faivre, Laurence; Mandel, Jean-Louis; Piton, Amélie

    2014-11-01

    Intellectual disability (ID) is characterised by an extreme genetic heterogeneity. Several hundred genes have been associated to monogenic forms of ID, considerably complicating molecular diagnostics. Trio-exome sequencing was recently proposed as a diagnostic approach, yet remains costly for a general implementation. We report the alternative strategy of targeted high-throughput sequencing of 217 genes in which mutations had been reported in patients with ID or autism as the major clinical concern. We analysed 106 patients with ID of unknown aetiology following array-CGH analysis and other genetic investigations. Ninety per cent of these patients were males, and 75% sporadic cases. We identified 26 causative mutations: 16 in X-linked genes (ATRX, CUL4B, DMD, FMR1, HCFC1, IL1RAPL1, IQSEC2, KDM5C, MAOA, MECP2, SLC9A6, SLC16A2, PHF8) and 10 de novo in autosomal-dominant genes (DYRK1A, GRIN1, MED13L, TCF4, RAI1, SHANK3, SLC2A1, SYNGAP1). We also detected four possibly causative mutations (eg, in NLGN3) requiring further investigations. We present detailed reasoning for assigning causality for each mutation, and associated patients' clinical information. Some genes were hit more than once in our cohort, suggesting they correspond to more frequent ID-associated conditions (KDM5C, MECP2, DYRK1A, TCF4). We highlight some unexpected genotype to phenotype correlations, with causative mutations being identified in genes associated to defined syndromes in patients deviating from the classic phenotype (DMD, TCF4, MECP2). We also bring additional supportive (HCFC1, MED13L) or unsupportive (SHROOM4, SRPX2) evidences for the implication of previous candidate genes or mutations in cognitive disorders. With a diagnostic yield of 25% targeted sequencing appears relevant as a first intention test for the diagnosis of ID, but importantly will also contribute to a better understanding regarding the specific contribution of the many genes implicated in ID and autism. Published by the

  18. High-throughput sequencing and graph-based cluster analysis facilitate microsatellite development from a highly complex genome.

    Science.gov (United States)

    Shah, Abhijeet B; Schielzeth, Holger; Albersmeier, Andreas; Kalinowski, Joern; Hoffman, Joseph I

    2016-08-01

    Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired-end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild-caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to successfully amplify. It also reveals how advances in high-throughput sequencing and graph-based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.

  19. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  20. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.

  1. Using high throughput sequencing to explore the biodiversity in oral bacterial communities

    Science.gov (United States)

    Diaz, P.I.; Dupuy, A.K.; Abusleme, L.; Reese, B.; Obergfell, C.; Choquette, L.; Dongari-Bagtzoglou, A.; Peterson, D.E.; Terzi, E.; Strausbaugh, L.D.

    2013-01-01

    Summary High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. PMID:22520388

  2. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    Science.gov (United States)

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  3. Use of high throughput sequencing to study oomycete communities in soil and roots

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    limited understanding of the diversity of oomycetes in symptomatic plant tissue as well as in root zones. The aim of this study was to improve and validate techniques for using high throughput sequencing as a tool for studying oomycete communities. Primer sets ITS4, ITS6 and ITS7 that have been used...... taxonomic units from symptomatic lesions in carrot resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot. Moreover, soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium......, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all symptomatic lesions and soil samples showing the versatility of the strategy and thus demonstrating the usefulness of the method in plant and soil DNA background....

  4. New approach for high-throughput screening of drug activity on Plasmodium liver stages.

    NARCIS (Netherlands)

    Gego, A.; Silvie, O.; Franetich, J.F.; Farhati, K.; Hannoun, L.; Luty, A.J.F.; Sauerwein, R.W.; Boucheix, C.; Rubinstein, E.; Mazier, D.

    2006-01-01

    Plasmodium liver stages represent potential targets for antimalarial prophylactic drugs. Nevertheless, there is a lack of molecules active on these stages. We have now developed a new approach for the high-throughput screening of drug activity on Plasmodium liver stages in vitro, based on an

  5. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Full Text Available Abstract Background A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  6. Construction and analysis of high-density linkage map using high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Dongyuan Liu

    Full Text Available Linkage maps enable the study of important biological questions. The construction of high-density linkage maps appears more feasible since the advent of next-generation sequencing (NGS, which eases SNP discovery and high-throughput genotyping of large population. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. Simulation study shows HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembling, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.

  7. glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Andrew Paul Hutchins

    2014-01-01

    Full Text Available Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data, and which has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.

  8. A Bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data.

    Directory of Open Access Journals (Sweden)

    Qing Xie

    2014-09-01

    Full Text Available High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1% than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.

  9. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  10. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis.

    Science.gov (United States)

    Goldfeder, Rachel L; Wall, Dennis P; Khoury, Muin J; Ioannidis, John P A; Ashley, Euan A

    2017-10-15

    Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains

    DEFF Research Database (Denmark)

    Nistelberger, H. M.; Smith, O.; Wales, Nathan

    2016-01-01

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination...... different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one....... It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three...

  12. ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data.

    Science.gov (United States)

    Huang, Xiaosan; Zhang, Shaoling; Li, Kongqing; Thimmapuram, Jyothi; Xie, Shaojun

    2017-10-26

    High throughput bisulfite sequencing (BS-seq) is an important technology to generate single-base DNA methylomes in both plants and animals. In order to accelerate the data analysis of BS-seq data, toolkits for visualization are required. ViewBS, an open-source toolkit, can extract and visualize the DNA methylome data easily and with flexibility. By using Tabix, ViewBS can visualize BS-seq for large datasets quickly. ViewBS can generate publication-quality figures, such as meta-plots, heat maps and violin-boxplots, which can help users to answer biological questions. We illustrate its application using BS-seq data from Arabidopsis thaliana. ViewBS is freely available at: https://github.com/xie186/ViewBS. xie186@purdue.edu. Supplementary data are available at Bioinformatics online.

  13. High-throughput scanning of the rat genome using interspersed repetitive sequence-PCR markers.

    Science.gov (United States)

    Gösele, C; Hong, L; Kreitler, T; Rossmann, M; Hieke, B; Gross, U; Kramer, M; Himmelbauer, H; Bihoreau, M T; Kwitek-Black, A E; Twigger, S; Tonellato, P J; Jacob, H J; Schalkwyk, L C; Lindpaintner, K; Ganten, D; Lehrach, H; Knoblauch, M

    2000-11-01

    We report the establishment of a hybridization-based marker system for the rat genome based on the PCR amplification of interspersed repetitive sequences (IRS). Overall, 351 IRS markers were mapped within the rat genome. The IRS marker panel consists of 210 nonpolymorphic and 141 polymorphic markers that were screened for presence/absence polymorphism patterns in 38 different rat strains and substrains that are commonly used in biomedical research. The IRS marker panel was demonstrated to be useful for rapid genome screening in experimental rat crosses and high-throughput characterization of large-insert genomic library clones. Information on corresponding YAC clones is made available for this IRS marker set distributed over the whole rat genome. The two existing rat radiation hybrid maps were integrated by placing the IRS markers in both maps. The genetic and physical mapping data presented provide substantial information for ongoing positional cloning projects in the rat. Copyright 2000 Academic Press.

  14. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for.Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.

  15. Barcoding the food chain: from Sanger to high-throughput sequencing.

    Science.gov (United States)

    Littlefair, Joanne E; Clare, Elizabeth L

    2016-11-01

    Society faces the complex challenge of supporting biodiversity and ecosystem functioning, while ensuring food security by providing safe traceable food through an ever-more-complex global food chain. The increase in human mobility brings the added threat of pests, parasites, and invaders that further complicate our agro-industrial efforts. DNA barcoding technologies allow researchers to identify both individual species, and, when combined with universal primers and high-throughput sequencing techniques, the diversity within mixed samples (metabarcoding). These tools are already being employed to detect market substitutions, trace pests through the forensic evaluation of trace "environmental DNA", and to track parasitic infections in livestock. The potential of DNA barcoding to contribute to increased security of the food chain is clear, but challenges remain in regulation and the need for validation of experimental analysis. Here, we present an overview of the current uses and challenges of applied DNA barcoding in agriculture, from agro-ecosystems within farmland to the kitchen table.

  16. Kaleidaseq: a Web-based tool to monitor data flow in a high throughput sequencing facility.

    Science.gov (United States)

    Dedhia, N N; McCombie, W R

    1998-03-01

    Tracking data flow in high throughput sequencing is important in maintaining a consistent number of successfully sequenced samples, making decisions on scheduling the flow of sequencing steps, resolving problems at various steps and tracking the status of different projects. This is especially critical when the laboratory is handling a multitude of projects. We have built a Web-based data flow tracking package, called Kaleidaseq, which allows us to monitor the flow and quality of sequencing samples through the steps of preparation of library plates, plaque-picking, preparation of templates, conducting sequencing reactions, loading of samples on gels, base-calling the traces, and calculating the quality of the sequenced samples. Kaleidaseq's suite of displays allows for outstanding monitoring of the production sequencing process. The online display of current information that Kaleidaseq provides on both project status and process queues sorted by project enables accurate real-time assessment of the necessary samples that must be processed to complete the project. This information allows the process manager to allocate future resources optimally and schedule tasks according to scientific priorities. Quality of the sequenced samples can be tracked on a daily basis, which allows the sequencing laboratory to maintain a steady performance level and quickly resolve dips in quality. Kaleidaseq has a simple easy-to-use interface that allows access to all major functions and process queues from one Web page. This software package is modular and designed to allow additional processing steps and new monitoring variables to be added and tracked with ease. Access to the underlying relational database is through the Perl DBI interface, which allows for the use of different relational databases. Kaleidaseq is available for free use by the academic community from http://www.cshl.org/kaleidaseq.

  17. End-to-End Optimization of High-Throughput DNA Sequencing.

    Science.gov (United States)

    O'Reilly, Eliza; Baccelli, Francois; De Veciana, Gustavo; Vikalo, Haris

    2016-10-01

    At the core of Illumina's high-throughput DNA sequencing platforms lies a biophysical surface process that results in a random geometry of clusters of homogeneous short DNA fragments typically hundreds of base pairs long-bridge amplification. The statistical properties of this random process and the lengths of the fragments are critical as they affect the information that can be subsequently extracted, that is, density of successfully inferred DNA fragment reads. The ensembles of overlapping DNA fragment reads are then used to computationally reconstruct the much longer target genome sequence. The success of the reconstruction in turn depends on having a sufficiently large ensemble of DNA fragments that are sufficiently long. In this article using stochastic geometry, we model and optimize the end-to-end flow cell synthesis and target genome sequencing process, linking and partially controlling the statistics of the physical processes to the success of the final computational step. Based on a rough calibration of our model, we provide, for the first time, a mathematical framework capturing the salient features of the sequencing platform that serves as a basis for optimizing cost, performance, and/or sensitivity analysis to various parameters.

  18. SAMQA: error classification and validation of high-throughput sequenced read data

    Directory of Open Access Journals (Sweden)

    Bressler Ryan

    2011-08-01

    Full Text Available Abstract Background The advances in high-throughput sequencing technologies and growth in data sizes has highlighted the need for scalable tools to perform quality assurance testing. These tests are necessary to ensure that data is of a minimum necessary standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. Results SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Due to an observed linearithmic speedup through the use of a high-performance computing (HPC framework for the majority of tasks, poor quality data was identified prior to secondary analysis in significantly less time on the HPC framework than the same data run using alternative parallelization strategies on a single server. Conclusions The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds gigabytes of samples regardless of coverage or sample type.

  19. A beginners guide to SNP calling from high-throughput DNA-sequencing data.

    Science.gov (United States)

    Altmann, André; Weber, Peter; Bader, Daniel; Preuss, Michael; Binder, Elisabeth B; Müller-Myhsok, Bertram

    2012-10-01

    High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.

  20. Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2012-01-01

    Full Text Available Abstract Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952 of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612 were intronic and 9% (n = 464 were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS. Significant (P ® MassARRAY. No significant differences (P > 0.1 were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total. Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving

  1. Molecular Approaches for High Throughput Detection and Quantification of Genetically Modified Crops: A Review

    Directory of Open Access Journals (Sweden)

    Ibrahim B. Salisu

    2017-10-01

    Full Text Available As long as the genetically modified crops are gaining attention globally, their proper approval and commercialization need accurate and reliable diagnostic methods for the transgenic content. These diagnostic techniques are mainly divided into two major groups, i.e., identification of transgenic (1 DNA and (2 proteins from GMOs and their products. Conventional methods such as PCR (polymerase chain reaction and enzyme-linked immunosorbent assay (ELISA were routinely employed for DNA and protein based quantification respectively. Although, these Techniques (PCR and ELISA are considered as significantly convenient and productive, but there is need for more advance technologies that allow for high throughput detection and the quantification of GM event as the production of more complex GMO is increasing day by day. Therefore, recent approaches like microarray, capillary gel electrophoresis, digital PCR and next generation sequencing are more promising due to their accuracy and precise detection of transgenic contents. The present article is a brief comparative study of all such detection techniques on the basis of their advent, feasibility, accuracy, and cost effectiveness. However, these emerging technologies have a lot to do with detection of a specific event, contamination of different events and determination of fusion as well as stacked gene protein are the critical issues to be addressed in future.

  2. Molecular Approaches for High Throughput Detection and Quantification of Genetically Modified Crops: A Review

    Science.gov (United States)

    Salisu, Ibrahim B.; Shahid, Ahmad A.; Yaqoob, Amina; Ali, Qurban; Bajwa, Kamran S.; Rao, Abdul Q.; Husnain, Tayyab

    2017-01-01

    As long as the genetically modified crops are gaining attention globally, their proper approval and commercialization need accurate and reliable diagnostic methods for the transgenic content. These diagnostic techniques are mainly divided into two major groups, i.e., identification of transgenic (1) DNA and (2) proteins from GMOs and their products. Conventional methods such as PCR (polymerase chain reaction) and enzyme-linked immunosorbent assay (ELISA) were routinely employed for DNA and protein based quantification respectively. Although, these Techniques (PCR and ELISA) are considered as significantly convenient and productive, but there is need for more advance technologies that allow for high throughput detection and the quantification of GM event as the production of more complex GMO is increasing day by day. Therefore, recent approaches like microarray, capillary gel electrophoresis, digital PCR and next generation sequencing are more promising due to their accuracy and precise detection of transgenic contents. The present article is a brief comparative study of all such detection techniques on the basis of their advent, feasibility, accuracy, and cost effectiveness. However, these emerging technologies have a lot to do with detection of a specific event, contamination of different events and determination of fusion as well as stacked gene protein are the critical issues to be addressed in future. PMID:29085378

  3. High Throughput Facility

    Data.gov (United States)

    Federal Laboratory Consortium — Argonne?s high throughput facility provides highly automated and parallel approaches to material and materials chemistry development. The facility allows scientists...

  4. Investigation of bacterial and fungal diversity in tarag using high-throughput sequencing.

    Science.gov (United States)

    Sun, Zhihong; Liu, Wenjun; Bao, Qiuhua; Zhang, Jiachao; Hou, Qiangchuan; Kwok, Laiyu; Sun, Tiansong; Zhang, Heping

    2014-10-01

    This is the first study on the bacterial and fungal community diversity in 17 tarag samples (naturally fermented dairy products) through a metagenomic approach involving high-throughput pyrosequencing. Our results revealed the presence of a total of 47 bacterial and 43 fungal genera in all tarag samples, in which Lactobacillus and Galactomyces were the predominant genera of bacteria and fungi, respectively. The number of some microbial genera, such as Lactococcus, Acetobacter, Saccharomyces, Trichosporon, and Kluyveromyces, among others, was found to vary between different samples. Altogether, our results showed that the microbial flora in different samples may be stratified by geographic region. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  5. Determining the diet of larvae of western rock lobster (Panulirus cygnus using high-throughput DNA sequencing techniques.

    Directory of Open Access Journals (Sweden)

    Richard O'Rorke

    Full Text Available The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  6. Determining the diet of larvae of western rock lobster (Panulirus cygnus) using high-throughput DNA sequencing techniques.

    Science.gov (United States)

    O'Rorke, Richard; Lavery, Shane; Chow, Seinen; Takeyama, Haruko; Tsai, Peter; Beckley, Lynnath E; Thompson, Peter A; Waite, Anya M; Jeffs, Andrew G

    2012-01-01

    The Western Australian rock lobster fishery has been both a highly productive and sustainable fishery. However, a recent dramatic and unexplained decline in post-larval recruitment threatens this sustainability. Our lack of knowledge of key processes in lobster larval ecology, such as their position in the food web, limits our ability to determine what underpins this decline. The present study uses a high-throughput amplicon sequencing approach on DNA obtained from the hepatopancreas of larvae to discover significant prey items. Two short regions of the 18S rRNA gene were amplified under the presence of lobster specific PNA to prevent lobster amplification and to improve prey amplification. In the resulting sequences either little prey was recovered, indicating that the larval gut was empty, or there was a high number of reads originating from multiple zooplankton taxa. The most abundant reads included colonial Radiolaria, Thaliacea, Actinopterygii, Hydrozoa and Sagittoidea, which supports the hypothesis that the larvae feed on multiple groups of mostly transparent gelatinous zooplankton. This hypothesis has prevailed as it has been tentatively inferred from the physiology of larvae, captive feeding trials and co-occurrence in situ. However, these prey have not been observed in the larval gut as traditional microscopic techniques cannot discern between transparent and gelatinous prey items in the gut. High-throughput amplicon sequencing of gut DNA has enabled us to classify these otherwise undetectable prey. The dominance of the colonial radiolarians among the gut contents is intriguing in that this group has been historically difficult to quantify in the water column, which may explain why they have not been connected to larval diet previously. Our results indicate that a PCR based technique is a very successful approach to identify the most abundant taxa in the natural diet of lobster larvae.

  7. A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection.

    Science.gov (United States)

    Khan, Arifa S; Ng, Siemon H S; Vandeputte, Olivier; Aljanahi, Aisha; Deyati, Avisek; Cassart, Jean-Pol; Charlebois, Robert L; Taliaferro, Lanyn P

    2017-01-01

    The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory. Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials

  8. Identification and characterization of small non-coding RNAs from Chinese fir by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Wan Li-Chuan

    2012-08-01

    Full Text Available Abstract Background Small non-coding RNAs (sRNAs play key roles in plant development, growth and responses to biotic and abiotic stresses. At least four classes of sRNAs have been well characterized in plants, including repeat-associated siRNAs (rasiRNAs, microRNAs (miRNAs, trans-acting siRNAs (tasiRNAs and natural antisense transcript-derived siRNAs. Chinese fir (Cunninghamia lanceolata is one of the most important coniferous evergreen tree species in China. No sRNA from Chinese fir has been described to date. Results To obtain sRNAs in Chinese fir, we sequenced a sRNA library generated from seeds, seedlings, leaves, stems and calli, using Illumina high throughput sequencing technology. A comprehensive set of sRNAs were acquired, including conserved and novel miRNAs, rasiRNAs and tasiRNAs. With BLASTN and MIREAP we identified a total of 115 conserved miRNAs comprising 40 miRNA families and one novel miRNA with precursor sequence. The expressions of 16 conserved and one novel miRNAs and one tasiRNA were detected by RT-PCR. Utilizing real time RT-PCR, we revealed that four conserved and one novel miRNAs displayed developmental stage-specific expression patterns in Chinese fir. In addition, 209 unigenes were predicted to be targets of 30 Chinese fir miRNA families, of which five target genes were experimentally verified by 5' RACE, including a squamosa promoter-binding protein gene, a pentatricopeptide (PPR repeat-containing protein gene, a BolA-like family protein gene, AGO1 and a gene of unknown function. We also demonstrated that the DCL3-dependent rasiRNA biogenesis pathway, which had been considered absent in conifers, existed in Chinese fir. Furthermore, the miR390-TAS3-ARF regulatory pathway was elucidated. Conclusions We unveiled a complex population of sRNAs in Chinese fir through high throughput sequencing. This provides an insight into the composition and function of sRNAs in Chinese fir and sheds new light on land plant sRNA evolution.

  9. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    Directory of Open Access Journals (Sweden)

    Seung Hak Yang

    2015-09-01

    Full Text Available The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.

  10. [Study on Microbial Diversity of Peri-implantitis Subgingival by High-throughput Sequencing].

    Science.gov (United States)

    Li, Zhi-jie; Wang, Shao-guo; Li, Yue-hong; Tu, Dong-xiang; Liu, Shi-yun; Nie, Hong-bing; Li, Zhi-qiang; Zhang, Ju-mei

    2015-07-01

    To study microbial diversity of peri-implantitis subgingival with high-throughput sequencing, and investigate microbiological etiology of peri-implantitis. Subgingival plaques were sampled from the patients with peri-implantitis (D group) and non-peri-implantitis subjects (N group). The microbiological diversity of the subgingival plaques was detected by sequencing V4 region of 16S rRNA with Illumina Miseq platform. The diversity of the community structure was analyzed using Mothur software. A total of 156 507 gene sequences were detected in nine samples and 4 402 operational taxonomic units (OTUs) were found. Selenomonas, Pseudomonas, and Fusobacterium were dominant bacteria in D group, while Fusobacterium, Veillonella and Streptococcus were dominant bacteria in N group. Differences between peri-implantitis and non-peri-implantitis bacterial communities were observed at all phylogenetic levels by LEfSe, which was also found in PcoA test. The occurrence of peri-implantitis is not only related to periodontitis pathogenic microbe, but also related with the changes of oral microbial community structure. Treponema, Herbaspirillum, Butyricimonas and Phaeobacte may be closely related to the occurrence and development of peri-implantitis.

  11. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.

  12. In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR.

    Science.gov (United States)

    Kuksa, Pavel P; Leung, Yuk Yee; Vandivier, Lee E; Anderson, Zachary; Gregory, Brian D; Wang, Li-San

    2017-01-01

    RNA molecules are often altered post-transcriptionally by the covalent modification of their nucleotides. These modifications are known to modulate the structure, function, and activity of RNAs. When reverse transcribed into cDNA during RNA sequencing library preparation, atypical (modified) ribonucleotides that affect Watson-Crick base pairing will interfere with reverse transcriptase (RT), resulting in cDNA products with mis-incorporated bases or prematurely terminated RNA products. These interactions with RT can therefore be inferred from mismatch patterns in the sequencing reads, and are distinguishable from simple base-calling errors, single-nucleotide polymorphisms (SNPs), or RNA editing sites. Here, we describe a computational protocol for the in silico identification of modified ribonucleotides from RT-based RNA-seq read-out using the High-throughput Analysis of Modified Ribonucleotides (HAMR) software. HAMR can identify these modifications transcriptome-wide with single nucleotide resolution, and also differentiate between different types of modifications to predict modification identity. Researchers can use HAMR to identify and characterize RNA modifications using RNA-seq data from a variety of common RT-based sequencing protocols such as Poly(A), total RNA-seq, and small RNA-seq.

  13. The gut microbiotassay – a high-throughput real-time PCR chip combined with next generation sequencing

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Mølbak, Lars

    informative. Many methods can be used to try to define and characterize the gut microbiota. Here we designed an assay consisting of twenty-four different primer systems targeting the most common bacterial groups of the intestine on different hierarchical levels. The aim of this study was to implement and test...... this assay with the high-throughput real-time PCR chip “Access Array 48.48” from Fluidigm. The chip executes 2304 individual reactions in parallel and afterwards it is possible to harvest the amplicons for next-generation sequencing. This approach gives a taxonomical overview of the gut microbiota, hence...... the name: ‘the gut microbiotassay’. The assay was tested on fifteen different bacterial type strains each functioning as target for one or more of the primer systems. In this way the sensitivity and the specificity of the primers were assessed. Next the assay was tested on complex ecosystems by extracting...

  14. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

    Science.gov (United States)

    Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    The B and T lymphocytes are white blood cells playing a key role in the adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigenes. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture population of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database and a visualization for the analysis of clonotypes along the time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

  15. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Full Text Available Abstract Background Variation of microorganism communities in the rumen of cattle (Bos taurus is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS, that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given faecal samples are much easier to obtain than rumen fluid samples; we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen micro-biome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publically available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of the reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P  Conclusions We have presented a simple and high throughput method of

  16. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains.

    Science.gov (United States)

    Nistelberger, H M; Smith, O; Wales, N; Star, B; Boessenkool, S

    2016-11-24

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data - excluding false-positives due to background contamination or incorrect index assignments - indicated a lack of endogenous DNA in nearly all samples, except for one lightly-charred maize cob. Even with target enrichment, this sample failed to yield adequate data required to address fundamental questions in archaeology and biology. We further reanalysed part of an existing dataset on charred plant material, and found all purported endogenous DNA sequences were likely to be spurious. We suggest these technologies are not suitable for use with charred archaeobotanicals and urge great caution when interpreting data obtained by HTS of these remains.

  17. Genotyping by PCR and High-Throughput Sequencing of Commercial Probiotic Products Reveals Composition Biases.

    Directory of Open Access Journals (Sweden)

    Wesley Morovic

    2016-11-01

    Full Text Available Recent advances in microbiome research have brought renewed focus on beneficial bacteria, many of which are available in food and dietary supplements. Although probiotics have historically been defined as microorganisms that convey health benefits when ingested in sufficient viable amounts, this description now includes the stipulation well defined strains, encompassing definitive taxonomy for consumer consideration and regulatory oversight. Here, we evaluated 52 commercial dietary supplements covering a range of labeled species, and determined their content using plate counting, targeted genotyping. Additionally, strain identities were assessed using methods recently published by the United States Pharmacopeial Convention. We also determined the relative abundance of individual bacteria by high-throughput sequencing (HTS of the 16S rRNA sequence using paired-end 2x250bp Illumina MiSeq technology. Using multiple methods, we tested the hypothesis that products do contain the quantitative amount of labeled bacteria, and qualitative list of labeled microbial species. We found that 17 samples (33% were below label claim for CFU prior to their expiration dates. A multiplexed-PCR scheme showed that only 30/52 (58% of the products contained a correctly labeled classification, with issues encompassing incorrect taxonomy, missing species and un-labeled species. The HTS revealed that many blended products consisted predominantly of Lactobacillus acidophilus and Bifidobacterium animalis subsp. lactis. These results highlight the need for reliable methods to qualitatively determine the correct taxonomy and quantitatively ascertain the relative amounts of mixed microbial populations in commercial probiotic products.

  18. Perchlorate reduction by hydrogen autotrophic bacteria and microbial community analysis using high-throughput sequencing.

    Science.gov (United States)

    Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong

    2016-02-01

    Hydrogen autotrophic reduction of perchlorate have advantages of high removal efficiency and harmless to drinking water. But so far the reported information about the microbial community structure was comparatively limited, changes in the biodiversity and the dominant bacteria during acclimation process required detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated by hydrogen aeration from activated sludge. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across at pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron accepter, and the sequence of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during acclimation process, the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus related to Rhodocyclales was the dominated perchlorate reducing bacteria (PRB) in the mixed culture.

  19. High-throughput DNA Stretching in Continuous Elongational Flow for Genome Sequence Scanning

    Science.gov (United States)

    Meltzer, Robert; Griffis, Joshua; Safranovitch, Mikhail; Malkin, Gene; Cameron, Douglas

    2014-03-01

    Genome Sequence Scanning (GSS) identifies and compares bacterial genomes by stretching long (60 - 300 kb) genomic DNA restriction fragments and scanning for site-selective fluorescent probes. Practical application of GSS requires: 1) high throughput data acquisition, 2) efficient DNA stretching, 3) reproducible DNA elasticity in the presence of intercalating fluorescent dyes. GSS utilizes a pseudo-two-dimensional micron-scale funnel with convergent sheathing flows to stretch one molecule at a time in continuous elongational flow and center the DNA stream over diffraction-limited confocal laser excitation spots. Funnel geometry has been optimized to maximize throughput of DNA within the desired length range (>10 million nucleobases per second). A constant-strain detection channel maximizes stretching efficiency by applying a constant parabolic tension profile to each molecule, minimizing relaxation and flow-induced tumbling. The effect of intercalator on DNA elasticity is experimentally controlled by reacting one molecule of DNA at a time in convergent sheathing flows of the dye. Derivations of accelerating flow and non-linear tension distribution permit alignment of detected fluorescence traces to theoretical templates derived from whole-genome sequence data.

  20. Surveying the repair of ancient DNA from bones via high-throughput sequencing.

    Science.gov (United States)

    Mouttham, Nathalie; Klunk, Jennifer; Kuch, Melanie; Fourney, Ron; Poinar, Hendrik

    2015-07-01

    DNA damage in the form of abasic sites, chemically altered nucleotides, and strand fragmentation is the foremost limitation in obtaining genetic information from many ancient samples. Upon cell death, DNA continues to endure various chemical attacks such as hydrolysis and oxidation, but repair pathways found in vivo no longer operate. By incubating degraded DNA with specific enzyme combinations adopted from these pathways, it is possible to reverse some of the post-mortem nucleic acid damage prior to downstream analyses such as library preparation, targeted enrichment, and high-throughput sequencing. Here, we evaluate the performance of two available repair protocols on previously characterized DNA extracts from four mammoths. Both methods use endonucleases and glycosylases along with a DNA polymerase-ligase combination. PreCR Repair Mix increases the number of molecules converted to sequencing libraries, leading to an increase in endogenous content and a decrease in cytosine-to-thymine transitions due to cytosine deamination. However, the effects of Nelson Repair Mix on repair of DNA damage remain inconclusive.

  1. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

    Science.gov (United States)

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-12-15

    High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy eduardo.eyras@upf.edu Supplementary data are available at Bioinformatics online.

  2. Comprehensive evaluation and optimization of amplicon library preparation methods for high-throughput antibody sequencing.

    Directory of Open Access Journals (Sweden)

    Ulrike Menzel

    Full Text Available High-throughput sequencing (HTS of antibody repertoire libraries has become a powerful tool in the field of systems immunology. However, numerous sources of bias in HTS workflows may affect the obtained antibody repertoire data. A crucial step in antibody library preparation is the addition of short platform-specific nucleotide adapter sequences. As of yet, the impact of the method of adapter addition on experimental library preparation and the resulting antibody repertoire HTS datasets has not been thoroughly investigated. Therefore, we compared three standard library preparation methods by performing Illumina HTS on antibody variable heavy genes from murine antibody-secreting cells. Clonal overlap and rank statistics demonstrated that the investigated methods produced equivalent HTS datasets. PCR-based methods were experimentally superior to ligation with respect to speed, efficiency, and practicality. Finally, using a two-step PCR based method we established a protocol for antibody repertoire library generation, beginning from inputs as low as 1 ng of total RNA. In summary, this study represents a major advance towards a standardized experimental framework for antibody HTS, thus opening up the potential for systems-based, cross-experiment meta-analyses of antibody repertoires.

  3. Common fusion transcripts identified in colorectal cancer cell lines by high-throughput RNA sequencing.

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard Os; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%-28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated.

  4. Common Fusion Transcripts Identified in Colorectal Cancer Cell Lines by High-Throughput RNA Sequencing12

    Science.gov (United States)

    Nome, Torfinn; Thomassen, Gard OS; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I

    2013-01-01

    Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%–28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated. PMID:24151535

  5. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Guzman, Frank; Almerão, Mauricio P; Körbes, Ana P; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  6. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    Science.gov (United States)

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error prone high throughput NGS reads and genomic repeats, the assembly graph contains massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in a computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.egSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. A Novel High-Throughput Approach to Measure Hydroxyl Radicals Induced by Airborne Particulate Matter

    Directory of Open Access Journals (Sweden)

    Yeongkwon Son

    2015-10-01

    Full Text Available Oxidative stress is one of the key mechanisms linking ambient particulate matter (PM exposure with various adverse health effects. The oxidative potential of PM has been used to characterize the ability of PM induced oxidative stress. Hydroxyl radical (•OH is the most destructive radical produced by PM. However, there is currently no high-throughput approach which can rapidly measure PM-induced •OH for a large number of samples with an automated system. This study evaluated four existing molecular probes (disodium terephthalate, 3′-p-(aminophenylfluorescein, coumarin-3-carboxylic acid, and sodium benzoate for their applicability to measure •OH induced by PM in a high-throughput cell-free system using fluorescence techniques, based on both our experiments and on an assessment of the physicochemical properties of the probes reported in the literature. Disodium terephthalate (TPT was the most applicable molecular probe to measure •OH induced by PM, due to its high solubility, high stability of the corresponding fluorescent product (i.e., 2-hydroxyterephthalic acid, high yield compared with the other molecular probes, and stable fluorescence intensity in a wide range of pH environments. TPT was applied in a high-throughput format to measure PM (NIST 1648a-induced •OH, in phosphate buffered saline. The formed fluorescent product was measured at designated time points up to 2 h. The fluorescent product of TPT had a detection limit of 17.59 nM. The soluble fraction of PM contributed approximately 76.9% of the •OH induced by total PM, and the soluble metal ions of PM contributed 57.4% of the overall •OH formation. This study provides a promising cost-effective high-throughput method to measure •OH induced by PM on a routine basis.

  8. Diversity and Structure of Diazotrophic Communities in Mangrove Rhizosphere, Revealed by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Yanying Zhang

    2017-10-01

    Full Text Available Diazotrophic communities make an essential contribution to the productivity through providing new nitrogen. However, knowledge of the roles that both mangrove tree species and geochemical parameters play in shaping mangove rhizosphere diazotrophic communities is still elusive. Here, a comprehensive examination of the diversity and structure of microbial communities in the rhizospheres of three mangrove species, Rhizophora apiculata, Avicennia marina, and Ceriops tagal, was undertaken using high-throughput sequencing of the 16S rRNA and nifH genes. Our results revealed a great diversity of both the total microbial composition and the diazotrophic composition specifically in the mangrove rhizosphere. Deltaproteobacteria and Gammaproteobacteria were both ubiquitous and dominant, comprising an average of 45.87 and 86.66% of total microbial and diazotrophic communities, respectively. Sulfate-reducing bacteria belonging to the Desulfobacteraceae and Desulfovibrionaceae were the dominant diazotrophs. Community statistical analyses suggested that both mangrove tree species and additional environmental variables played important roles in shaping total microbial and potential diazotroph communities in mangrove rhizospheres. In contrast to the total microbial community investigated by analysis of 16S rRNA gene sequences, most of the dominant diazotrophic groups identified by nifH gene sequences were significantly different among mangrove species. The dominant diazotrophs of the family Desulfobacteraceae were positively correlated with total phosphorus, but negatively correlated with the nitrogen to phosphorus ratio. The Pseudomonadaceae were positively correlated with the concentration of available potassium, suggesting that diazotrophs potentially play an important role in biogeochemical cycles, such as those of nitrogen, phosphorus, sulfur, and potassium, in the mangrove ecosystem.

  9. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data.

    Science.gov (United States)

    Vainshtein, Yevhen; Rippe, Karsten; Teif, Vladimir B

    2017-02-14

    Biomedical applications of high-throughput sequencing methods generate a vast amount of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence/absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them. Here we introduce the NucTools suite of Perl scripts as well as MATLAB- and R-based visualization programs for a nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of the changes of integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors or architectural proteins, and epigenetic marks like histone modifications or DNA methylation. The applications of NucTools are demonstrated for the comparison of several datasets for nucleosome occupancy in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). The typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features such as for example binding of a transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). As potential limitations and problems we discuss how inter-replicate variability of

  10. Preselection of shotgun clones by oligonucleotide fingerprinting: an efficient and high throughput strategy to reduce redundancy in large-scale sequencing projects

    National Research Council Canada - National Science Library

    Radelof, U; Hennig, S; Seranski, P; Steinfath, M; Ramser, J; Reinhardt, R; Poustka, A; Francis, F; Lehrach, H

    1998-01-01

    .... To reduce the overall effort and cost of those projects and to accelerate the sequencing throughput, we have developed an efficient, high throughput oligonucleotide fingerprinting protocol to select...

  11. Genome-Wide Assessment of the Binding Effects of Artificial Transcriptional Activators by High-Throughput Sequencing.

    Science.gov (United States)

    Chandran, Anandhakumar; Syed, Junetha; Li, Yue; Sato, Shinsuke; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-10-17

    One of the major goals in DNA-based personalized medicine is the development of sequence-specific small molecules to target the genome. SAHA-PIPs belong to such class of small molecule. In the context of the complex eukaryotic genome, the differential biological effects of SAHA-PIPs are unclear. This question can be addressed by identifying the binding regions across the genome; however, it is a challenge to enrich small-molecule-bound DNA without chemical crosslinking. Here, we developed a method that employs high-throughput sequencing to map the binding area of small molecules throughout the chromatinized human genome. Analysis of the sequenced data confirmed the presence of specific binding sites for SAHA-PIPs from the enriched sequence reads. Mapping the binding sites and enriched regions on the human genome clarifies the reason for the distinct biological effects of SAHA-PIP. This approach will be useful for identifying the function of other small molecules on a large scale. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.

    Science.gov (United States)

    Kao, Wei-Chun; Song, Yun S

    2011-03-01

    Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. Unfortunately, however, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naive-BayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.

  13. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.

    Science.gov (United States)

    Caboche, Ségolène; Audebert, Christophe; Lemoine, Yves; Hot, David

    2014-04-05

    The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark

  14. High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L..

    Directory of Open Access Journals (Sweden)

    Livia Donaire

    Full Text Available Small RNAs (sRNAs of 20 to 25 nucleotides (nt in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Despite RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.. sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.

  15. Forensic soil DNA analysis using high-throughput sequencing: a comparison of four molecular markers.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Cooper, Alan

    2014-11-01

    Soil analysis, such as mineralogy, geophysics, texture and colour, are commonly used in forensic casework to link a suspect to a crime scene. However, DNA analysis can also be applied to characterise the vast diversity of organisms present in soils. DNA metabarcoding and high-throughput sequencing (HTS) now offer a means to improve discrimination between forensic soil samples by identifying individual taxa and exploring non-culturable microbial species. Here, we compare the small-scale reproducibility and resolution of four molecular markers targeting different taxa (bacterial 16S rRNA, eukaryotic18S rRNA, plant trnL intron and fungal internal transcribed spacer I (ITS1) rDNA) to distinguish two sample sites. We also assess the background DNA level associated with each marker and examine the effects of filtering Operational Taxonomic Units (OTUs) detected in extraction blank controls. From this study, we show that non-bacterial taxa in soil, particularly fungi, can provide the greatest resolution between the sites, whereas plant markers may be problematic for forensic discrimination. ITS and 18S markers exhibit reliable amplification, and both show high discriminatory power with low background DNA levels. The 16S rRNA marker showed comparable discriminatory power post filtering; however, presented the highest level of background DNA. The discriminatory power of all markers was increased by applying OTU filtering steps, with the greatest improvement observed by the removal of any sequences detected in extraction blanks. This study demonstrates the potential use of multiple DNA markers for forensic soil analysis using HTS, and identifies some of the standardisation and evaluation steps necessary before this technique can be applied in casework. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  16. Influence of artifact removal on rare species recovery in natural complex communities using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Aibin Zhan

    Full Text Available Large-scale high-throughput sequencing techniques are rapidly becoming popular methods to profile complex communities and have generated deep insights into community biodiversity. However, several technical problems, especially sequencing artifacts such as nucleotide calling errors, could artificially inflate biodiversity estimates. Sequence filtering for artifact removal is a conventional method for deleting error-prone sequences from high-throughput sequencing data. As rare species represented by low-abundance sequences in datasets may be sensitive to artifact removal process, the influence of artifact removal on rare species recovery has not been well evaluated in natural complex communities. Here we employed both internal (reliable operational taxonomic units selected from communities themselves and external (indicator species spiked into communities references to evaluate the influence of artifact removal on rare species recovery using 454 pyrosequencing of complex plankton communities collected from both freshwater and marine habitats. Multiple analyses revealed three clear patterns: 1 rare species were eliminated during sequence filtering process at all tested filtering stringencies, 2 more rare taxa were eliminated as filtering stringencies increased, and 3 elimination of rare species intensified as biomass of a species in a community was reduced. Our results suggest that cautions be applied when processing high-throughput sequencing data, especially for rare taxa detection for conservation of species at risk and for rapid response programs targeting non-indigenous species. Establishment of both internal and external references proposed here provides a practical strategy to evaluate artifact removal process.

  17. The High Throughput Sequence Annotation Service (HT-SAS – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Full Text Available Abstract Background Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein a poll of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine the accuracy and usefulness of our approach. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium bergei expressed exclusively during the blood stage. Conclusion Using HT-SAS we created new, or enriched already established annotations for over 20% of proteins from Plasmodium bergei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often also when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources and for researchers working with protein datasets, especially from poorly characterized organisms.

  18. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    Science.gov (United States)

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  19. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

    Science.gov (United States)

    Fernandes, Andrew D; Reid, Jennifer Ns; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B

    2014-01-01

    Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible. Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples. Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a

  20. Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Chao Cheng

    2011-11-01

    Full Text Available We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using other two data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data becomes available in the near future, our methods of data integration have various potential applications.

  1. Exploring the sources of bacterial spoilers in beefsteaks by culture-independent high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Francesca De Filippis

    Full Text Available Microbial growth on meat to unacceptable levels contributes significantly to change meat structure, color and flavor and to cause meat spoilage. The types of microorganisms initially present in meat depend on several factors and multiple sources of contamination can be identified. The aims of this study were to evaluate the microbial diversity in beefsteaks before and after aerobic storage at 4°C and to investigate the sources of microbial contamination by examining the microbiota of carcasses wherefrom the steaks originated and of the processing environment where the beef was handled. Carcass, environmental (processing plant and meat samples were analyzed by culture-independent high-throughput sequencing of 16S rRNA gene amplicons. The microbiota of carcass swabs was very complex, including more than 600 operational taxonomic units (OTUs belonging to 15 different phyla. A significant association was found between beef microbiota and specific beef cuts (P<0.01 indicating that different cuts of the same carcass can influence the microbial contamination of beef. Despite the initially high complexity of the carcass microbiota, the steaks after aerobic storage at 4°C showed a dramatic decrease in microbial complexity. Pseudomonas sp. and Brochothrix thermosphacta were the main contaminants, and Acinetobacter, Psychrobacter and Enterobacteriaceae were also found. Comparing the relative abundance of OTUs in the different samples it was shown that abundant OTUs in beefsteaks after storage occurred in the corresponding carcass. However, the abundance of these same OTUs clearly increased in environmental samples taken in the processing plant suggesting that spoilage-associated microbial species originate from carcasses, they are carried to the processing environment where the meat is handled and there they become a resident microbiota. Such microbiota is then further spread on meat when it is handled and it represents the starting microbial association

  2. State of the Art High-Throughput Approaches to Genotoxicity: Flow Micronucleus, Ames II, GreenScreen and Comet

    Science.gov (United States)

    State of the Art High-Throughput Approaches to Genotoxicity: Flow Micronucleus, Ames II, GreenScreen and Comet (Presented by Dr. Marilyn J. Aardema, Chief Scientific Advisor, Toxicology, Dr. Leon Stankowski, et. al. (6/28/2012)

  3. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approach for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromsomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixtures was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  4. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Science.gov (United States)

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  5. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS, has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA. As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/. This service will improve accessibility to high-quality data from SRA.

  6. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Full Text Available Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions.A simple method was developed to sequence quickly the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to be sequenced by high-throughput technique.The method was assessed on a panel of EV-Cs belonging to a wide-range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures.By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  7. A Direct Comparison of Remote Sensing Approaches for High-Throughput Phenotyping in Plant Breeding

    Science.gov (United States)

    Tattaris, Maria; Reynolds, Matthew P.; Chapman, Scott C.

    2016-01-01

    Remote sensing (RS) of plant canopies permits non-intrusive, high-throughput monitoring of plant physiological characteristics. This study compared three RS approaches using a low flying UAV (unmanned aerial vehicle), with that of proximal sensing, and satellite-based imagery. Two physiological traits were considered, canopy temperature (CT) and a vegetation index (NDVI), to determine the most viable approaches for large scale crop genetic improvement. The UAV-based platform achieves plot-level resolution while measuring several hundred plots in one mission via high-resolution thermal and multispectral imagery measured at altitudes of 30–100 m. The satellite measures multispectral imagery from an altitude of 770 km. Information was compared with proximal measurements using IR thermometers and an NDVI sensor at a distance of 0.5–1 m above plots. For robust comparisons, CT and NDVI were assessed on panels of elite cultivars under irrigated and drought conditions, in different thermal regimes, and on un-adapted genetic resources under water deficit. Correlations between airborne data and yield/biomass at maturity were generally higher than equivalent proximal correlations. NDVI was derived from high-resolution satellite imagery for only larger sized plots (8.5 × 2.4 m) due to restricted pixel density. Results support use of UAV-based RS techniques for high-throughput phenotyping for both precision and efficiency. PMID:27536304

  8. A direct comparison of remote sensing approaches for high-throughput phenotyping in plant breeding

    Directory of Open Access Journals (Sweden)

    Maria Tattaris

    2016-08-01

    Full Text Available Remote sensing (RS of plant canopies permits non-intrusive, high-throughput monitoring of plant physiological characteristics. This study compared three RS approaches using a low flying UAV (unmanned aerial vehicle, with that of proximal sensing, and satellite-based imagery. Two physiological traits were considered, canopy temperature (CT and a vegetation index (NDVI, to determine the most viable approaches for large scale crop genetic improvement. The UAV-based platform achieves plot-level resolution while measuring several hundred plots in one mission via high-resolution thermal and multispectral imagery measured at altitudes of 30-100 m. The satellite measures multispectral imagery from an altitude of 770 km. Information was compared with proximal measurements using IR thermometers and an NDVI sensor at a distance of 0.5-1m above plots. For robust comparisons, CT and NDVI were assessed on panels of elite cultivars under irrigated and drought conditions, in different thermal regimes, and on un-adapted genetic resources under water deficit. Correlations between airborne data and yield/biomass at maturity were generally higher than equivalent proximal correlations. NDVI was derived from high-resolution satellite imagery for only larger sized plots (8.5 x 2.4 m due to restricted pixel density. Results support use of UAV-based RS techniques for high-throughput phenotyping for both precision and efficiency.

  9. A Direct Comparison of Remote Sensing Approaches for High-Throughput Phenotyping in Plant Breeding.

    Science.gov (United States)

    Tattaris, Maria; Reynolds, Matthew P; Chapman, Scott C

    2016-01-01

    Remote sensing (RS) of plant canopies permits non-intrusive, high-throughput monitoring of plant physiological characteristics. This study compared three RS approaches using a low flying UAV (unmanned aerial vehicle), with that of proximal sensing, and satellite-based imagery. Two physiological traits were considered, canopy temperature (CT) and a vegetation index (NDVI), to determine the most viable approaches for large scale crop genetic improvement. The UAV-based platform achieves plot-level resolution while measuring several hundred plots in one mission via high-resolution thermal and multispectral imagery measured at altitudes of 30-100 m. The satellite measures multispectral imagery from an altitude of 770 km. Information was compared with proximal measurements using IR thermometers and an NDVI sensor at a distance of 0.5-1 m above plots. For robust comparisons, CT and NDVI were assessed on panels of elite cultivars under irrigated and drought conditions, in different thermal regimes, and on un-adapted genetic resources under water deficit. Correlations between airborne data and yield/biomass at maturity were generally higher than equivalent proximal correlations. NDVI was derived from high-resolution satellite imagery for only larger sized plots (8.5 × 2.4 m) due to restricted pixel density. Results support use of UAV-based RS techniques for high-throughput phenotyping for both precision and efficiency.

  10. A high-throughput media design approach for high performance mammalian fed-batch cultures.

    Science.gov (United States)

    Rouiller, Yolande; Périlleux, Arnaud; Collet, Natacha; Jordan, Martin; Stettler, Matthieu; Broly, Hervé

    2013-01-01

    An innovative high-throughput medium development method based on media blending was successfully used to improve the performance of a Chinese hamster ovary fed-batch medium in shaking 96-deepwell plates. Starting from a proprietary chemically-defined medium, 16 formulations testing 43 of 47 components at 3 different levels were designed. Media blending was performed following a custom-made mixture design of experiments considering binary blends, resulting in 376 different blends that were tested during both cell expansion and fed-batch production phases in one single experiment. Three approaches were chosen to provide the best output of the large amount of data obtained. A simple ranking of conditions was first used as a quick approach to select new formulations with promising features. Then, prediction of the best mixes was done to maximize both growth and titer using the Design Expert software. Finally, a multivariate analysis enabled identification of individual potential critical components for further optimization. Applying this high-throughput method on a fed-batch, rather than on a simple batch, process opens new perspectives for medium and feed development that enables identification of an optimized process in a short time frame.

  11. Morphology control in polymer blend fibers—a high throughput computing approach

    Science.gov (United States)

    Sesha Sarath Pokuri, Balaji; Ganapathysubramanian, Baskar

    2016-08-01

    Fibers made from polymer blends have conventionally enjoyed wide use, particularly in textiles. This wide applicability is primarily aided by the ease of manufacturing such fibers. More recently, the ability to tailor the internal morphology of polymer blend fibers by carefully designing processing conditions has enabled such fibers to be used in technologically relevant applications. Some examples include anisotropic insulating properties for heat and anisotropic wicking of moisture, coaxial morphologies for optical applications as well as fibers with high internal surface area for filtration and catalysis applications. However, identifying the appropriate processing conditions from the large space of possibilities using conventional trial-and-error approaches is a tedious and resource-intensive process. Here, we illustrate a high throughput computational approach to rapidly explore and characterize how processing conditions (specifically blend ratio and evaporation rates) affect the internal morphology of polymer blends during solvent based fabrication. We focus on a PS: PMMA system and identify two distinct classes of morphologies formed due to variations in the processing conditions. We subsequently map the processing conditions to the morphology class, thus constructing a ‘phase diagram’ that enables rapid identification of processing parameters for specific morphology class. We finally demonstrate the potential for time dependent processing conditions to get desired features of the morphology. This opens up the possibility of rational stage-wise design of processing pathways for tailored fiber morphology using high throughput computing.

  12. Genome-wide analysis of microRNAs in rubber tree (Hevea brasiliensis L.) using high-throughput sequencing.

    Science.gov (United States)

    Lertpanyasampatha, Manassawe; Gao, Lei; Kongsawadworakul, Panida; Viboonjun, Unchera; Chrestin, Hervé; Liu, Renyi; Chen, Xuemei; Narangajavana, Jarunya

    2012-08-01

    MicroRNAs (miRNAs) are short RNAs with essential roles in gene regulation in various organisms including higher plants. In contrast to the vast information on miRNAs from many economically important plants, almost nothing has been reported on the identification or analysis of miRNAs from rubber tree (Hevea brasiliensis L.), the most important natural rubber-producing crop. To identify miRNAs and their target genes in rubber tree, high-throughput sequencing combined with a computational approach was performed. Four small RNA libraries were constructed for deep sequencing from mature and young leaves of two rubber tree clones, PB 260 and PB 217, which provide high and low latex yield, respectively. 115 miRNAs belonging to 56 known miRNA families were identified, and northern hybridization validated miRNA expression and revealed developmental stage-dependent and clone-specific expression for some miRNAs. We took advantage of the newly released rubber tree genome assembly and predicted 20 novel miRNAs. Further, computational analysis uncovered potential targets of the known and novel miRNAs. Predicted target genes included not only transcription factors but also genes involved in various biological processes including stress responses, primary and secondary metabolism, and signal transduction. In particular, genes with roles in rubber biosynthesis are predicted targets of miRNAs. This study provides a basic catalog of miRNAs and their targets in rubber tree to facilitate future improvement and exploitation of rubber tree.

  13. High-Throughput Sequencing and Mutagenesis to Accelerate the Domestication of Microlaena stipoides as a New Food Crop

    Science.gov (United States)

    Shapter, Frances M.; Cross, Michael; Ablett, Gary; Malory, Sylvia; Chivers, Ian H.; King, Graham J.; Henry, Robert J.

    2013-01-01

    Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD97) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops. PMID:24367532

  14. High-throughput sequencing and mutagenesis to accelerate the domestication of Microlaena stipoides as a new food crop.

    Directory of Open Access Journals (Sweden)

    Frances M Shapter

    Full Text Available Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae, was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD₉₇ of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops.

  15. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    Science.gov (United States)

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users' input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive

  16. Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates.

    Science.gov (United States)

    Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A

    2016-09-01

    Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the illumina miseq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated primer biases occurred during amplicon metabarcoding with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.

  17. Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

    Science.gov (United States)

    Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin

    2017-12-01

    Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

  18. High throughput generation and trapping of individual agarose microgel using microfluidic approach

    KAUST Repository

    Shi, Yang

    2013-02-28

    Microgel is a kind of biocompatible polymeric material, which has been widely used as micro-carriers in materials synthesis, drug delivery and cell biology applications. However, high-throughput generation of individual microgel for on-site analysis in a microdevice still remains a challenge. Here, we presented a simple and stable droplet microfluidic system to realize high-throughput generation and trapping of individual agarose microgels based on the synergetic effect of surface tension and hydrodynamic forces in microchannels and used it for 3-D cell culture in real-time. The established system was mainly composed of droplet generators with flow focusing T-junction and a series of array individual trap structures. The whole process including the independent agarose microgel formation, immobilization in trapping array and gelation in situ via temperature cooling could be realized on the integrated microdevice completely. The performance of this system was demonstrated by successfully encapsulating and culturing adenoid cystic carcinoma (ACCM) cells in the gelated agarose microgels. This established approach is simple, easy to operate, which can not only generate the micro-carriers with different components in parallel, but also monitor the cell behavior in 3D matrix in real-time. It can also be extended for applications in the area of material synthesis and tissue engineering. © 2013 Springer-Verlag Berlin Heidelberg.

  19. Characterization of Microbial Community in Lascaux Cave by High Throughput Sequencing

    Science.gov (United States)

    Alonso, Lise; Dubost, Audrey; Luis, Patricia; Pommier, Thomas; Moënne-Loccoz, Yvan

    2017-04-01

    The Lascaux Cave in South-Est France is an archeological landmark renowned for its Paleolithic paintings dating back c.18.000 years. Extensive touristic frequenting and repeated chemical treatments have resulted in the development of microbial stains on cave walls, which is a major issue in terms of art conservation. Therefore, it is of prime importance to better understand the microbial ecology of Lascaux Cave. Like many other caves, Lascaux is quite heterogeneous in terms of the nature and surface properties of rock walls within cave rooms, as well as the succession of rooms/galleries from the entrance to deeper areas of the cave. Lascaux Cave displays an additional levels of heterogeneity related to the presence of discontinuous stains on certain types of cave walls. We compared the microbial community (i.e. both prokaryotic and eukaryotic microbial populations) colonizing cave walls of different rooms/galleries, in and outside stains and in different cave layers, in successive years. Quantitative PCR analysis of cave wall samples gave in the order of 102 copies of 18S rRNA genes and 105 copies of 16S rRNA genes per ng of DNA, indicating significant colonization of all cave walls by micro-eukaryotes and especially bacteria. Illumina metagenomic analyses of cave wall samples was carried out based on four ribosomal DNA markers targeting bacteria, archaea, fungi, and other micro-eukaryotes. The results showed that the four microbial communities were highly diverse in and outside stains, as several hundred genera of microorganisms were identified in each. Proteobacteria were more prominent within stains whereas Bacteroidetes and Sordariomycetes were more prominent outside stains. High-throughput sequencing also showed that the nature/surface properties of cave walls were the main factor determining the structure and composition of microbial communities, ahead of the other heterogeneity factors studied i.e. location within the cave, presence of stain and sampling

  20. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Full Text Available Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes (genus, Begomovirus; family, Geminiviridae were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA. Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS. CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions.

  1. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions. 2014 by the authors; licensee MDPI, Basel, Switzerland.

  2. A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites

    DEFF Research Database (Denmark)

    Uren, Anthony G; Mikkers, Harald; Kool, Jaap

    2009-01-01

    sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA...... optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites....

  3. Using high-throughput sequencing of ITS2 to describe Symbiodinium metacommunities in St. John, US Virgin Islands

    OpenAIRE

    Ross Cunning; Gates, Ruth D; Edmunds, Peter J.

    2017-01-01

    Symbiotic microalgae (Symbiodinium spp.) strongly influence the performance and stress-tolerance of their coral hosts, making the analysis of Symbiodinium communities in corals (and metacommunities on reefs) advantageous for many aspects of coral reef research. High-throughput sequencing of ITS2 nrDNA offers unprecedented scale in describing these communities, yet high intragenomic variability at this locus complicates the resolution of biologically meaningful diversity. Here, we demonstrate ...

  4. Identification of Brassinosteroid Target Genes by Chromatin Immunoprecipitation Followed by High-Throughput Sequencing (ChIP-seq) and RNA-Sequencing.

    Science.gov (United States)

    Nolan, Trevor; Liu, Sanzhen; Guo, Hongqing; Li, Lei; Schnable, Patrick; Yin, Yanhai

    2017-01-01

    Brassinosteroids (BRs) play important roles in many growth and developmental processes. BRs signal to regulate BR-INSENSITIVE1-ETHYL METHANESULFONATE-SUPPRESSOR1 (BES1) and BRASSINAZOLE-RESISTANT1 (BZR1) transcription factors (TFs), which, in turn, regulate several hundreds of transcription factors (termed BES1/BZR1-targeted TFs or BTFs) and thousands of genes to mediate various BR responses. Chromatin Immunoprecipitation followed by high-throughput sequencing (ChIP-seq) with BES1/BZR1 and BTFs is an important approach to identify BR target genes. In combination with RNA-sequencing experiments, these genomic methods have become powerful tools to detect BR target genes and reveal transcriptional networks underlying BR-regulated processes.

  5. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs.

    Science.gov (United States)

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2014-05-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/. Copyright © 2013 Elsevier Inc. All rights reserved.

  6. A high-throughput screening approach to discovering good forms of biologically inspired visual representation.

    Directory of Open Access Journals (Sweden)

    Nicolas Pinto

    2009-11-01

    Full Text Available While many models of biological object recognition share a common set of "broad-stroke" properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model--e.g., the number of units per layer, the size of pooling kernels, exponents in normalization operations, etc. Since the number of such parameters (explicit or implicit is typically large and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea or because the correct "parts" have not been tuned correctly, assembled at sufficient scale, or provided with enough training. Here, we present a high-throughput approach to the exploration of such parameter sets, leveraging recent advances in stream processing hardware (high-end NVIDIA graphic cards and the PlayStation 3's IBM Cell Processor. In analogy to high-throughput screening approaches in molecular biology and genetics, we explored thousands of potential network architectures and parameter instantiations, screening those that show promising object recognition performance for further analysis. We show that this approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature. As the scale of available computational power continues to expand, we argue that this approach has the potential to greatly accelerate progress in both artificial vision and our understanding of the computational underpinning of biological vision.

  7. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  8. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing

    Directory of Open Access Journals (Sweden)

    Olafsdottir Gudbjorg

    2011-04-01

    Full Text Available Abstract Background Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar derived from across the species' North Atlantic range, was selectively amplified with a novel combination of standard PCR and pyro-sequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences. A unique combination of barcoded primers and a partitioned sequencing plate was employed to designate each sequence read to its original sample. The sequence reads were aligned according to the S. salar mitochondrial reference sequence (NC_001960.1, with the objective of identifying single nucleotide polymorphisms (SNPs. They were validated if they met with the following three stringent criteria: (i sequence reads were produced from both DNA strands; (ii SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii SNPs occurred in more than one individual. Results Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6% were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. Conclusion The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining

  9. Selection of DNA Aptamers for Ovarian Cancer Biomarker CA125 Using One-Pot SELEX and High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Delia J. Scoville

    2017-01-01

    Full Text Available CA125 is a mucin glycoprotein whose concentration in serum correlates with a woman’s risk of developing ovarian cancer and also indicates response to therapy in diagnosed patients. Accurate detection of this large, complex protein in patient samples is of great clinical relevance. We suggest that powerful new diagnostic tools may be enabled by the development of nucleic acid aptamers with affinity for CA125. Here, we report on our use of One-Pot SELEX to isolate single-stranded DNA aptamers with affinity for CA125, followed by high-throughput sequencing of the selected oligonucleotides. This data-rich approach, combined with bioinformatics tools, enabled the entire selection process to be characterized. Using fluorescence anisotropy and affinity probe capillary electrophoresis, the binding affinities of four aptamer candidates were evaluated. Two aptamers, CA125_1 and CA125_12, both without primers, were found to bind to clinically relevant concentrations of the protein target. Binding was differently influenced by the presence of Mg2+ ions, being required for binding of CA125_1 and abrogating binding of CA125_12. In conclusion, One-Pot SELEX was found to be a promising selection method that yielded DNA aptamers to a clinically important protein target.

  10. Genome-wide identification of bone metastasis-related microRNAs in lung adenocarcinoma by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Lin Xie

    Full Text Available BACKGROUND: MicroRNAs (miRNAs are a class of small noncoding RNAs that regulate gene expression at the post-transcriptional level. They participate in a wide variety of biological processes, including apoptosis, proliferation and metastasis. The aberrant expression of miRNAs has been found to play an important role in many cancers. RESULTS: To understand the roles of miRNAs in the bone metastasis of lung adenocarcinoma, we constructed two small RNA libraries from blood of lung adenocarcinoma patients with and without bone metastasis. High-throughput sequencing combined with differential expression analysis identified that 7 microRNAs were down-regulated and 21 microRNAs were up-regulated in lung adenocarcinoma with bone metastasis. A total of 797 target genes of the differentially expressed microRNAs were identified using a bioinformatics approach. Functional annotation analysis indicated that a number of pathways might be involved in bone metastasis, survival of the primary origin and metastatic angiogenesis of lung adenocarcinoma. These include the MAPK, Wnt, and NF-kappaB signaling pathways, as well as pathways involving the matrix metalloproteinase, cytoskeletal protein and angiogenesis factors. CONCLUSIONS: This study provides some insights into the molecular mechanisms that underlie lung adenocarcinoma development, thereby aiding the diagnosis and treatment of the disease.

  11. A ground-up approach to High Throughput Cloud Computing in High-Energy Physics

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00245123; Ganis, Gerardo; Bagnasco, Stefano

    The thesis explores various practical approaches in making existing High Throughput computing applications common in High Energy Physics work on cloud-provided resources, as well as opening the possibility for running new applications. The work is divided into two parts: firstly we describe the work done at the computing facility hosted by INFN Torino to entirely convert former Grid resources into cloud ones, eventually running Grid use cases on top along with many others in a more flexible way. Integration and conversion problems are duly described. The second part covers the development of solutions for automatizing the orchestration of cloud workers based on the load of a batch queue and the development of HEP applications based on ROOT's PROOF that can adapt at runtime to a changing number of workers.

  12. Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

    Science.gov (United States)

    Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H

    2015-06-01

    The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    Directory of Open Access Journals (Sweden)

    John K Davies

    Full Text Available Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1 as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11 are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  14. High-throughput and computational approaches for diagnostic and prognostic host tuberculosis biomarkers

    Directory of Open Access Journals (Sweden)

    January Weiner

    2017-03-01

    Full Text Available High-throughput techniques strive to identify new biomarkers that will be useful for the diagnosis, treatment, and prevention of tuberculosis (TB. However, their analysis and interpretation pose considerable challenges. Recent developments in the high-throughput detection of host biomarkers in TB are reported in this review.

  15. Genomic Methods Take the Plunge: Recent Advances in High-Throughput Sequencing of Marine Mammals.

    Science.gov (United States)

    Cammen, Kristina M; Andrews, Kimberly R; Carroll, Emma L; Foote, Andrew D; Humble, Emily; Khudyakov, Jane I; Louis, Marie; McGowen, Michael R; Olsen, Morten Tange; Van Cise, Amy M

    2016-11-01

    The dramatic increase in the application of genomic techniques to non-model organisms (NMOs) over the past decade has yielded numerous valuable contributions to evolutionary biology and ecology, many of which would not have been possible with traditional genetic markers. We review this recent progression with a particular focus on genomic studies of marine mammals, a group of taxa that represent key macroevolutionary transitions from terrestrial to marine environments and for which available genomic resources have recently undergone notable rapid growth. Genomic studies of NMOs utilize an expanding range of approaches, including whole genome sequencing, restriction site-associated DNA sequencing, array-based sequencing of single nucleotide polymorphisms and target sequence probes (e.g., exomes), and transcriptome sequencing. These approaches generate different types and quantities of data, and many can be applied with limited or no prior genomic resources, thus overcoming one traditional limitation of research on NMOs. Within marine mammals, such studies have thus far yielded significant contributions to the fields of phylogenomics and comparative genomics, as well as enabled investigations of fitness, demography, and population structure. Here we review the primary options for generating genomic data, introduce several emerging techniques, and discuss the suitability of each approach for different applications in the study of NMOs. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Understanding microalgal species composition and contributions in Antarctic glacial melt water through rbcL high throughput sequencing

    Science.gov (United States)

    Barretto, K. M.; Kalmbach, A. J.; de la Torre, J. R.; Falcón, L. I.; Carpenter, E. J.

    2016-02-01

    The McMurdo Dry Valleys (MDV) in Antarctica present unique research opportunities, both because of the understudied biogeochemical impact of their microbial communities, and their sensitivity to climate change. Despite harsh desiccation, pH, and salinity stress, summer glacial melt water supports life in the MDV in the form of algal mats. These mat communities are complex in structure, with a network of dominant cyanobacteria interspersed with heterotrophic diazotrophs, smaller photoautotrophs, and thick extracellular polymeric substances. Due to their complexity, standard microscopy yields a limited understanding of community assemblages. Our previous high throughput sequencing (HTS) approaches focusing on 16S rRNA have profiled communities with understudied photosynthetic phyla such as Acidobacteria, Gemmatimonadetes, and Chloroflexi. To characterize these phototrophic communities, we are interested in (1) understanding their temporal dynamics and how the dominant cyanobacterial species influence community composition, (2) modeling how pH, nutrients, soil wetness, and temperature act as multivariate drivers of community composition, and (3) establishing a pipeline for HTS of the rbcL gene - which encodes the large subunit of the ubiquitous photosynthetic protein RuBisCO. Our initial screening of community DNA from MDV algal mats has shown the presence of Form IA, IB, and IC cbbL (an rbcL ortholog), and Form ID rbcL - indicating a relatively high degree of photoautotrophic diversity. Soil wetness drives anoxic conditions and we see that it shifts overall microbial composition - we expect photoautotrophs to respond similarly. We also expect photoautotrophic assemblages to shift with pH and soil nutrients. Our deep sequencing efforts suggest an inconsistency between indexing primers and algal DNA that could underestimate cyanobacterial and overestimate eukaryotic abundance. Resolving these issues with new approaches will allow us to more fully understand the

  17. DNA Sudoku—harnessing high-throughput sequencing for multiplexed specimen analysis

    Science.gov (United States)

    Erlich, Yaniv; Chang, Kenneth; Gordon, Assaf; Ronen, Roy; Navon, Oron; Rooks, Michelle; Hannon, Gregory J.

    2009-01-01

    Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals. PMID:19447965

  18. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which...... dissect the bioma involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy...

  19. Multiplexed Spliced-Leader Sequencing: A high-throughput, selective method for RNA-seq in Trypanosomatids.

    Science.gov (United States)

    Cuypers, Bart; Domagalska, Malgorzata A; Meysman, Pieter; Muylder, Géraldine de; Vanaerschot, Manu; Imamura, Hideo; Dumetz, Franck; Verdonckt, Thomas Wolf; Myler, Peter J; Ramasamy, Gowthaman; Laukens, Kris; Dujardin, Jean-Claude

    2017-06-16

    High throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5'end of each mRNA by trans-splicing. This allows to discriminate Trypansomatid RNA from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high resolution splice site detection. SL-seq is therefore ideal for fast and massive parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.

  20. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    Science.gov (United States)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  1. High-throughput polymorphism detection and genotyping in Brassica napus using next-generation RAD sequencing

    Directory of Open Access Journals (Sweden)

    Bus Anja

    2012-06-01

    Full Text Available Abstract Background The complex genome of rapeseed (Brassica napus is not well understood despite the economic importance of the species. Good knowledge of sequence variation is needed for genetics approaches and breeding purposes. We used a diversity set of B. napus representing eight different germplasm types to sequence genome-wide distributed restriction-site associated DNA (RAD fragments for polymorphism detection and genotyping. Results More than 113,000 RAD clusters with more than 20,000 single nucleotide polymorphisms (SNPs and 125 insertions/deletions were detected and characterized. About one third of the RAD clusters and polymorphisms mapped to the Brassica rapa reference sequence. An even distribution of RAD clusters and polymorphisms was observed across the B. rapa chromosomes, which suggests that there might be an equal distribution over the Brassica oleracea chromosomes, too. The representation of Gene Ontology (GO terms for unigenes with RAD clusters and polymorphisms revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category. Conclusions Considering the decreasing costs for next-generation sequencing, the results of our study suggest that RAD sequencing is not only a simple and cost-effective method for high-density polymorphism detection but also an alternative to SNP genotyping from transcriptome sequencing or SNP arrays, even for species with complex genomes such as B. napus.

  2. Temporal dynamics of soil microbial communities under different moisture regimes: high-throughput sequencing and bioinformatics analysis

    Science.gov (United States)

    Semenov, Mikhail; Zhuravleva, Anna; Semenov, Vyacheslav; Yevdokimov, Ilya; Larionova, Alla

    2017-04-01

    Recent climate scenarios predict not only continued global warming but also an increased frequency and intensity of extreme climatic events such as strong changes in temperature and precipitation regimes. Microorganisms are well known to be more sensitive to changes in environmental conditions than to other soil chemical and physical parameters. In this study, we determined the shifts in soil microbial community structure as well as indicative taxa in soils under three moisture regimes using high-throughput Illumina sequencing and range of bioinformatics approaches for the assessment of sequence data. Incubation experiments were performed in soil-filled (Greyic Phaeozems Albic) rhizoboxes with maize and without plants. Three contrasting moisture regimes were being simulated: 1) optimal wetting (OW), a watering 2-3 times per week to maintain soil moisture of 20-25% by weight; 2) periodic wetting (PW), with alternating periods of wetting and drought; and 3) constant insufficient wetting (IW), while soil moisture of 12% by weight was permanently maintained. Sampled fresh soils were homogenized, and the total DNA of three replicates was extracted using the FastDNA® SPIN kit for Soil. DNA replicates were combined in a pooled sample and the DNA was used for PCR with specific primers for the 16S V3 and V4 regions. In order to compare variability between different samples and replicates within a single sample, some DNA replicates treated separately. The products were purified and submitted to Illumina MiSeq sequencing. Sequence data were evaluated by alpha-diversity (Chao1 and Shannon H' diversity indexes), beta-diversity (UniFrac and Bray-Curtis dissimilarity), heatmap, tagcloud, and plot-bar analyses using the MiSeq Reporter Metagenomics Workflow and R packages (phyloseq, vegan, tagcloud). Shannon index varied in a rather narrow range (4.4-4.9) with the lowest values for microbial communities under PW treatment. Chao1 index varied from 385 to 480, being a more flexible

  3. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    Science.gov (United States)

    2014-01-01

    Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  4. A Torrent of data: mapping chromatin organization using 5C and high-throughput sequencing.

    Science.gov (United States)

    Fraser, James; Ethier, Sylvain D; Miura, Hisashi; Dostie, Josée

    2012-01-01

    The study of three-dimensional genome organization is an exciting research area, which has benefited from the rapid development of high-resolution molecular mapping techniques over the past decade. These methods are derived from the chromosome conformation capture (3C) technique and are each aimed at improving some aspect of 3C. All 3C technologies use formaldehyde fixation and proximity-based ligation to capture chromatin contacts in cell populations and consider in vivo spatial proximity more or less inversely proportional to the frequency of measured interactions. The 3C-carbon copy (5C) method is among the most quantitative of these approaches. 5C is extremely robust and can be used to study chromatin organization at various scales. Here, we present a modified 5C analysis protocol adapted for sequencing with an Ion Torrent Personal Genome Machine™ (PGM™). We explain how Torrent 5C libraries are produced and sequenced. We also describe the statistical and computational methods we developed to normalize and analyze raw Torrent 5C sequence data. The Torrent 5C protocol should facilitate the study of in vivo chromatin architecture at high resolution because it benefits from high accuracy, greater speed, low running costs, and the flexibility of in-house next-generation sequencing. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.

    Science.gov (United States)

    Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun

    2018-02-27

    Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of the multiple-beads-in-droplet and insufficient throughput because of the low concentration of beads for avoiding multiple-beads-in-a-droplet remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl -1 , which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.

  6. A reproducible approach to high-throughput biological data acquisition and integration

    Directory of Open Access Journals (Sweden)

    Daniela Börnigen

    2015-03-01

    Full Text Available Modern biological research requires rapid, complex, and reproducible integration of multiple experimental results generated both internally and externally (e.g., from public repositories. Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. This allowed, first, novel biomolecular network reconstruction in human prostate cancer, which correctly recovered and extended the NFκB signaling pathway. Next, we investigated host-microbiome interactions. In less than an hour of analysis time, the system retrieved data and integrated six germ-free murine intestinal gene expression datasets to identify the genes most influenced by the gut microbiota, which comprised a set of immune-response and carbohydrate metabolism processes. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.

  7. Towards high-throughput mouse embryonic phenotyping: a novel approach to classifying ventricular septal defects

    Science.gov (United States)

    Liang, Xi; Xie, Zhongliu; Tamura, Masaru; Shiroishi, Toshihiko; Kitamoto, Asanobu

    2015-03-01

    The goal of the International Mouse Phenotyping Consortium (IMPC, www.mousephenotype.org) is to study all the over 23,000 genes in the mouse by knocking them out one-by-one for comparative analysis. Large amounts of knockout mouse lines have been raised, leading to a strong demand for high-throughput phenotyping technologies. Traditional means via time-consuming histological examination is clearly unsuitable in this scenario. Biomedical imaging technologies such as CT and MRI therefore have started being used to develop more efficient phenotyping approaches. Existing work however primarily rests on volumetric analytics over anatomical structures to detect anomaly, yet this type of methods generally fail when features are subtle such as ventricular septal defects (VSD) in the heart, and meanwhile phenotypic assessment normally requires expert manual labor. This study proposes, to the best of our knowledge, the first automatic VSD diagnostic system for mouse embryos. Our algorithm starts with the creation of an atlas using wild-type mouse images, followed by registration of knockouts to the atlas to perform atlas-based segmentation on the heart and then ventricles, after which ventricle segmentation is further refined using a region growing technique. VSD classification is completed by checking the existence of an overlap between left and right ventricles. Our approach has been validated on a database of 14 mouse embryo images, and achieved an overall accuracy of 90.9%, with sensitivity of 66.7% and specificity of 100%.

  8. A high-throughput microfluidic approach for 1000-fold leukocyte reduction of platelet-rich plasma

    Science.gov (United States)

    Xia, Hui; Strachan, Briony C.; Gifford, Sean C.; Shevkoplyas, Sergey S.

    2016-10-01

    Leukocyte reduction of donated blood products substantially reduces the risk of a number of transfusion-related complications. Current ‘leukoreduction’ filters operate by trapping leukocytes within specialized filtration material, while allowing desired blood components to pass through. However, the continuous release of inflammatory cytokines from the retained leukocytes, as well as the potential for platelet activation and clogging, are significant drawbacks of conventional ‘dead end’ filtration. To address these limitations, here we demonstrate our newly-developed ‘controlled incremental filtration’ (CIF) approach to perform high-throughput microfluidic removal of leukocytes from platelet-rich plasma (PRP) in a continuous flow regime. Leukocytes are separated from platelets within the PRP by progressively syphoning clarified PRP away from the concentrated leukocyte flowstream. Filtrate PRP collected from an optimally-designed CIF device typically showed a ~1000-fold (i.e. 99.9%) reduction in leukocyte concentration, while recovering >80% of the original platelets, at volumetric throughputs of ~1 mL/min. These results suggest that the CIF approach will enable users in many fields to now apply the advantages of microfluidic devices to particle separation, even for applications requiring macroscale flowrates.

  9. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS Metabarcoding during Flax Dew-Retting

    Directory of Open Access Journals (Sweden)

    Christophe Djemiel

    2017-10-01

    Full Text Available Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria and Ascomycota, Basidiomycota, and Zygomycota (fungi all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria, and Chytridiomycota (fungi. No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming

  10. Characterization of Bacterial and Fungal Community Dynamics by High-Throughput Sequencing (HTS) Metabarcoding during Flax Dew-Retting.

    Science.gov (United States)

    Djemiel, Christophe; Grec, Sébastien; Hawkins, Simon

    2017-01-01

    Flax dew-retting is a key step in the industrial extraction of fibers from flax stems and is dependent upon the production of a battery of hydrolytic enzymes produced by micro-organisms during this process. To explore the diversity and dynamics of bacterial and fungal communities involved in this process we applied a high-throughput sequencing (HTS) DNA metabarcoding approach (16S rRNA/ITS region, Illumina Miseq) on plant and soil samples obtained over a period of 7 weeks in July and August 2014. Twenty-three bacterial and six fungal phyla were identified in soil samples and 11 bacterial and four fungal phyla in plant samples. Dominant phyla were Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (bacteria) and Ascomycota, Basidiomycota, and Zygomycota (fungi) all of which have been previously associated with flax dew-retting except for Bacteroidetes and Basidiomycota that were identified for the first time. Rare phyla also identified for the first time in this process included Acidobacteria, CKC4, Chlorobi, Fibrobacteres, Gemmatimonadetes, Nitrospirae and TM6 (bacteria), and Chytridiomycota (fungi). No differences in microbial communities and colonization dynamics were observed between early and standard flax harvests. In contrast, the common agricultural practice of swath turning affects both bacterial and fungal community membership and structure in straw samples and may contribute to a more uniform retting. Prediction of community function using PICRUSt indicated the presence of a large collection of potential bacterial enzymes capable of hydrolyzing backbones and side-chains of cell wall polysaccharides. Assignment of functional guild (functional group) using FUNGuild software highlighted a change from parasitic to saprophytic trophic modes in fungi during retting. This work provides the first exhaustive description of the microbial communities involved in flax dew-retting and will provide a valuable benchmark in future studies aiming to evaluate

  11. Utility of high-throughput DNA sequencing in the study of the human papillomaviruses.

    Science.gov (United States)

    Escobar-Escamilla, Noé; Ramírez-González, José Ernesto; Castro-Escarpulli, Graciela; Díaz-Quiñonez, José Alberto

    2017-12-27

    The Papillomaviridae family is probably the most diverse group of viruses that affect vertebrates. The study of the relationship between infection by certain types of human papillomavirus (HPV) and the development of neoplastic epithelial lesions is of particular interest because of the high prevalence of HPV-related carcinomas in populations of developing countries. To understand the mechanisms of infection and their association with different clinical manifestations, molecular tools play an important role in the description of new types of HPV, the characterization of effector properties of the viral factors, the specific diagnosis and monitoring of HPV types, and the alteration patterns at genetic level in the host. Technological advances in the field of DNA sequencing have led to the development of different next-generation sequencing systems, allowing obtaining a large amount of data and broadening the applications to study viral diseases. In this review, we summarize the main approaches and their perspectives where the use of massively parallel sequencing has been proved as a useful tool in the research of the HPV infection.

  12. Molecular Identification of Soil Eukaryotes and Focused Approaches Targeting Protist and Faunal Groups Using High-Throughput Metabarcoding.

    Science.gov (United States)

    Arjen de Groot, G; Laros, Ivo; Geisen, Stefan

    2016-01-01

    While until recently the application of high-throughput sequencing approaches has mostly been restricted to bacteria and fungi, these methods have now also become available to less often studied (eukaryotic) groups, such as fauna and protists. Such approaches allow routine diversity screening for large numbers of samples via DNA metabarcoding. Given the enormous taxonomic diversity within the eukaryote tree of life, metabarcoding approaches targeting a single specific DNA region do not allow to discriminate members of all eukaryote clades at high taxonomic resolution. Here, we report on protocols that enable studying the diversity of soil eukaryotes and, at high taxonomic resolution, of individual faunal and protist groups therein using a tiered approach: first, the use of a general eukaryotic primer set targeting a wide range of eukaryotes provides a rough impression on the entire diversity of protists and faunal groups. Second, more focused approaches enable deciphering subsets of soil eukaryotes in higher taxonomic detail. We provide primers and protocols for two examples: soil microarthropods and cercozoan protists.

  13. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    Genetic polymorphisms in P. falciparum can be used to indicate the parasite's susceptibility to antimalarial drugs as well as its geographical origin. Both of these factors are key to monitoring development and spread of antimalarial drug resistance. In this study, we combine multiplex PCR, custom...... designed dual indexing and Miseq sequencing for high throughput SNP-profiling of 457 malaria infections from Guinea-Bissau, at the cost of 10 USD per sample. By amplifying and sequencing 15 genetic fragments, we cover 20 resistance-conferring SNPs occurring in pfcrt, pfmdr1, pfdhfr, pfdhps, as well...... as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance...

  14. High throughput gene expression profiling: a molecular approach to integrative physiology

    Science.gov (United States)

    Liang, Mingyu; Cowley, Allen W; Greene, Andrew S

    2004-01-01

    Integrative physiology emphasizes the importance of understanding multiple pathways with overlapping, complementary, or opposing effects and their interactions in the context of intact organisms. The DNA microarray technology, the most commonly used method for high-throughput gene expression profiling, has been touted as an integrative tool that provides insights into regulatory pathways. However, the physiology community has been slow in acceptance of these techniques because of early failure in generating useful data and the lack of a cohesive theoretical framework in which experiments can be analysed. With recent advances in both technology and analysis, we propose a concept of multidimensional integration of physiology that incorporates data generated by DNA microarray and other functional, genomic, and proteomic approaches to achieve a truly integrative understanding of physiology. Analysis of several studies performed in simpler organisms or in mammalian model animals supports the feasibility of such multidimensional integration and demonstrates the power of DNA microarray as an indispensable molecular tool for such integration. Evaluation of DNA microarray techniques indicates that these techniques, despite limitations, have advanced to a point where the question-driven profiling research has become a feasible complement to the conventional, hypothesis-driven research. With a keen sense of homeostasis, global regulation, and quantitative analysis, integrative physiologists are uniquely positioned to apply these techniques to enhance the understanding of complex physiological functions. PMID:14678487

  15. Quantitative insertion-site sequencing (QIseq) for high throughput phenotyping of transposon mutants.

    Science.gov (United States)

    Bronner, Iraad F; Otto, Thomas D; Zhang, Min; Udenze, Kenneth; Wang, Chengqi; Quail, Michael A; Jiang, Rays H Y; Adams, John H; Rayner, Julian C

    2016-07-01

    Genetic screening using random transposon insertions has been a powerful tool for uncovering biology in prokaryotes, where whole-genome saturating screens have been performed in multiple organisms. In eukaryotes, such screens have proven more problematic, in part because of the lack of a sensitive and robust system for identifying transposon insertion sites. We here describe quantitative insertion-site sequencing, or QIseq, which uses custom library preparation and Illumina sequencing technology and is able to identify insertion sites from both the 5' and 3' ends of the transposon, providing an inbuilt level of validation. The approach was developed using piggyBac mutants in the human malaria parasite Plasmodium falciparum but should be applicable to many other eukaryotic genomes. QIseq proved accurate, confirming known sites in >100 mutants, and sensitive, identifying and monitoring sites over a >10,000-fold dynamic range of sequence counts. Applying QIseq to uncloned parasites shortly after transfections revealed multiple insertions in mixed populations and suggests that >4000 independent mutants could be generated from relatively modest scales of transfection, providing a clear pathway to genome-scale screens in P. falciparum QIseq was also used to monitor the growth of pools of previously cloned mutants and reproducibly differentiated between deleterious and neutral mutations in competitive growth. Among the mutants with fitness defects was a mutant with a piggyBac insertion immediately upstream of the kelch protein K13 gene associated with artemisinin resistance, implying mutants in this gene may have competitive fitness costs. QIseq has the potential to enable the scale-up of piggyBac-mediated genetics across multiple eukaryotic systems. © 2016 Bronner et al.; Published by Cold Spring Harbor Laboratory Press.

  16. High-throughput sequencing reveals inbreeding depression in a natural population.

    Science.gov (United States)

    Hoffman, Joseph I; Simpson, Fraser; David, Patrice; Rijks, Jolianne M; Kuiken, Thijs; Thorne, Michael A S; Lacy, Robert C; Dasmahapatra, Kanchon K

    2014-03-11

    Proxy measures of genome-wide heterozygosity based on approximately 10 microsatellites have been used to uncover heterozygosity fitness correlations (HFCs) for a wealth of important fitness traits in natural populations. However, effect sizes are typically very small and the underlying mechanisms remain contentious, as a handful of markers usually provides little power to detect inbreeding. We therefore used restriction site associated DNA (RAD) sequencing to accurately estimate genome-wide heterozygosity, an approach transferrable to any organism. As a proof of concept, we first RAD sequenced oldfield mice (Peromyscus polionotus) from a known pedigree, finding strong concordance between the inbreeding coefficient and heterozygosity measured at 13,198 single-nucleotide polymorphisms (SNPs). When applied to a natural population of harbor seals (Phoca vitulina), a weak HFC for parasite infection based on 27 microsatellites strengthened considerably with 14,585 SNPs, the deviance explained by heterozygosity increasing almost fivefold to a remarkable 49%. These findings arguably provide the strongest evidence to date of an HFC being due to inbreeding depression in a natural population lacking a pedigree. They also suggest that under some circumstances heterozygosity may explain far more variation in fitness than previously envisaged.

  17. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis. Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  18. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Full Text Available Abstract Background Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample-typically micrograms-are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.

  19. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    Science.gov (United States)

    Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    Douchi is a type of Chinese traditional fermented food that is an important source of protein and is used in flavouring ingredients. The end product is affected by the microbial community present during fermentation, but exactly how microbes influence the fermentation process remains poorly understood. We used an Illumina MiSeq approach to investigate bacterial and fungal community diversity during both douchi-koji making and fermentation. A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla. Firmicutes, Actinobacteria and Proteobacteria were the dominant bacterial phyla, while Ascomycota and Zygomycota were the dominant fungal phyla. At the genus level, Staphylococcus and Weissella were the dominant bacteria, while Aspergillus and Lichtheimia were the dominant fungi. Principal coordinate analysis showed structural separation between the composition of bacteria in koji making and fermentation. However, multivariate analysis of variance based on unweighted UniFrac distances did identify distinct differences (p fermentation. This is the first investigation to integrate douchi fermentation and koji making and fermentation processes through this technological approach. The results provide insight into the microbiome of the douchi fermentation process, and reveal a structural separation that may be stratified by the environment during the production of this traditional fermented food. PMID:27992473

  20. Multi-shaped-beam (MSB): an evolutionary approach for high throughput e-beam lithography

    Science.gov (United States)

    Slodowski, Matthias; Döring, Hans-Joachim; Stolberg, Ines A.; Dorl, Wolfgang

    2010-09-01

    The development of next-generation lithography (NGL) such as EUV, NIL and maskless lithography (ML2) are driven by the half pitch reduction and increasing integration density of integrated circuits down to the 22nm node and beyond. For electron beam direct write (EBDW) several revolutionary pixel based concepts have been under development since several years. By contrast an evolutionary and full package high throughput multi electron-beam approach called Multi Shaped Beam (MSB), which is based on proven Variable Shaped Beam (VSB) technology, will be presented in this paper. In the recent decade VSB has already been applied in EBDW for device learning, early prototyping and low volume fabrication in production environments for both silicon and compound semiconductor applications. Above all the high resolution and the high flexibility due to the avoidance of expensive masks for critical layers made it an attractive solution for advanced technology nodes down to 32nm half pitch. The limitation in throughput of VSB has been mitigated in a major extension of VSB by the qualification of the cell projection (CP) technology concurrently used with VSB. With CP more pixels in complex shapes can be projected in one shot, enabling a remarkable shot count reduction for repetitive pattern. The most advanced step to extend the mature VSB technology for higher throughput is its parallelization in one column applying MEMS based multi deflection arrays. With this Vistec MSB technology, multiple shaped beamlets are generated simultaneously, each controllable individually in shape size and beam on time. Compared to pixel based ML2 approaches the MSB technology enables the maskless, variable and parallel projection of a large number of pixels per beamlet times the number of beamlets. Basic concepts, exposure examples and performance results of each of the described throughput enhancement steps will be presented.

  1. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  2. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    Science.gov (United States)

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP) a disorder which pre-disposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene, five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  3. Origin, diversity and maturation of human antiviral antibodies analyzed by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Ponraj ePrabakaran

    2012-08-01

    Full Text Available Our understanding of how antibodies are generated and function could help develop effective vaccines and antibody-based therapeutics against viruses such as HIV-1, SARS Coronavirus (CoV, and Hendra and Nipah viruses (henipaviruses. Although broadly neutralizing antibodies (bnAbs against the HIV-1 were observed in patients, elicitation of such bnAbs remains a major challenge when compared to other viral targets. We previously hypothesized that HIV-1 could have evolved a strategy to evade the immune system due to absent or very weak binding of germline antibodies to the conserved epitopes that may not be sufficient to initiate and/or maintain an effective immune response. To further explore our hypothesis, we used the 454 sequence analysis of a large naïve library of human IgM antibodies which had been used for selecting antibodies against SARS Coronavirus (CoV receptor-binding domain (RBD, and soluble G proteins (sG of Hendra and Nipah viruses (henipaviruses. We found that the human IgM repertoires from the 454 sequencing have diverse germline usages, recombination patterns, junction diversity and a lower extent of somatic mutation. In this study, we identified germline intermediates of antibodies specific to HIV-1 and other viruses as observed in normal individuals, and compared their genetic diversity and somatic mutation level along with available structural and functional data. Further computational analysis will provide framework for understanding the underlying genetic and molecular determinants related to maturation pathways of antiviral bnAbs that could be useful for applying novel approaches to the design of effective vaccine immunogens and antibody-based therapeutics.

  4. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...... a broad taxonomic range and most sequences were from nematode taxa that have previously been found to be abundant in soil such as Tylenchida, Rhabditida, Dorylaimida, Triplonchida and Araeolaimida. Conclusions: Our amplification and sequencing strategy for assessing nematode diversity was able to collect...

  5. High-throughput and quantitative genome-wide messenger RNA sequencing for molecular phenotyping.

    Science.gov (United States)

    Collins, John E; Wali, Neha; Sealy, Ian M; Morris, James A; White, Richard J; Leonard, Steven R; Jackson, David K; Jones, Matthew C; Smerdon, Nathalie C; Zamora, Jorge; Dooley, Christopher M; Carruthers, Samantha N; Barrett, Jeffrey C; Stemple, Derek L; Busch-Nentwich, Elisabeth M

    2015-08-05

    We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements. Multiplex sequencing of transcript 3' ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts. This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.

  6. Estimation of immune cell densities in immune cell conglomerates: an approach for high-throughput quantification.

    Directory of Open Access Journals (Sweden)

    Niels Halama

    Full Text Available BACKGROUND: Determining the correct number of positive immune cells in immunohistological sections of colorectal cancer and other tumor entities is emerging as an important clinical predictor and therapy selector for an individual patient. This task is usually obstructed by cell conglomerates of various sizes. We here show that at least in colorectal cancer the inclusion of immune cell conglomerates is indispensable for estimating reliable patient cell counts. Integrating virtual microscopy and image processing principally allows the high-throughput evaluation of complete tissue slides. METHODOLOGY/PRINCIPAL FINDINGS: For such large-scale systems we demonstrate a robust quantitative image processing algorithm for the reproducible quantification of cell conglomerates on CD3 positive T cells in colorectal cancer. While isolated cells (28 to 80 microm(2 are counted directly, the number of cells contained in a conglomerate is estimated by dividing the area of the conglomerate in thin tissues sections (< or =6 microm by the median area covered by an isolated T cell which we determined as 58 microm(2. We applied our algorithm to large numbers of CD3 positive T cell conglomerates and compared the results to cell counts obtained manually by two independent observers. While especially for high cell counts, the manual counting showed a deviation of up to 400 cells/mm(2 (41% variation, algorithm-determined T cell numbers generally lay in between the manually observed cell numbers but with perfect reproducibility. CONCLUSION: In summary, we recommend our approach as an objective and robust strategy for quantifying immune cell densities in immunohistological sections which can be directly implemented into automated full slide image processing systems.

  7. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version does only allow a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  8. DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data.

    Science.gov (United States)

    Nettling, Martin; Thieme, Nils; Both, Andreas; Grosse, Ivo

    2014-02-04

    New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads), calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphism increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions require also specialized or distributed hardware leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource saving requests, and (iii) running on a single standard computer hardware. Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing relating single lookups as range request, which are needed permanently for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as test environment. DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets up to several billion

  9. The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data.

    Science.gov (United States)

    Skrzypek, Marek S; Binkley, Jonathan; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2017-01-04

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology, by curating the scientific literature in real time, and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that was used in the construction of Assembly 22. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  11. Next generation sequencing-based multigene panel for high throughput detection of food-borne pathogens.

    Science.gov (United States)

    Ferrario, Chiara; Lugli, Gabriele Andrea; Ossiprandi, Maria Cristina; Turroni, Francesca; Milani, Christian; Duranti, Sabrina; Mancabelli, Leonardo; Mangifesta, Marta; Alessandri, Giulia; van Sinderen, Douwe; Ventura, Marco

    2017-09-01

    Contamination of food by chemicals or pathogenic bacteria may cause particular illnesses that are linked to food consumption, commonly referred to as foodborne diseases. Bacteria are present in/on various foods products, such as fruits, vegetables and ready-to-eat products. Bacteria that cause foodborne diseases are known as foodborne pathogens (FBPs). Accurate detection methods that are able to reveal the presence of FBPs in food matrices are in constant demand, in order to ensure safe foods with a minimal risk of causing foodborne diseases. Here, a multiplex PCR-based Illumina sequencing method for FBP detection in food matrices was developed. Starting from 25 bacterial targets and 49 selected PCR primer pairs, a primer collection called foodborne pathogen - panel (FPP) consisting of 12 oligonucleotide pairs was developed. The FPP allows a more rapid and reliable identification of FBPs compared to classical cultivation methods. Furthermore, FPP permits sensitive and specific FBP detection in about two days from food sample acquisition to bioinformatics-based identification. The FPP is able to simultaneously identify eight different bacterial pathogens, i.e. Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Salmonella enterica subsp. enterica serovar enteritidis, Escherichia coli, Shigella sonnei, Staphylococcus aureus and Yersinia enterocolitica, in a given food matrix at a threshold contamination level of 10 1 cell/g. Moreover, this novel detection method may represent an alternative and/or a complementary approach to PCR-based techniques, which are routinely used for FBP detection, and could be implemented in (parts of) the food chain as a quality check. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

    Science.gov (United States)

    't Hoen, Peter A C; Friedländer, Marc R; Almlöf, Jonas; Sammeth, Michael; Pulyakhina, Irina; Anvar, Seyed Yahya; Laros, Jeroen F J; Buermans, Henk P J; Karlberg, Olof; Brännvall, Mathias; den Dunnen, Johan T; van Ommen, Gert-Jan B; Gut, Ivo G; Guigó, Roderic; Estivill, Xavier; Syvänen, Ann-Christine; Dermitzakis, Emmanouil T; Lappalainen, Tuuli

    2013-11-01

    RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.

  13. High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

    DEFF Research Database (Denmark)

    Nag, Sidsel; Dalgaard, Marlene Danner; Kofoed, Poul-Erik

    2017-01-01

    as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8-100% of samples. The SNP data indicates that artemisinin resistance......-conferring SNPs in pfK13 are absent from the studied area of Guinea-Bissau, while the pfmdr1 86 N allele is found at a high prevalence. The mitochondrial barcodes are unanimous and accommodate a West African origin of the parasites. With this method, very reliable high throughput surveillance of antimalarial drug...

  14. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Science.gov (United States)

    Youngblut, Nicholas D; Barnett, Samuel E; Buckley, Daniel H

    2018-01-01

    Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  15. metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

    DEFF Research Database (Denmark)

    Louvel, Guillaume; Der Sarkissian, Clio; Hanghøj, Kristian Ebbesen

    2016-01-01

    Micro-organisms account for most of the Earth's biodiversity and yet remain largely unknown. The complexity and diversity of microbial communities present in clinical and environmental samples can now be robustly investigated in record times and prices thanks to recent advances in high......-throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automatizing routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa......, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT within a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), but also from archaeological specimens. We present...

  16. Ammonium inhibition through the decoupling of acidification process and methanogenesis in anaerobic digester revealed by high throughput sequencing.

    Science.gov (United States)

    Zhang, Miao; Lin, Qiang; Rui, Junpeng; Li, Jiabao; Li, Xiangzhen

    2017-02-01

    To reveal the shifts of microbial communities along ammonium gradients, and the relationship between microbial community composition and the anaerobic digestion performance using a high throughput sequencing technique. Methane production declined with increasing ammonium concentration, and was inhibited above 4 g l-1. The volatile fatty acids, especially acetate, accumulated with elevated ammonium. Prokaryotic populations showed different responses to the ammonium concentration: Clostridium, Tepidimicrobium, Sporanaerobacter, Peptostreptococcus, Sarcina and Peptoniphilus showed good tolerance to ammonium ions. However, Syntrophomonas with poor tolerance to ammonium may be inhibited during anaerobic digestion. During methanogenesis, Methanosarcina was the dominant methanogen. Excessive ammonium inhibited methane production probably by decoupling the linkage between acidification process and methanogenesis, and finally resulted in different performance in anaerobic digestion.

  17. One step forwards for the routine use of high-throughput DNA sequencing in environmental monitoring. An efficient and standardizable method to maximize the detection of environmental bacteria.

    Science.gov (United States)

    Bruno, Antonia; Sandionigi, Anna; Galimberti, Andrea; Siani, Eleonora; Labra, Massimo; Cocuzza, Clementina; Ferri, Emanuele; Casiraghi, Maurizio

    2017-02-01

    We propose an innovative, repeatable, and reliable experimental workflow to concentrate and detect environmental bacteria in drinking water using molecular techniques. We first concentrated bacteria in water samples using tangential flow filtration and then we evaluated two methods of environmental DNA extraction. We performed tests on both artificially contaminated water samples and real drinking water samples. The efficiency of the experimental workflow was measured through qPCR. The successful applicability of the high-throughput DNA sequencing (HTS) approach was demonstrated on drinking water samples. Our results demonstrate the feasibility of our approach in high-throughput-based studies, and we suggest incorporating it in monitoring strategies to have a better representation of the microbial community. In the recent years, HTS techniques have become key tools in the study of microbial communities. To make the leap from academic laboratories to the routine monitoring (e.g., water treatment plants laboratories), we here propose an experimental workflow suitable for the introduction of HTS as a standard method for detecting environmental bacteria. © 2016 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  18. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    National Research Council Canada - National Science Library

    Yang, Lin; Yang, Hui-lin; Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    .... A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla...

  19. Isolation and characterization of antigen-specific alpaca (Lama pacos) VHH antibodies by biopanning followed by high-throughput sequencing.

    Science.gov (United States)

    Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji

    2015-09-01

    The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10(7) from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequencies of each sequence between before and after biopanning, we found the sequences whose frequencies were increased by biopanning. The top 100 sequences of them were supplied for phylogenic tree analysis. In total 75% of them belonged to N-group VHHs, but the other were phylogenically apart from N-group VHHs (Non N-group). Two of three VHHs selected from non N-group VHHs showed sufficient antigen binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.

  20. Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods.

    Science.gov (United States)

    Piñol, J; Mir, G; Gomez-Polo, P; Agustí, N

    2015-07-01

    The quantification of the biological diversity in environmental samples using high-throughput DNA sequencing is hindered by the PCR bias caused by variable primer-template mismatches of the individual species. In some dietary studies, there is the added problem that samples are enriched with predator DNA, so often a predator-specific blocking oligonucleotide is used to alleviate the problem. However, specific blocking oligonucleotides could coblock nontarget species to some degree. Here, we accurately estimate the extent of the PCR biases induced by universal and blocking primers on a mock community prepared with DNA of twelve species of terrestrial arthropods. We also compare universal and blocking primer biases with those induced by variable annealing temperature and number of PCR cycles. The results show that reads of all species were recovered after PCR enrichment at our control conditions (no blocking oligonucleotide, 45 °C annealing temperature and 40 cycles) and high-throughput sequencing. They also show that the four factors considered biased the final proportions of the species to some degree. Among these factors, the number of primer-template mismatches of each species had a disproportionate effect (up to five orders of magnitude) on the amplification efficiency. In particular, the number of primer-template mismatches explained most of the variation (~3/4) in the amplification efficiency of the species. The effect of blocking oligonucleotide concentration on nontarget species relative abundance was also significant, but less important (below one order of magnitude). Considering the results reported here, the quantitative potential of the technique is limited, and only qualitative results (the species list) are reliable, at least when targeting the barcoding COI region. © 2014 John Wiley & Sons Ltd.

  1. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Directory of Open Access Journals (Sweden)

    Stefano Pirrò

    Full Text Available Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs, which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into human body, regulating human gene expression. However, the knowledge is scarce about miRNA in Moringa. Furthermore, in order to test the hypothesis on the pharmacological potential properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that are conserved across multiple plant species as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA that in turn allowed to relate the miRNAs to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis and we provided new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  2. MicroRNA from Moringa oleifera: Identification by High Throughput Sequencing and Their Potential Contribution to Plant Medicinal Value.

    Science.gov (United States)

    Pirrò, Stefano; Zanella, Letizia; Kenzo, Maurice; Montesano, Carla; Minutolo, Antonella; Potestà, Marina; Sobze, Martin Sanou; Canini, Antonella; Cirilli, Marco; Muleo, Rosario; Colizzi, Vittorio; Galgani, Andrea

    2016-01-01

    Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into human body, regulating human gene expression. However, the knowledge is scarce about miRNA in Moringa. Furthermore, in order to test the hypothesis on the pharmacological potential properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNA showed that are conserved across multiple plant species as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNA that in turn allowed to relate the miRNAs to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of discovering M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis and we provided new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.

  3. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Jared W Wenger

    2010-05-01

    Full Text Available Fermentation of xylose is a fundamental requirement for the efficient production of ethanol from lignocellulosic biomass sources. Although they aggressively ferment hexoses, it has long been thought that native Saccharomyces cerevisiae strains cannot grow fermentatively or non-fermentatively on xylose. Population surveys have uncovered a few naturally occurring strains that are weakly xylose-positive, and some S. cerevisiae have been genetically engineered to ferment xylose, but no strain, either natural or engineered, has yet been reported to ferment xylose as efficiently as glucose. Here, we used a medium-throughput screen to identify Saccharomyces strains that can increase in optical density when xylose is presented as the sole carbon source. We identified 38 strains that have this xylose utilization phenotype, including strains of S. cerevisiae, other sensu stricto members, and hybrids between them. All the S. cerevisiae xylose-utilizing strains we identified are wine yeasts, and for those that could produce meiotic progeny, the xylose phenotype segregates as a single gene trait. We mapped this gene by Bulk Segregant Analysis (BSA using tiling microarrays and high-throughput sequencing. The gene is a putative xylitol dehydrogenase, which we name XDH1, and is located in the subtelomeric region of the right end of chromosome XV in a region not present in the S288c reference genome. We further characterized the xylose phenotype by performing gene expression microarrays and by genetically dissecting the endogenous Saccharomyces xylose pathway. We have demonstrated that natural S. cerevisiae yeasts are capable of utilizing xylose as the sole carbon source, characterized the genetic basis for this trait as well as the endogenous xylose utilization pathway, and demonstrated the feasibility of BSA using high-throughput sequencing.

  4. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Full Text Available Abstract Background Eight diverse sorghum (Sorghum bicolor L. Moench accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs. Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effect of library type and genomic diversity on SNP discovery and imputation are evaluated. Results Alignment of eight genome equivalents (6 Gb to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.

  5. Molecular diet analysis of two African free-tailed bats (Molossidae) using high throughput sequencing

    DEFF Research Database (Denmark)

    Bohmann, Kristine; Monadjem, Ara; Noer, Christina Lehmkuhl

    2011-01-01

    Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep......-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI...

  6. High-throughput physical map anchoring via BAC-pool sequencing

    Czech Academy of Sciences Publication Activity Database

    Cviková, Kateřina; Cattonaro, F.; Alaux, M.; Stein, N.; Mayer, K.F.X.; Doležel, Jaroslav; Bartoš, Jan

    2015-01-01

    Roč. 15, APR 11 (2015) ISSN 1471-2229 R&D Projects: GA ČR GA13-08786S; GA MŠk(CZ) LO1204 Institutional support: RVO:61389030 Keywords : Physical map * Contig anchoring * Next generation sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.631, year: 2015

  7. Profiling of Ribose Methylations in RNA by High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Birkedal, Ulf; Christensen-Dalsgaard, Mikkel; Krogh, Nicolai

    2015-01-01

    Ribose methylations are the most abundant chemical modifications of ribosomal RNA and are critical for ribosome assembly and fidelity of translation. Many aspects of ribose methylations have been difficult to study due to lack of efficient mapping methods. Here, we present a sequencing-based meth...

  8. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    Full Text Available With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.

  9. Design of new plasmepsin inhibitors: a virtual high throughput screening approach on the EGEE grid.

    Science.gov (United States)

    Kasam, Vinod; Zimmermann, Marc; Maass, Astrid; Schwichtenberg, Horst; Wolf, Antje; Jacq, Nicolas; Breton, Vincent; Hofmann-Apitius, Martin

    2007-01-01

    Though different species of the genus Plasmodium may be responsible for malaria, the variant caused by P. falciparum is often very dangerous and even fatal if untreated. Hemoglobin degradation is one of the key metabolic processes for the survival of the Plasmodium parasite in its host. Plasmepsins, a family of aspartic proteases encoded by the Plasmodium genome, play a prominent role in host hemoglobin cleavage. In this paper we demonstrate the use of virtual screening, in particular molecular docking, employed at a very large scale to identify novel inhibitors for plasmepsins II and IV. A large grid infrastructure, the EGEE grid, was used to address the problem of large computation resources required for docking hundreds of thousands of chemical compounds on different plasmepsin targets of P. falciparum. A large compound library of about 1 million chemical compounds was docked on 5 different targets of plasmepsins using two different docking software, namely FlexX and AutoDock. Several strategies were employed to analyze the results of this virtual screening approach including docking scores, ideal binding modes, and interactions to key residues of the protein. Three different classes of structures with thiourea, diphenylurea, and guanidino scaffolds were identified to be promising hits. While the identification of diphenylurea compounds is in accordance with the literature and thus provides a sort of "positive control", the identification of novel compounds with a guanidino scaffold proves that high throughput docking can be effectively used to identify novel potential inhibitors of P. falciparum plasmepsins. Thus, with the work presented here, we do not only demonstrate the relevance of computational grids in drug discovery but also identify several promising small molecules which have the potential to serve as candidate inhibitors for P. falciparum plasmepsins. With the use of the EGEE grid infrastructure for the virtual screening campaign against the malaria

  10. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    Science.gov (United States)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes due to relatively small genome sizes, ability to grow in culture, and as a natural model system for measuring expression of calcification-related genes in two life stages. As the Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, that data is often insufficient to identify the exact position of transcription start site (TSS) to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies of producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. As an additional info, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although there are not much data available for close organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  11. Melon Transcriptome Characterization: Simple Sequence Repeats and Single Nucleotide Polymorphisms Discovery for High Throughput Genotyping across the Species

    Directory of Open Access Journals (Sweden)

    José Miguel Blanca

    2011-07-01

    Full Text Available Melon ( L. ranks among the highest-valued fruit crops worldwide. Some genomic tools are available for this crop, including a Sanger transcriptome. We report the generation of 689,054 high-quality expressed sequence tags (ESTs from two 454 sequencing runs, using normalized and nonnormalized complementary DNA (cDNA libraries prepared from four genotypes belonging to the two subspecies and the main commercial types. 454 ESTs were combined with the Sanger available ESTs and de novo assembled into 53,252 unigenes. Over 63% of the unigenes were functionally annotated with Gene Ontology (GO terms and 21% had known orthologs of (L. Heynh. Annotation distribution followed similar tendencies than that reported for , suggesting that the dataset represents a fairly complete melon transcriptome. Furthermore, we identified a set of 3298 unigenes with microsatellite motifs and 14,417 sequences with single nucleotide variants of which 11,655 single nucleotide polymorphism met criteria for use with high-throughput genotyping platforms, and 453 could be detected as cleaved amplified polymorphic sequence (CAPS. A set of markers were validated, 90% of them being polymorphic in a number of variable accessions. This transcriptome provides an invaluable new tool for biological research, more so when it includes transcripts not described previously. It is being used for genome annotation and has provided a large collection of markers that will allow speeding up the process of breeding new melon varieties.

  12. [High throughput-targeted sequencing panel for exploring radiosensitivity associated genes in esophageal squamous cell carcinoma].

    Science.gov (United States)

    Qiao, Y; Hu, C X; Song, D A; Li, S Q; Zhou, L H; Jiang, X D

    2017-08-23

    Objective: To explore radiosensitivity-associated genes in esophageal squamous cell carcinoma by targeted sequencing panel. Methods: The peripheral blood from 22 esophageal squamous cell carcinoma (ESCC) patients received radiotherapy alone were collected, respectively. The genomic DNA (gDNA) of peripheral blood was extracted and used to create a library of gDNA restriction fragments. The gDNA restriction fragments were hybridized to the HaloPlex probe capture library, which comprises 356 cancer genes selected from the Catalogue of Somatic Mutations in Cancer (Cosmic) database of 2011 updated edition. The sequencing data were aligned by the Genome Analysis Toolkit GATK (version 3.0) and Picar. The single nucleotide polymorphism and inserted-deletion (SNP/InDel) variations were annotated by online database. The pathway enrichment was analyzed by Ingenuity Pathway analysis (IPA). Moreover, according to the short-period curative effect, 22 patients were divided into two groups: the radiation- sensitivity group (CR+ PR) and the radiation-resistant group (PD+ SD). The nonsynonymous mutation sites were statistically analyzed and the genes associated with radiosensitivity of ESCC were screened. Results: More than 97% sequencing reads were aligned to human genome reference sequence and more than 90% sequencing reads were the target sequences. SNP/InDel database annotation results showed that the mutations of 22 cases mainly distributed in exons, and the mutant types were mainly missense and synonymous single nucleotide variant (SNV). There were 23 genes of high-frequency mutation associated with esophageal cancer. Pathway enrichment by IPA showed that 3 pathways were associated with the development of esophageal cancer, which were roles of BRCA1 in DNA damage response pathway, DNA double-strand break repair by non-homologous end joining pathway and ATM signaling pathway. According to the curative effect, five genes including mismatch repair system component (PMS1

  13. Exploring the environmental diversity of kinetoplastid flagellates in the high-throughput DNA sequencing era

    Directory of Open Access Journals (Sweden)

    Claudia Masini d’Avila-Levy

    2015-01-01

    Full Text Available The class Kinetoplastea encompasses both free-living and parasitic species from a wide range of hosts. Several representatives of this group are responsible for severe human diseases and for economic losses in agriculture and livestock. While this group encompasses over 30 genera, most of the available information has been derived from the vertebrate pathogenic genera Leishmaniaand Trypanosoma.Recent studies of the previously neglected groups of Kinetoplastea indicated that the actual diversity is much higher than previously thought. This article discusses the known segment of kinetoplastid diversity and how gene-directed Sanger sequencing and next-generation sequencing methods can help to deepen our knowledge of these interesting protists.

  14. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Science.gov (United States)

    2012-01-01

    Background MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs,114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes. PMID:22394504

  15. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing.

    Science.gov (United States)

    Peláez, Pablo; Trejo, Minerva S; Iñiguez, Luis P; Estrada-Navarrete, Georgina; Covarrubias, Alejandra A; Reyes, José L; Sanchez, Federico

    2012-03-06

    MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs,114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.

  16. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Peláez Pablo

    2012-03-01

    Full Text Available Abstract Background MicroRNAs (miRNAs are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean. Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs,114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.

  17. An integrated multiple capillary array electrophoresis system for high-throughput DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Lu, X.

    1998-03-27

    A capillary array electrophoresis system was chosen to perform DNA sequencing because of several advantages such as rapid heat dissipation, multiplexing capabilities, gel matrix filling simplicity, and the mature nature of the associated manufacturing technologies. There are two major concerns for the multiple capillary systems. One concern is inter-capillary cross-talk, and the other concern is excitation and detection efficiency. Cross-talk is eliminated through proper optical coupling, good focusing and immersing capillary array into index matching fluid. A side-entry excitation scheme with orthogonal detection was established for large capillary array. Two 100 capillary array formats were used for DNA sequencing. One format is cylindrical capillary with 150 {micro}m o.d., 75 {micro}m i.d and the other format is square capillary with 300 {micro}m out edge and 75 {micro}m inner edge. This project is focused on the development of excitation and detection of DNA as well as performing DNA sequencing. The DNA injection schemes are discussed for the cases of single and bundled capillaries. An individual sampling device was designed. The base-calling was performed for a capillary from the capillary array with the accuracy of 98%.

  18. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases.

    Science.gov (United States)

    Qin, Yidan; Yao, Jun; Wu, Douglas C; Nottingham, Ryan M; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from RNA in RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. © 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  19. Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data.

    Science.gov (United States)

    Zhu, Sha Joe; Almagro-Garcia, Jacob; McVean, Gil

    2018-01-01

    The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analyzing and interpreting such data is challenging because of the high rate of multiple infections present. We have developed a statistical method and implementation for deconvolving multiple genome sequences present in an individual with mixed infections. The software package DEploid uses haplotype structure within a reference panel of clonal isolates as a prior for haplotypes present in a given sample. It estimates the number of strains, their relative proportions and the haplotypes presented in a sample, allowing researchers to study multiple infection in malaria with an unprecedented level of detail. The open source implementation DEploid is freely available at https://github.com/mcveanlab/DEploid under the conditions of the GPLv3 license. An R version is available at https://github.com/mcveanlab/DEploid-r. joe.zhu@bdi.ox.ac.uk or gil.mcvean@bdi.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  20. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    Science.gov (United States)

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Sample Preservation, DNA or RNA Extraction and Data Analysis for High-Throughput Phytoplankton Community Sequencing

    Directory of Open Access Journals (Sweden)

    Anita Mäki

    2017-09-01

    Full Text Available Phytoplankton is the basis for aquatic food webs and mirrors the water quality. Conventionally, phytoplankton analysis has been done using time consuming and partly subjective microscopic observations, but next generation sequencing (NGS technologies provide promising potential for rapid automated examination of environmental samples. Because many phytoplankton species have tough cell walls, methods for cell lysis and DNA or RNA isolation need to be efficient to allow unbiased nucleic acid retrieval. Here, we analyzed how two phytoplankton preservation methods, three commercial DNA extraction kits and their improvements, three RNA extraction methods, and two data analysis procedures affected the results of the NGS analysis. A mock community was pooled from phytoplankton species with variation in nucleus size and cell wall hardness. Although the study showed potential for studying Lugol-preserved sample collections, it demonstrated critical challenges in the DNA-based phytoplankton analysis in overall. The 18S rRNA gene sequencing output was highly affected by the variation in the rRNA gene copy numbers per cell, while sample preservation and nucleic acid extraction methods formed another source of variation. At the top, sequence-specific variation in the data quality introduced unexpected bioinformatics bias when the sliding-window method was used for the quality trimming of the Ion Torrent data. While DNA-based analyses did not correlate with biomasses or cell numbers of the mock community, rRNA-based analyses were less affected by different RNA extraction procedures and had better match with the biomasses, dry weight and carbon contents, and are therefore recommended for quantitative phytoplankton analyses.

  2. Improving transcriptome assembly through error correction of high-throughput sequence reads.

    Science.gov (United States)

    Macmanes, Matthew D; Eisen, Michael B

    2013-01-01

    The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  3. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  4. High-throughput sequencing-based genome-wide identification of microRNAs expressed in developing cotton seeds.

    Science.gov (United States)

    Wang, YanMei; Ding, Yan; Yu, DingWei; Xue, Wei; Liu, JinYuan

    2015-08-01

    MicroRNAs (miRNAs) have been shown to play critical regulatory roles in gene expression in cotton. Although a large number of miRNAs have been identified in cotton fibers, the functions of miRNAs in seed development remain unexplored. In this study, a small RNA library was constructed from cotton seeds sampled at 15 days post-anthesis (DPA) and was subjected to high-throughput sequencing. A total of 95 known miRNAs were detected to be expressed in cotton seeds. The expression pattern of these identified miRNAs was profiled and 48 known miRNAs were differentially expressed between cotton seeds and fibers at 15 DPA. In addition, 23 novel miRNA candidates were identified in 15-DPA seeds. Putative targets for 21 novel and 87 known miRNAs were successfully predicted and 900 expressed sequence tag (EST) sequences were proposed to be candidate target genes, which are involved in various metabolic and biological processes, suggesting a complex regulatory network in developing cotton seeds. Furthermore, miRNA-mediated cleavage of three important transcripts in vivo was validated by RLM-5' RACE. This study is the first to show the regulatory network of miRNAs that are involved in developing cotton seeds and provides a foundation for future studies on the specific functions of these miRNAs in seed development.

  5. Bacterial diversity of the American sand fly Lutzomyia intermedia using high-throughput metagenomic sequencing.

    Science.gov (United States)

    Monteiro, Carolina Cunha; Villegas, Luis Eduardo Martinez; Campolina, Thais Bonifácio; Pires, Ana Clara Machado Araújo; Miranda, Jose Carlos; Pimenta, Paulo Filemon Paolucci; Secundino, Nagila Francinete Costa

    2016-08-31

    Parasites of the genus Leishmania cause a broad spectrum of diseases, collectively known as leishmaniasis, in humans worldwide. American cutaneous leishmaniasis is a neglected disease transmitted by sand fly vectors including Lutzomyia intermedia, a proven vector. The female sand fly can acquire or deliver Leishmania spp. parasites while feeding on a blood meal, which is required for nutrition, egg development and survival. The microbiota composition and abundance varies by food source, life stages and physiological conditions. The sand fly microbiota can affect parasite life-cycle in the vector. We performed a metagenomic analysis for microbiota composition and abundance in Lu. intermedia, from an endemic area in Brazil. The adult insects were collected using CDC light traps, morphologically identified, carefully sterilized, dissected under a microscope and the females separated into groups according to their physiological condition: (i) absence of blood meal (unfed = UN); (ii) presence of blood meal (blood-fed = BF); and (iii) presence of developed ovaries (gravid = GR). Then, they were processed for metagenomics with Illumina Hiseq Sequencing in order to be sequence analyzed and to obtain the taxonomic profiles of the microbiota. Bacterial metagenomic analysis revealed differences in microbiota composition based upon the distinct physiological stages of the adult insect. Sequence identification revealed two phyla (Proteobacteria and Actinobacteria), 11 families and 15 genera; 87 % of the bacteria were Gram-negative, while only one family and two genera were identified as Gram-positive. The genera Ochrobactrum, Bradyrhizobium and Pseudomonas were found across all of the groups. The metagenomic analysis revealed that the microbiota of the Lu. intermedia female sand flies are distinct under specific physiological conditions and consist of 15 bacterial genera. The Ochrobactrum, Bradyrhizobium and Pseudomonas were the common genera. Our results detailing

  6. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing.

    Science.gov (United States)

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-04-30

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes.

  7. High Throughput Sequencing of Germline and Tumor from Men With Early-Onset Metastatic Prostate Cancer

    Science.gov (United States)

    2014-10-01

    laboratory has applied these same approaches to interrogate hundreds of routine FFPE cancer specimens on the Ion Torrent platform, including...START DATE Sept. 01, 2013 Site 1: University of Michigan Site 2: University of Carolina – Chapel Hill 3003 S. State Street Ann Arbor, MI 48109-1274...specified in proposal) Timeline Site 1 Site 2 Major Task 1: To identify and enroll men with de novo metastatic prostate cancer presenting at or before

  8. High-throughput search for caloric materials: the CaloriCool approach

    Science.gov (United States)

    Zarkevich, N. A.; Johnson, D. D.; Pecharsky, V. K.

    2018-01-01

    The high-throughput search paradigm adopted by the newly established caloric materials consortium—CaloriCool®—with the goal to substantially accelerate discovery and design of novel caloric materials is briefly discussed. We begin with describing material selection criteria based on known properties, which are then followed by heuristic fast estimates, ab initio calculations, all of which has been implemented in a set of automated computational tools and measurements. We also demonstrate how theoretical and computational methods serve as a guide for experimental efforts by considering a representative example from the field of magnetocaloric materials.

  9. Diagnostic single gene analyses beyond Sanger. Economic high-throughput sequencing of small genes involved in congenital coagulation and platelet disorders.

    Science.gov (United States)

    Najm, Juliane; Rath, Matthias; Schröder, Winnie; Felbor, Ute

    2017-07-17

    Molecular testing of congenital coagulation and platelet disorders offers confirmation of clinical diagnoses, supports genetic counselling, and enables predictive and prenatal diagnosis. In some cases, genotype-phenotype correlations are important for predicting the clinical course of the disease and adaptation of individualized therapy. Until recently, genotyping has been mainly performed by Sanger sequencing. While next generation sequencing (NGS) enables the parallel analysis of multiple genes, the cost-value ratio of custom-made panels can be unfavorable for analyses of specific small genes. The aim of this study was to transfer genotyping of small genes involved in congenital coagulation and platelet disorders from Sanger sequencing to an NGS-based method. A LR-PCR approach for target enrichment of the entire genomic regions of the genes F7, F10, F11, F12, GATA1, MYH9, TUBB1 and WAS was combined with high-throughput sequencing on a MiSeq platform. NGS detected all variants that had previously been identified by Sanger sequencing. Our results demonstrate that this approach is an accurate and flexible tool for molecular genetic diagnostics of single small genes.

  10. A high-throughput sequencing ecotoxicology study of freshwater bacterial communities and their responses to tebuconazole.

    Science.gov (United States)

    Pascault, Noémie; Roux, Simon; Artigas, Joan; Pesce, Stéphane; Leloup, Julie; Tadonleke, Rémy D; Debroas, Didier; Bouchez, Agnès; Humbert, Jean-François

    2014-12-01

    The pollution of lakes and rivers by pesticides is a growing problem worldwide. However, the impacts of these substances on microbial communities are still poorly understood, partly because next-generation sequencing (NGS) has rarely been used in an ecotoxicology context to study bacterial communities despite its interest for accessing rare taxa. Microcosm experiments were carried out to evaluate the effects of tebuconazole (TBZ) on the structure and composition of bacterial communities from two types of freshwater ecosystem (lakes and rivers) with differing histories of pollutant contamination (pristine vs. previously exposed sites). Pyrosequencing revealed that bacterial diversity was higher in the river than in the lakes and in previously exposed sites than in pristine sites. Lakes and river stations shared very few OTUs, and differences at the phylum level were identified between these ecosystems (i.e. the relative importance of Actinobacteria and Gammaproteobacteria). Despite differences between these ecosystems and their contamination history, no significant effect of TBZ on bacterial community structure or composition was observed. Compared to functional parameters that displayed variable responses, we demonstrated that a combination of classical methods and NGS is necessary to investigate the ecotoxicological responses of microbial communities to pollutants. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  11. Neuropeptidergic Signaling in the American Lobster Homarus americanus: New Insights from High-Throughput Nucleotide Sequencing.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan; Lameyer, Tess J; Pascual, Micah G; Shea, Devlin N; Stanhope, Meredith E; Schulz, David J; Dickinson, Patsy S

    2015-01-01

    Peptides are the largest and most diverse class of molecules used for neurochemical communication, playing key roles in the control of essentially all aspects of physiology and behavior. The American lobster, Homarus americanus, is a crustacean of commercial and biomedical importance; lobster growth and reproduction are under neuropeptidergic control, and portions of the lobster nervous system serve as models for understanding the general principles underlying rhythmic motor behavior (including peptidergic neuromodulation). While a number of neuropeptides have been identified from H. americanus, and the effects of some have been investigated at the cellular/systems levels, little is currently known about the molecular components of neuropeptidergic signaling in the lobster. Here, a H. americanus neural transcriptome was generated and mined for sequences encoding putative peptide precursors and receptors; 35 precursor- and 41 receptor-encoding transcripts were identified. We predicted 194 distinct neuropeptides from the deduced precursor proteins, including members of the adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin C, bursicon, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone (CHH), CHH precursor-related peptide, diuretic hormone 31, diuretic hormone 44, eclosion hormone, FLRFamide, GSEFLamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, proctolin, pyrokinin, SIFamide, sulfakinin and tachykinin-related peptide families. While some of the predicted peptides are known H. americanus isoforms, most are novel identifications, more than doubling the extant lobster neuropeptidome. The deduced receptor proteins are the first descriptions of H. americanus neuropeptide receptors, and include ones for most of the peptide groups mentioned earlier, as well as those for ecdysis-triggering hormone, red pigment concentrating hormone

  12. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari eNyyssönen

    2013-09-01

    Full Text Available Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain as hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26 % cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes, multiple Proteobacteria sub-phyla in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically-confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.

  13. Transcriptomics of in vitro immune-stimulated hemocytes from the Manila clam Ruditapes philippinarum using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Rebeca Moreira

    Full Text Available BACKGROUND: The Manila clam (Ruditapes philippinarum is a worldwide cultured bivalve species with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. METHODOLOGY AND PRINCIPAL FINDINGS: High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs and the 44.7% of the translated nucleotide sequences into protein were annotated successfully. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes like the apoptosis, the toll like signaling pathway and the complement cascade. We have found sequences from molecules never described in bivalves before, especially in the complement pathway where almost all the components are present. CONCLUSIONS: This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum.

  14. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes.

    Science.gov (United States)

    Ewing, Adam D; Kazazian, Haig H

    2010-09-01

    Using high-throughput sequencing, we devised a technique to determine the insertion sites of virtually all members of the human-specific L1 retrotransposon family in any human genome. Using diagnostic nucleotides, we were able to locate the approximately 800 L1Hs copies corresponding specifically to the pre-Ta, Ta-0, and Ta-1 L1Hs subfamilies, with over 90% of sequenced reads corresponding to human-specific elements. We find that any two individual genomes differ at an average of 285 sites with respect to L1 insertion presence or absence. In total, we assayed 25 individuals, 15 of which are unrelated, at 1139 sites, including 772 shared with the reference genome and 367 nonreference L1 insertions. We show that L1Hs profiles recapitulate genetic ancestry, and determine the chromosomal distribution of these elements. Using these data, we estimate that the rate of L1 retrotransposition in humans is between 1/95 and 1/270 births, and the number of dimorphic L1 elements in the human population with gene frequencies greater than 0.05 is between 3000 and 10,000.

  15. High throughput sequencing reveals the diversity of TRB-CDR3 repertoire in patients with psoriasis vulgaris.

    Science.gov (United States)

    Cao, Xiaofang; Wa, Qingbiao; Wang, Qidi; Li, Lin; Liu, Xin; An, Lisha; Cai, Ruikun; Du, Meng; Qiu, Yue; Han, Jian; Wang, Chunlin; Wang, Xingyu; Guo, Changlong; Lu, Yonghong; Ma, Xu

    2016-11-01

    Psoriasis is a T cell-mediated chronic inflammatory skin disease with inflammatory cell infiltrates in the dermis and epidermis. Previous studies suggested that there are some expanded T-cell receptor (TCR) clones in psoriatic skin. However, the effect of psoriasis on the immunological characteristics of TCR in circulating blood has not been reported. To address this, we performed high-throughput sequencing to reveal the immunological characteristics of TCR beta chain (TRB) in both psoriasis patients and healthy controls. Our results revealed that the TRB-CDR3 region of psoriasis patients had distinctive immunological characteristics compared with that of healthy controls, including V gene usage, nt of N addition. In addition, three types of TRB-CDR3 peptides were found highly relevant to psoriasis. Our findings show the comprehensive characteristics of psoriasis on the TRB-CDR3 repertoire of circulating blood at sequence-level resolution. These findings may contribute to a better understanding of the pathogenesis of psoriasis and open opportunities to explore potential therapeutic targets. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. High-Throughput Sequencing of Viable Microbial Communities in Raw Pork Subjected to a Fast Cooling Process.

    Science.gov (United States)

    Yang, Chao; Che, You; Qi, Yan; Liang, Peixin; Song, Cunjiang

    2017-01-01

    This study aimed to investigate the effect of the fast cooling process on the microbiological community in chilled fresh pork during storage. We established a culture-independent method to study viable microbes in raw pork. Tray-packaged fresh pork and chilled fresh pork were completely spoiled after 18 and 49 d in aseptic bags at 4 °C, respectively. 16S/18S ribosomal RNAs were reverse transcribed to cDNA to characterize the activity of viable bacteria/fungi in the 2 types of pork. Both cDNA and total DNA were analyzed by high-throughput sequencing, which revealed that viable Bacteroides sp. were the most active genus in rotten pork, although viable Myroides sp. and Pseudomonas sp. were also active. Moreover, viable fungi were only detected in chilled fresh pork. The sequencing results revealed that the fast cooling process could suppress the growth of microbes present initially in the raw meat to extend its shelf life. Our results also suggested that fungi associated with pork spoilage could not grow well in aseptic tray-packaged conditions. © 2016 Institute of Food Technologists®.

  17. ImmuneDB: a system for the analysis and exploration of high-throughput adaptive immune receptor sequencing data.

    Science.gov (United States)

    Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri

    2017-01-15

    As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry leading database, MySQL, to provide fast analysis and avoid the complexities of using error prone flat-files. ImmuneDB is freely available at http://immunedb.comA demo of the ImmuneDB web interface is available at: http://immunedb.com/demo CONTACT: Uh25@drexel.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. Discovery of J Chain in African Lungfish (Protopterus dolloi, Sarcopterygii) Using High Throughput Transcriptome Sequencing: Implications in Mucosal Immunity

    Science.gov (United States)

    Tacchi, Luca; Larragoite, Erin; Salinas, Irene

    2013-01-01

    J chain is a small polypeptide responsible for immunoglobulin (Ig) polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi) by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM+ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective. PMID:23967082

  19. Fungi Sailing the Arctic Ocean: Speciose Communities in North Atlantic Driftwood as Revealed by High-Throughput Amplicon Sequencing.

    Science.gov (United States)

    Rämä, Teppo; Davey, Marie L; Nordén, Jenni; Halvorsen, Rune; Blaalid, Rakel; Mathiassen, Geir H; Alsos, Inger G; Kauserud, Håvard

    2016-08-01

    High amounts of driftwood sail across the oceans and provide habitat for organisms tolerating the rough and saline environment. Fungi have adapted to the extremely cold and saline conditions which driftwood faces in the high north. For the first time, we applied high-throughput sequencing to fungi residing in driftwood to reveal their taxonomic richness, community composition, and ecology in the North Atlantic. Using pyrosequencing of ITS2 amplicons obtained from 49 marine logs, we found 807 fungal operational taxonomic units (OTUs) based on clustering at 97 % sequence similarity cut-off level. The phylum Ascomycota comprised 74 % of the OTUs and 20 % belonged to Basidiomycota. The richness of basidiomycetes decreased with prolonged submersion in the sea, supporting the general view of ascomycetes being more extremotolerant. However, more than one fourth of the fungal OTUs remained unassigned to any fungal class, emphasising the need for better DNA reference data from the marine habitat. Different fungal communities were detected in coniferous and deciduous logs. Our results highlight that driftwood hosts a considerably higher fungal diversity than currently known. The driftwood fungal community is not a terrestrial relic but a speciose assemblage of fungi adapted to the stressful marine environment and different kinds of wooden substrates found in it.

  20. Soil DNA metabarcoding and high-throughput sequencing as a forensic tool: considerations, potential limitations and recommendations.

    Science.gov (United States)

    Young, J M; Austin, J J; Weyrich, L S

    2017-02-01

    Analysis of physical evidence is typically a deciding factor in forensic casework by establishing what transpired at a scene or who was involved. Forensic geoscience is an emerging multi-disciplinary science that can offer significant benefits to forensic investigations. Soil is a powerful, nearly 'ideal' contact trace evidence, as it is highly individualistic, easy to characterise, has a high transfer and retention probability, and is often overlooked in attempts to conceal evidence. However, many real-life cases encounter close proximity soil samples or soils with low inorganic content, which cannot be easily discriminated based on current physical and chemical analysis techniques. The capability to improve forensic soil discrimination, and identify key indicator taxa from soil using the organic fraction is currently lacking. The development of new DNA sequencing technologies offers the ability to generate detailed genetic profiles from soils and enhance current forensic soil analyses. Here, we discuss the use of DNA metabarcoding combined with high-throughput sequencing (HTS) technology to distinguish between soils from different locations in a forensic context. Specifically, we provide recommendations for best practice, outline the potential limitations encountered in a forensic context and describe the future directions required to integrate soil DNA analysis into casework. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

    Science.gov (United States)

    Mu, John C; Mohiyuddin, Marghoob; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Abyzov, Alexej; Wong, Wing H; Lam, Hugo Y K

    2015-05-01

    VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. rd@bina.com Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  2. PathOS: a decision support system for reporting high throughput sequencing of cancers in clinical diagnostic laboratories.

    Science.gov (United States)

    Doig, Kenneth D; Fellowes, Andrew; Bell, Anthony H; Seleznev, Andrei; Ma, David; Ellul, Jason; Li, Jason; Doyle, Maria A; Thompson, Ella R; Kumar, Amit; Lara, Luis; Vedururu, Ravikiran; Reid, Gareth; Conway, Thomas; Papenfuss, Anthony T; Fox, Stephen B

    2017-04-24

    The increasing affordability of DNA sequencing has allowed it to be widely deployed in pathology laboratories. However, this has exposed many issues with the analysis and reporting of variants for clinical diagnostic use. Implementing a high-throughput sequencing (NGS) clinical reporting system requires a diverse combination of capabilities, statistical methods to identify variants, global variant databases, a validated bioinformatics pipeline, an auditable laboratory workflow, reproducible clinical assays and quality control monitoring throughout. These capabilities must be packaged in software that integrates the disparate components into a useable system. To meet these needs, we developed a web-based application, PathOS, which takes variant data from a patient sample through to a clinical report. PathOS has been used operationally in the Peter MacCallum Cancer Centre for two years for the analysis, curation and reporting of genetic tests for cancer patients, as well as the curation of large-scale research studies. PathOS has also been deployed in cloud environments allowing multiple institutions to use separate, secure and customisable instances of the system. Increasingly, the bottleneck of variant curation is limiting the adoption of clinical sequencing for molecular diagnostics. PathOS is focused on providing clinical variant curators and pathology laboratories with a decision support system needed for personalised medicine. While the genesis of PathOS has been within cancer molecular diagnostics, the system is applicable to NGS clinical reporting generally. The widespread availability of genomic sequencers has highlighted the limited availability of software to support clinical decision-making in molecular pathology. PathOS is a system that has been developed and refined in a hospital laboratory context to meet the needs of clinical diagnostics. The software is available as a set of Docker images and source code at https://github.com/PapenfussLab/PathOS .

  3. The pig gut microbial diversity: Understanding the pig gut microbial ecology through the next generation high throughput sequencing.

    Science.gov (United States)

    Kim, Hyeun Bum; Isaacson, Richard E

    2015-06-12

    The importance of the gut microbiota of animals is widely acknowledged because of its pivotal roles in the health and well being of animals. The genetic diversity of the gut microbiota contributes to the overall development and metabolic needs of the animal, and provides the host with many beneficial functions including production of volatile fatty acids, re-cycling of bile salts, production of vitamin K, cellulose digestion, and development of immune system. Thus the intestinal microbiota of animals has been the subject of study for many decades. Although most of the older studies have used culture dependent methods, the recent advent of high throughput sequencing of 16S rRNA genes has facilitated in depth studies exploring microbial populations and their dynamics in the animal gut. These culture independent DNA based studies generate large amounts of data and as a result contribute to a more detailed understanding of the microbiota dynamics in the gut and the ecology of the microbial populations. Of equal importance, is being able to identify and quantify microbes that are difficult to grow or that have not been grown in the laboratory. Interpreting the data obtained from this type of study requires using basic principles of microbial diversity to understand importance of the composition of microbial populations. In this review, we summarize the literature on culture independent studies of the pig gut microbiota with an emphasis on its succession and alterations caused by diverse factors. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    2016-01-01

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960–1219 A.D.) and Qing dynasties (1636–1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities’ diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution. PMID:27658256

  5. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Science.gov (United States)

    Li, Qiang; Zhang, Bingjian; He, Zhang; Yang, Xiaoru

    The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D.) and Qing dynasties (1636-1912 A.D.) and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  6. Distribution and Diversity of Bacteria and Fungi Colonization in Stone Monuments Analyzed by High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Qiang Li

    Full Text Available The historical and cultural heritage of Qingxing palace and Lingyin and Kaihua temple, located in Hangzhou of China, include a large number of exquisite Buddhist statues and ancient stone sculptures which date back to the Northern Song (960-1219 A.D. and Qing dynasties (1636-1912 A.D. and are considered to be some of the best examples of ancient stone sculpting techniques. They were added to the World Heritage List in 2011 because of their unique craftsmanship and importance to the study of ancient Chinese Buddhist culture. However, biodeterioration of the surface of the ancient Buddhist statues and white marble pillars not only severely impairs their aesthetic value but also alters their material structure and thermo-hygric properties. In this study, high-throughput sequencing was utilized to identify the microbial communities colonizing the stone monuments. The diversity and distribution of the microbial communities in six samples collected from three different environmental conditions with signs of deterioration were analyzed by means of bioinformatics software and diversity indices. In addition, the impact of environmental factors, including temperature, light intensity, air humidity, and the concentration of NO2 and SO2, on the microbial communities' diversity and distribution was evaluated. The results indicate that the presence of predominantly phototrophic microorganisms was correlated with light and humidity, while nitrifying bacteria and Thiobacillus were associated with NO2 and SO2 from air pollution.

  7. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Science.gov (United States)

    Rebijith, K B; Asokan, R; Hande, H Ranjitha; Krishna Kumar, N K

    Thrips palmi Karny (Thysanoptera: Thripidae) is the sole vector of Watermelon bud necrosis tospovirus, where the crop loss has been estimated to be around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to the thigmokinetic behaviour and development of high levels of resistance to insecticides. There is an urgent need to find out an effective futuristic management strategy, where the small RNAs especially microRNAs hold great promise as a key player in the growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classifications of the targets for these miRNAs revealed that majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also lead as an invaluable tool for the management of thysanopteran insects in the future.

  8. Apple ring rot-responsive putative microRNAs revealed by high-throughput sequencing in Malus × domestica Borkh.

    Science.gov (United States)

    Yu, Xin-Yi; Du, Bei-Bei; Gao, Zhi-Hong; Zhang, Shi-Jie; Tu, Xu-Tong; Chen, Xiao-Yun; Zhang, Zhen; Qu, Shen-Chun

    2014-08-01

    MicroRNAs (miRNAs) are small non-coding RNAs, which silence target mRNA via cleavage or translational inhibition to function in regulating gene expression. MiRNAs act as important regulators of plant development and stress response. For understanding the role of miRNAs responsive to apple ring rot stress, we identified disease-responsive miRNAs using high-throughput sequencing in Malus × domestica Borkh.. Four small RNA libraries were constructed from two control strains in M. domestica, crabapple (CKHu) and Fuji Naga-fu No. 6 (CKFu), and two disease stress strains, crabapple (DSHu) and Fuji Naga-fu No. 6 (DSFu). A total of 59 miRNA families were identified and five miRNAs might be responsive to apple ring rot infection and validated via qRT-PCR. Furthermore, we predicted 76 target genes which were regulated by conserved miRNAs potentially. Our study demonstrated that miRNAs was responsive to apple ring rot infection and may have important implications on apple disease resistance.

  9. Taxonomic analysis of the microbial community in stored sugar beets using high-throughput sequencing of different marker genes.

    Science.gov (United States)

    Liebe, Sebastian; Wibberg, Daniel; Winkler, Anika; Pühler, Alfred; Schlüter, Andreas; Varrelmann, Mark

    2016-02-01

    Post-harvest colonization of sugar beets accompanied by rot development is a serious problem due to sugar losses and negative impact on processing quality. Studies on the microbial community associated with rot development and factors shaping their structure are missing. Therefore, high-throughput sequencing was applied to describe the influence of environment, plant genotype and storage temperature (8°C and 20°C) on three different communities in stored sugar beets, namely fungi (internal transcribed spacers 1 and 2), Fusarium spp. (elongation factor-1α gene fragment) and oomycetes (internal transcribed spacers 1). The composition of the fungal community changed during storage mostly influenced by the storage temperature followed by a weak environmental effect. Botrytis cinerea was the prevalent species at 8°C whereas members of the fungal genera Fusarium and Penicillium became dominant at 20°C. This shift was independent of the plant genotype. Species richness within the genus Fusarium also increased during storage at both temperatures whereas the oomycetes community did not change. Moreover, oomycetes species were absent after storage at 20°C. The results of the present study clearly show that rot development during sugar beet storage is associated with pathogens well known as causal agents of post-harvest diseases in many other crops. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP experiments.

    Directory of Open Access Journals (Sweden)

    Nicholas D Youngblut

    Full Text Available Combining high throughput sequencing with stable isotope probing (HTS-SIP is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP, multi-window high-resolution stable isotope probing (MW-HR-SIP, quantitative stable isotope probing (qSIP, and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  11. The First Report of miRNAs from a Thysanopteran Insect, Thrips palmi Karny Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    K B Rebijith

    Full Text Available Thrips palmi Karny (Thysanoptera: Thripidae is the sole vector of Watermelon bud necrosis tospovirus, where the crop loss has been estimated to be around USD 50 million annually. Chemical insecticides are of limited use in the management of T. palmi due to the thigmokinetic behaviour and development of high levels of resistance to insecticides. There is an urgent need to find out an effective futuristic management strategy, where the small RNAs especially microRNAs hold great promise as a key player in the growth and development. miRNAs are a class of short non-coding RNAs involved in regulation of gene expression either by mRNA cleavage or by translational repression. We identified and characterized a total of 77 miRNAs from T. palmi using high-throughput deep sequencing. Functional classifications of the targets for these miRNAs revealed that majority of them are involved in the regulation of transcription and translation, nucleotide binding and signal transduction. We have also validated few of these miRNAs employing stem-loop RT-PCR, qRT-PCR and Northern blot. The present study not only provides an in-depth understanding of the biological and physiological roles of miRNAs in governing gene expression but may also lead as an invaluable tool for the management of thysanopteran insects in the future.

  12. High throughput sequencing identifies an imprinted gene, Grb10, associated with the pluripotency state in nuclear transfer embryonic stem cells.

    Science.gov (United States)

    Li, Hui; Gao, Shuai; Huang, Hua; Liu, Wenqiang; Huang, Huanwei; Liu, Xiaoyu; Gao, Yawei; Le, Rongrong; Kou, Xiaochen; Zhao, Yanhong; Kou, Zhaohui; Li, Jia; Wang, Hong; Zhang, Yu; Wang, Hailin; Cai, Tao; Sun, Qingyuan; Gao, Shaorong; Han, Zhiming

    2017-07-18

    Somatic cell nuclear transfer and transcription factor mediated reprogramming are two widely used techniques for somatic cell reprogramming. Both fully reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells hold potential for regenerative medicine, and evaluation of the stem cell pluripotency state is crucial for these applications. Previous reports have shown that the Dlk1-Dio3 region is associated with pluripotency in induced pluripotent stem cells and the incomplete somatic cell reprogramming causes abnormally elevated levels of genomic 5-methylcytosine in induced pluripotent stem cells compared to nuclear transfer embryonic stem cells and embryonic stem cells. In this study, we compared pluripotency associated genes Rian and Gtl2 in the Dlk1-Dio3 region in exactly syngeneic nuclear transfer embryonic stem cells and induced pluripotent stem cells with same genomic insertion. We also assessed 5-methylcytosine and 5-hydroxymethylcytosine levels and performed high-throughput sequencing in these cells. Our results showed that Rian and Gtl2 in the Dlk1-Dio3 region related to pluripotency in induced pluripotent stem cells did not correlate with the genes in nuclear transfer embryonic stem cells, and no significant difference in 5-methylcytosine and 5-hydroxymethylcytosine levels were observed between fully and partially reprogrammed nuclear transfer embryonic stem cells and induced pluripotent stem cells. Through syngeneic comparison, our study identifies for the first time that Grb10 is associated with the pluripotency state in nuclear transfer embryonic stem cells.

  13. Identification and characterization of novel and conserved microRNAs in radish (Raphanus sativus L.) using high-throughput sequencing.

    Science.gov (United States)

    Xu, Liang; Wang, Yan; Xu, Yuanyuan; Wang, Liangju; Zhai, Lulu; Zhu, Xianwen; Gong, Yiqin; Ye, Shan; Liu, Liwang

    2013-03-01

    MicroRNAs (miRNAs) are endogenous, non-coding, small RNAs that play significant regulatory roles in plant growth, development, and biotic and abiotic stress responses. To date, a great number of conserved and species-specific miRNAs have been identified in many important plant species such as Arabidopsis, rice and poplar. However, little is known about identification of miRNAs and their target genes in radish (Raphanus sativus L.). In the present study, a small RNA library from radish root was constructed and sequenced using the high-throughput Solexa sequencing. Through sequence alignment and secondary structure prediction, a total of 545 conserved miRNA families as well as 15 novel (with their miRNA* strand) and 64 potentially novel miRNAs were identified. Quantitative real-time PCR (qRT-PCR) analysis confirmed that both conserved and novel miRNAs were expressed in radish, and some of them were preferentially expressed in certain tissues. A total of 196 potential target genes were predicted for 42 novel radish miRNAs. Gene ontology (GO) analysis showed that most of the targets were involved in plant growth, development, metabolism and stress responses. This study represents a first large-scale identification and characterization of radish miRNAs and their potential target genes. These results could lead to the further identification of radish miRNAs and enhance our understanding of radish miRNA regulatory mechanisms in diverse biological and metabolic processes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  14. High-throughput sequencing of microRNAs in peripheral blood mononuclear cells: identification of potential weight loss biomarkers.

    Directory of Open Access Journals (Sweden)

    Fermín I Milagro

    Full Text Available INTRODUCTION: MicroRNAs (miRNAs are being increasingly studied in relation to energy metabolism and body composition homeostasis. Indeed, the quantitative analysis of miRNAs expression in different adiposity conditions may contribute to understand the intimate mechanisms participating in body weight control and to find new biomarkers with diagnostic or prognostic value in obesity management. OBJECTIVE: The aim of this study was the search for miRNAs in blood cells whose expression could be used as prognostic biomarkers of weight loss. METHODS: Ten Caucasian obese women were selected among the participants in a weight-loss trial that consisted in following an energy-restricted treatment. Weight loss was considered unsuccessful when 5% (responders. At baseline, total miRNA isolated from peripheral blood mononuclear cells (PBMC was sequenced with SOLiD v4. The miRNA sequencing data were validated by RT-PCR. RESULTS: Differential baseline expression of several miRNAs was found between responders and non-responders. Two miRNAs were up-regulated in the non-responder group (mir-935 and mir-4772 and three others were down-regulated (mir-223, mir-224 and mir-376b. Both mir-935 and mir-4772 showed relevant associations with the magnitude of weight loss, although the expression of other transcripts (mir-874, mir-199b, mir-766, mir-589 and mir-148b also correlated with weight loss. CONCLUSIONS: This research addresses the use of high-throughput sequencing technologies in the search for miRNA expression biomarkers in obesity, by determining the miRNA transcriptome of PBMC. Basal expression of different miRNAs, particularly mir-935 and mir-4772, could be prognostic biomarkers and may forecast the response to a hypocaloric diet.

  15. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed space amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the

  16. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing.

    Science.gov (United States)

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed space amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the

  17. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing

    Directory of Open Access Journals (Sweden)

    Zhewei Song

    2017-07-01

    Full Text Available Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed space amplicon sequencing and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces and lactic acid bacteria (genus Lactobacillus classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol to acid (lactic acid and acetic acid in Chinese Maotai-flavor liquor production. Our findings provide

  18. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    Science.gov (United States)

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput-genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.

  19. Identification and characterization of microRNAs in Humulus lupulus using high-throughput sequencing and their response to Citrus bark cracking viroid (CBCVd) infection

    Czech Academy of Sciences Publication Activity Database

    Mishra, Ajay Kumar; Duraisamy, Ganesh Selvaraj; Matoušek, Jaroslav; Radišek, S.; Javornik, B.; Jakše, J.

    2016-01-01

    Roč. 17, č. 919 (2016) ISSN 1471-2164 R&D Projects: GA MŠk(CZ) LH14255 Institutional support: RVO:60077344 Keywords : Humulus lupulus * High-throughput sequencing * Citrus bark cracking viroid Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.729, year: 2016

  20. Short-term assessment of BCR repertoires of SLE patients after high dose glucocorticoid therapy with high-throughput sequencing.

    Science.gov (United States)

    Shi, Bin; Yu, Jiang; Ma, Long; Ma, Qingqing; Liu, Chunmei; Sun, Suhong; Ma, Rui; Yao, Xinsheng

    2016-01-01

    We analyze and assess BCR repertoires of SLE patients before and after high dose glucocorticoid therapy to address two fundamental questions: (1) After the treatment, how the BCR repertoire of SLE patient change on the clone level? (2) How to screen putative autoantibody clone set from BCR repertoire of SLE patients? The PBMCs of two SLE patients (P1 and P2) at different time points were collected, and DNA of these samples were extracted. High-throughput sequencing technology was applied in detection of BCR repertoire. Finally, we used bioinformatic methodology to analyse sequence data. We found that these two patients lost some IGHV3 family genes usage after treatment compared with before treatment. For pairing of IGHV-IGHJ gene, no significant change was shown for each patient. In addition, analyses of the composition of H-CDR3 showed overall AA compositions of H-CDR3 at three time points in each SLE patients were very similar, and the results of H-CDR3 AA usage that had the same length (14 AA) and the same position were similar. Antinuclear antibody tests of SLE patients showed that level of some antinuclear antibodies reduced after treatment; however, there was no sign that the percentage of autoantibody clones in BCR repertoires would reduce. High dose glucocorticoid treatment in short term will have little impact on composition of BCR repertoire of SLE patient. Treatment can reduce the amount of autoantibody in the protein level, but may not reduce the percentage of autoantibody clones in BCR repertoire in the clonal level.

  1. Comparison of analysis tools for miRNA high throughput sequencing using nerve crush as a model

    Directory of Open Access Journals (Sweden)

    Raghu Prasad Rao Metpally

    2013-03-01

    Full Text Available Recent advances in sample preparation and analysis for next generation sequencing have made it possible to profile and discover new miRNAs in a high throughput manner. In the case of neurological disease and injury, these types of experiments have been more limited. Possibly because tissues such as the brain and spinal cord are inaccessible for direct sampling in living patients, and indirect sampling of blood and cerebrospinal fluid are affected by low amounts of RNA. We used a mouse model to examine changes in miRNA expression in response to acute nerve crush. We assayed miRNA from both muscle tissue and blood plasma. We examined how the depth of coverage (the number of mapped reads changed the number of detectable miRNAs in each sample type. We also found that samples with very low starting amounts of RNA (mouse plasma made high depth of mature miRNA coverage more difficult to obtain. Each tissue must be assessed independently for the depth of coverage required to adequately power detection of differential expression, weighed against the cost of sequencing that sample to the adequate depth. We explored the changes in total mapped reads and differential expression results generated by three different software packages: miRDeep2, miRNAKey, and miRExpress and two different analysis packages, DESeq and EdgeR. We also examine the accuracy of using miRDeep2 to predict novel miRNAs and subsequently detect them in the samples using qRT-PCR.

  2. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    Directory of Open Access Journals (Sweden)

    Seokhwi Kim

    Full Text Available In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%, APC (10.1%, PIK3CA (5.6%, KRAS (4.5%, SMO (3.4%, STK11 (3.4%, CDKN2A (3.4% and SMAD4 (3.4%. Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%, 4 (4.5%, 2 (2.2%, 1 (1.1% and 1 (1.1% cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  3. High-throughput sequence analysis of small RNAs in grapevine (Vitis vinifera L.) affected by grapevine leafroll disease.

    Science.gov (United States)

    Alabi, Olufemi J; Zheng, Yun; Jagadeeswaran, Guru; Sunkar, Ramanjulu; Naidu, Rayapati A

    2012-12-01

    Grapevine leafroll disease (GLRD) is one of the most economically important virus diseases of grapevine (Vitis spp.) worldwide. In this study, we used high-throughput sequencing of cDNA libraries made from small RNAs (sRNAs) to compare profiles of sRNA populations recovered from own-rooted Merlot grapevines with and without GLRD symptoms. The data revealed the presence of sRNAs specific to Grapevine leafroll-associated virus 3, Hop stunt viroid (HpSVd), Grapevine yellow speckle viroid 1 (GYSVd-1) and Grapevine yellow speckle viroid 2 (GYSVd-2) in symptomatic grapevines and sRNAs specific only to HpSVd, GYSVd-1 and GYSVd-2 in nonsymptomatic grapevines. In addition to 135 previously identified conserved microRNAs in grapevine (Vvi-miRs), we identified 10 novel and several candidate Vvi-miRs in both symptomatic and nonsymptomatic grapevine leaves based on the cloning of miRNA star sequences. Quantitative real-time reverse transcriptase-polymerase chain reaction (RT-PCR) of selected conserved Vvi-miRs indicated that individual members of an miRNA family are differentially expressed in symptomatic and nonsymptomatic leaves. The high-resolution mapping of sRNAs specific to an ampelovirus and three viroids in mixed infections, the identification of novel Vvi-miRs and the modulation of certain conserved Vvi-miRs offers resources for the further elucidation of compatible host-pathogen interactions and for the provision of ecologically relevant information to better understand host-pathogen-environment interactions in a perennial fruit crop. © 2012 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2012 BSPP AND BLACKWELL PUBLISHING LTD.

  4. High-throughput sequencing of 16S rDNA amplicons characterizes bacterial composition in bronchoalveolar lavage fluid in patients with ventilator-associated pneumonia

    Directory of Open Access Journals (Sweden)

    Yang XJ

    2015-08-01

    Full Text Available Xiao-Jun Yang,1,* Yan-Bo Wang,2,3,* Zhi-Wei Zhou,4,* Guo-Wei Wang,2 Xiao-Hong Wang,1 Qing-Fu Liu,1 Shu-Feng Zhou,4 Zhen-Hai Wang2,3 1Department of Intensive Care Unit, 2Neurology Center, General Hospital of Ningxia Medical University, Yinchuan, Ningxia, People’s Republic of China; 3Key Laboratory of Brain Diseases of Ningxia, Yinchuan, Ningxia, People’s Republic of China; 4Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, USA *These authors contributed equally to this work Abstract: Ventilator-associated pneumonia (VAP is a life-threatening disease that is associated with high rates of morbidity and likely mortality, placing a heavy burden on an individual and society. Currently available diagnostic and therapeutic approaches for VAP treatment are limited, and the prognosis of VAP is poor. The present study aimed to reveal and discriminate the identification of the full spectrum of the pathogens in patients with VAP using high-throughput sequencing approach and analyze the species richness and complexity via alpha and beta diversity analysis. The bronchoalveolar lavage fluid samples were collected from 27 patients with VAP in intensive care unit. The polymerase chain reaction products of the hypervariable regions of 16S rDNA gene in these 27 samples of VAP were sequenced using the 454 GS FLX system. A total of 103,856 pyrosequencing reads and 638 operational taxonomic units were obtained from these 27 samples. There were four dominant phyla, including Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. There were 90 different genera, of which 12 genera occurred in over ten different samples. The top five dominant genera were Streptococcus, Acinetobacter, Limnohabitans, Neisseria, and Corynebacterium, and the most widely distributed genera were Streptococcus, Limnohabitans, and Acinetobacter in these 27 samples. Of note, the mixed profile of causative pathogens was observed. Taken

  5. HT-COMET: a novel automated approach for high throughput assessment of human sperm chromatin quality.

    Science.gov (United States)

    Albert, Océane; Reintsch, Wolfgang E; Chan, Peter; Robaire, Bernard

    2016-05-01

    Can we make the comet assay (single-cell gel electrophoresis) for human sperm a more accurate and informative high throughput assay? We developed a standardized automated high throughput comet (HT-COMET) assay for human sperm that improves its accuracy and efficiency, and could be of prognostic value to patients in the fertility clinic. The comet assay involves the collection of data on sperm DNA damage at the level of the single cell, allowing the use of samples from severe oligozoospermic patients. However, this makes comet scoring a low throughput procedure that renders large cohort analyses tedious. Furthermore, the comet assay comes with an inherent vulnerability to variability. Our objective is to develop an automated high throughput comet assay for human sperm that will increase both its accuracy and efficiency. The study comprised two distinct components: a HT-COMET technical optimization section based on control versus DNAse treatment analyses ( ITALIC! n = 3-5), and a cross-sectional study on 123 men presenting to a reproductive center with sperm concentrations categorized as severe oligozoospermia, oligozoospermia or normozoospermia. Sperm chromatin quality was measured using the comet assay: on classic 2-well slides for software comparison; on 96-well slides for HT-COMET optimization; after exposure to various concentrations of a damage-inducing agent, DNAse, using HT-COMET; on 123 subjects with different sperm concentrations using HT-COMET. Data from the 123 subjects were correlated to classic semen quality parameters and plotted as single-cell data in individual DNA damage profiles. We have developed a standard automated HT-COMET procedure for human sperm. It includes automated scoring of comets by a fully integrated high content screening setup that compares well with the most commonly used semi-manual analysis software. Using this method, a cross-sectional study on 123 men showed no significant correlation between sperm concentration and sperm DNA

  6. HT-COMET: a novel automated approach for high throughput assessment of human sperm chromatin quality

    Science.gov (United States)

    Albert, Océane; Reintsch, Wolfgang E.; Chan, Peter; Robaire, Bernard

    2016-01-01

    STUDY QUESTION Can we make the comet assay (single-cell gel electrophoresis) for human sperm a more accurate and informative high throughput assay? SUMMARY ANSWER We developed a standardized automated high throughput comet (HT-COMET) assay for human sperm that improves its accuracy and efficiency, and could be of prognostic value to patients in the fertility clinic. WHAT IS KNOWN ALREADY The comet assay involves the collection of data on sperm DNA damage at the level of the single cell, allowing the use of samples from severe oligozoospermic patients. However, this makes comet scoring a low throughput procedure that renders large cohort analyses tedious. Furthermore, the comet assay comes with an inherent vulnerability to variability. Our objective is to develop an automated high throughput comet assay for human sperm that will increase both its accuracy and efficiency. STUDY DESIGN, SIZE, DURATION The study comprised two distinct components: a HT-COMET technical optimization section based on control versus DNAse treatment analyses (n = 3–5), and a cross-sectional study on 123 men presenting to a reproductive center with sperm concentrations categorized as severe oligozoospermia, oligozoospermia or normozoospermia. PARTICIPANTS/MATERIALS, SETTING, METHODS Sperm chromatin quality was measured using the comet assay: on classic 2-well slides for software comparison; on 96-well slides for HT-COMET optimization; after exposure to various concentrations of a damage-inducing agent, DNAse, using HT-COMET; on 123 subjects with different sperm concentrations using HT-COMET. Data from the 123 subjects were correlated to classic semen quality parameters and plotted as single-cell data in individual DNA damage profiles. MAIN RESULTS AND THE ROLE OF CHANCE We have developed a standard automated HT-COMET procedure for human sperm. It includes automated scoring of comets by a fully integrated high content screening setup that compares well with the most commonly used semi

  7. High-throughput screening approaches and combinatorial development of biomaterials using microfluidics.

    Science.gov (United States)

    Barata, David; van Blitterswijk, Clemens; Habibovic, Pamela

    2016-04-01

    From the first microfluidic devices used for analysis of single metabolic by-products to highly complex multicompartmental co-culture organ-on-chip platforms, efforts of many multidisciplinary teams around the world have been invested in overcoming the limitations of conventional research methods in the biomedical field. Close spatial and temporal control over fluids and physical parameters, integration of sensors for direct read-out as well as the possibility to increase throughput of screening through parallelization, multiplexing and automation are some of the advantages of microfluidic over conventional, 2D tissue culture in vitro systems. Moreover, small volumes and relatively small cell numbers used in experimental set-ups involving microfluidics, can potentially decrease research cost. On the other hand, these small volumes and numbers of cells also mean that many of the conventional molecular biology or biochemistry assays cannot be directly applied to experiments that are performed in microfluidic platforms. Development of different types of assays and evidence that such assays are indeed a suitable alternative to conventional ones is a step that needs to be taken in order to have microfluidics-based platforms fully adopted in biomedical research. In this review, rather than providing a comprehensive overview of the literature on microfluidics, we aim to discuss developments in the field of microfluidics that can aid advancement of biomedical research, with emphasis on the field of biomaterials. Three important topics will be discussed, being: screening, in particular high-throughput and combinatorial screening; mimicking of natural microenvironment ranging from 3D hydrogel-based cellular niches to organ-on-chip devices; and production of biomaterials with closely controlled properties. While important technical aspects of various platforms will be discussed, the focus is mainly on their applications, including the state-of-the-art, future perspectives and

  8. Non-invasive high throughput approach for protein hydrophobicity determination based on surface tension.

    Science.gov (United States)

    Amrhein, Sven; Bauer, Katharina Christin; Galm, Lara; Hubbuch, Jürgen

    2015-12-01

    The surface hydrophobicity of a protein is an important factor for its interactions in solution and thus the outcome of its production process. Yet most of the methods are not able to evaluate the influence of these hydrophobic interactions under natural conditions. In the present work we have established a high resolution stalagmometric method for surface tension determination on a liquid handling station, which can cope with accuracy as well as high throughput requirements. Surface tensions could be derived with a low sample consumption (800 μL) and a high reproducibility (surface tension was correlated to the hydrophobicity of lysozyme, human lysozyme, BSA, and α-lactalbumin. Differences in proteins' hydrophobic character depending on pH and species could be resolved. Within this work we have developed a pH dependent hydrophobicity ranking, which was found to be in good agreement with literature. For the studied pH range of 3-9 lysozyme from chicken egg white was identified to be the most hydrophilic. α-lactalbumin at pH 3 exhibited the most pronounced hydrophobic character. The stalagmometric method occurred to outclass the widely used spectrophotometric method with bromophenol blue sodium salt as it gave reasonable results without restrictions on pH and protein species. © 2015 Wiley Periodicals, Inc.

  9. Combinatorial approach toward high-throughput analysis of direct methanol fuel cells.

    Science.gov (United States)

    Jiang, Rongzhong; Rong, Charles; Chu, Deryn

    2005-01-01

    A 40-member array of direct methanol fuel cells (with stationary fuel and convective air supplies) was generated by electrically connecting the fuel cells in series. High-throughput analysis of these fuel cells was realized by fast screening of voltages between the two terminals of a fuel cell at constant current discharge. A large number of voltage-current curves (200) were obtained by screening the voltages through multiple small-current steps. Gaussian distribution was used to statistically analyze the large number of experimental data. The standard deviation (sigma) of voltages of these fuel cells increased linearly with discharge current. The voltage-current curves at various fuel concentrations were simulated with an empirical equation of voltage versus current and a linear equation of sigma versus current. The simulated voltage-current curves fitted the experimental data well. With increasing methanol concentration from 0.5 to 4.0 M, the Tafel slope of the voltage-current curves (at sigma=0.0), changed from 28 to 91 mV.dec-1, the cell resistance from 2.91 to 0.18 Omega, and the power output from 3 to 18 mW.cm-2.

  10. Prioritizing Environmental Chemicals for Obesity and Diabetes Outcomes Research: A Screening Approach Using ToxCast™ High-Throughput Data.

    Science.gov (United States)

    Auerbach, Scott; Filer, Dayne; Reif, David; Walker, Vickie; Holloway, Alison C; Schlezinger, Jennifer; Srinivasan, Supriya; Svoboda, Daniel; Judson, Richard; Bucher, John R; Thayer, Kristina A

    2016-08-01

    Diabetes and obesity are major threats to public health in the United States and abroad. Understanding the role that chemicals in our environment play in the development of these conditions is an emerging issue in environmental health, although identifying and prioritizing chemicals for testing beyond those already implicated in the literature is challenging. This review is intended to help researchers generate hypotheses about chemicals that may contribute to diabetes and to obesity-related health outcomes by summarizing relevant findings from the U.S. Environmental Protection Agency (EPA) ToxCast™ high-throughput screening (HTS) program. Our aim was to develop new hypotheses around environmental chemicals of potential interest for diabetes- or obesity-related outcomes using high-throughput screening data. We identified ToxCast™ assay targets relevant to several biological processes related to diabetes and obesity (insulin sensitivity in peripheral tissue, pancreatic islet and β cell function, adipocyte differentiation, and feeding behavior) and presented chemical screening data against those assay targets to identify chemicals of potential interest. The results of this screening-level analysis suggest that the spectrum of environmental chemicals to consider in research related to diabetes and obesity is much broader than indicated by research papers and reviews published in the peer-reviewed literature. Testing hypotheses based on ToxCast™ data will also help assess the predictive utility of this HTS platform. More research is required to put these screening-level analyses into context, but the information presented in this review should facilitate the development of new hypotheses. Auerbach S, Filer D, Reif D, Walker V, Holloway AC, Schlezinger J, Srinivasan S, Svoboda D, Judson R, Bucher JR, Thayer KA. 2016. Prioritizing environmental chemicals for obesity and diabetes outcomes research: a screening approach using ToxCast™ high-throughput data. Environ

  11. High-throughput sequencing technology to reveal the composition and function of cecal microbiota in Dagu chicken.

    Science.gov (United States)

    Xu, Yunhe; Yang, Huixin; Zhang, Lili; Su, Yuhong; Shi, Donghui; Xiao, Haidi; Tian, Yumin

    2016-11-04

    The chicken gut microbiota is an important and complicated ecosystem for the host. They play an important role in converting food into nutrient and energy. The coding capacity of microbiome vastly surpasses that of the host's genome, encoding biochemical pathways that the host has not developed. An optimal gut microbiota can increase agricultural productivity. This study aims to explore the composition and function of cecal microbiota in Dagu chicken under two feeding modes, free-range (outdoor, OD) and cage (indoor, ID) raising. Cecal samples were collected from 24 chickens across 4 groups (12-w OD, 12-w ID, 18-w OD, and 18-w ID). We performed high-throughput sequencing of the 16S rRNA genes V4 hypervariable regions to characterize the cecal microbiota of Dagu chicken and compare the difference of cecal microbiota between free-range and cage raising chickens. It was found that 34 special operational taxonomic units (OTUs) in OD groups and 4 special OTUs in ID groups. 24 phyla were shared by the 24 samples. Bacteroidetes was the most abundant phylum with the largest proportion, followed by Firmicutes and Proteobacteria. The OD groups showed a higher proportion of Bacteroidetes (>50 %) in cecum, but a lower Firmicutes/Bacteroidetes ratio in both 12-w old (0.42, 0.62) and 18-w old groups (0.37, 0.49) compared with the ID groups. Cecal microbiota in the OD groups have higher abundance of functions involved in amino acids and glycan metabolic pathway. The composition and function of cecal microbiota in Dagu chicken under two feeding modes, free-range and cage raising are different. The cage raising mode showed a lower proportion of Bacteroidetes in cecum, but a higher Firmicutes/Bacteroidetes ratio compared with free-range mode. Cecal microbiota in free-range mode have higher abundance of functions involved in amino acids and glycan metabolic pathway.

  12. Predicting the origin of soil evidence: High throughput eukaryote sequencing and MIR spectroscopy applied to a crime scene scenario.

    Science.gov (United States)

    Young, Jennifer M; Weyrich, Laura S; Breen, James; Macdonald, Lynne M; Cooper, Alan

    2015-06-01

    Soil can serve as powerful trace evidence in forensic casework, because it is highly individualistic and can be characterised using a number of techniques. Complex soil matrixes can support a vast number of organisms that can provide a site-specific signal for use in forensic soil discrimination. Previous DNA fingerprinting techniques rely on variations in fragment length to distinguish between soil profiles and focus solely on microbial communities. However, the recent development of high throughput sequencing (HTS) has the potential to provide a more detailed picture of the soil community by accessing non-culturable microorganisms and by identifying specific bacteria, fungi, and plants within soil. To demonstrate the application of HTS to forensic soil analysis, 18S ribosomal RNA profiles of six forensic mock crime scene samples were compared to those collected from seven reference locations across South Australia. Our results demonstrate the utility of non-bacterial DNA to discriminate between different sites, and were able to link a soil to a particular location. In addition, HTS complemented traditional Mid Infrared (MIR) spectroscopy soil profiling, but was able to provide statistically stronger discriminatory power at a finer scale. Through the design of an experimental case scenario, we highlight the considerations and potential limitations of this method in forensic casework. We show that HTS analysis of soil eukaryotes was robust to environmental variation, e.g. rainfall and temperature, transfer effects, storage effects and spatial variation. In addition, this study utilises novel analytical methodologies to interpret results for investigative purposes and provides prediction statistics to support soil DNA analysis for evidential stages of a case. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  13. QTL Mapping for Rice RVA Properties Using High-Throughput Re-sequenced Chromosome Segment Substitution Lines

    Directory of Open Access Journals (Sweden)

    Chang-quan ZHANG

    2013-11-01

    Full Text Available The rapid visco analyser (RVA profile is an important factor for evaluation of the cooking and eating quality of rice. To improve rice quality, the identification of new quantitative trait loci (QTLs for RVA profiling is of great significance. We used a japonica rice cultivar Nipponbare as the recipient and indica rice 9311 as the donor to develop a population containing 38 chromosome segment substitution lines (CSSLs genotyped by a high-throughput re-sequencing strategy. In this study, the population and the parent lines, which contained similar apparent amylose contents, were used to map the QTLs of RVA properties including peak paste viscosity (PKV, hot paste viscosity (HPV, cool paste viscosity (CPV, breakdown viscosity (BKV, setback viscosity (SBV, consistency viscosity (CSV, peak time (PeT and pasting temperature (PaT. QTL analysis was carried out using one-way analysis of variance and Dunnett's test, and stable QTLs were identified over two years and under two environments. We identified 10 stable QTLs: qPKV2-1, qSBV2-1; qPKV5-1, qHPV5-1, qCPV5-1; qPKV7-1, qHPV7-1, qCPV7-1, qSBV7-1; and qPKV8-1 on chromosomes 2, 5, 7 and 8, respectively, with contributions ranging from −95.6% to 47.1%. Besides, there was pleiotropy in the QTLs on chromosomes 2, 5 and 7.

  14. Technological Innovations for High-Throughput Approaches to In Vitro Allergy Diagnosis.

    Science.gov (United States)

    Chapman, Martin D; Wuenschmann, Sabina; King, Eva; Pomés, Anna

    2015-07-01

    Allergy diagnostics is being transformed by the advent of in vitro IgE testing using purified allergen molecules, combined with multiplex technology and biosensors, to deliver discriminating, sensitive, and high-throughput molecular diagnostics at the point of care. Essential elements of IgE molecular diagnostics are purified natural or recombinant allergens with defined purity and IgE reactivity, planar or bead-based multiplex systems to enable IgE to multiple allergens to be measured simultaneously, and, most recently, nanotechnology-based biosensors that facilitate rapid reaction rates and delivery of test results via mobile devices. Molecular diagnostics relies on measurement of IgE to purified allergens, the "active ingredients" of allergenic extracts. Typically, this involves measuring IgE to multiple allergens which is facilitated by multiplex technology and biosensors. The technology differentiates between clinically significant cross-reactive allergens (which could not be deduced by conventional IgE assays using allergenic extracts) and provides better diagnostic outcomes. Purified allergens are manufactured under good laboratory practice and validated using protein chemistry, mass spectrometry, and IgE antibody binding. Recently, multiple allergens (from dog) were expressed as a single molecule with high diagnostic efficacy. Challenges faced by molecular allergy diagnostic companies include generation of large panels of purified allergens with known diagnostic efficacy, access to flexible and robust array or sensor technology, and, importantly, access to well-defined serum panels form allergic patients for product development and validation. Innovations in IgE molecular diagnostics are rapidly being brought to market and will strengthen allergy testing at the point of care.

  15. High-throughput sequencing of 16S rDNA amplicons characterizes bacterial composition in cerebrospinal fluid samples from patients with purulent meningitis

    Directory of Open Access Journals (Sweden)

    Liu A

    2015-08-01

    Full Text Available Aicui Liu,1,2,* Chao Wang,1,2,* Zhijuan Liang,3 Zhi-Wei Zhou,4 Lin Wang,1,2 Qiaoli Ma,1,2 Guowei Wang,1,2 Shu-Feng Zhou,4 Zhenhai Wang1,2 1Neurology Center, General Hospital of Ningxia Medical University, Yinchuan, Ningxia; 2Key Laboratory of Brain Diseases of Ningxia, Yinchuan, Ningxia; 3Department of Neurology, The First People’s Hospital of Lanzhou, Lanzhou, Gansu, People’s Republic of China; 4Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, USA *These authors contributed equally to this work Abstract: Purulent meningitis (PM is a severe infectious disease that is associated with high rates of morbidity and mortality. It has been recognized that bacterial infection is a major contributing factor to the pathogenesis of PM. However, there is a lack of information on the bacterial composition in PM, due to the low positive rate of cerebrospinal fluid bacterial culture. Herein, we aimed to discriminate and identify the main pathogens and bacterial composition in cerebrospinal fluid sample from PM patients using high-throughput sequencing approach. The cerebrospinal fluid samples were collected from 26 PM patients, and were determined as culture-negative samples. The polymerase chain reaction products of the hypervariable regions of 16S rDNA gene in these 26 samples of PM were sequenced using the 454 GS FLX system. The results showed that there were 71,440 pyrosequencing reads, of which, the predominant phyla were Proteobacteria and Firmicutes; and the predominant genera were Streptococcus, Acinetobacter, Pseudomonas, and Neisseria. The bacterial species in the cerebrospinal fluid were complex, with 61.5% of the samples presenting with mixed pathogens. A significant number of bacteria belonging to a known pathogenic potential was observed. The number of operational taxonomic units for individual samples ranged from six to 75 and there was a comparable difference in the species diversity that

  16. High-throughput next-generation sequencing to genotype six classical HLA loci from 96 donors in a single MiSeq run.

    Science.gov (United States)

    Ehrenberg, P K; Geretz, A; Sindhu, R K; Vayntrub, T; Fernández Viña, M A; Apps, R; Michael, N L; Thomas, R

    2017-11-01

    Next generation sequencing (NGS) methods have been established as an efficient approach for HLA typing because unlike traditional Sanger sequencing, they provide unambiguous results at a reasonable cost. We previously developed a multi-locus index method to genotype four HLA loci (A, B, C, and DRB1) on the Illumina MiSeq platform. We have now expanded this method to include two additional loci, HLA-DPB1 and DQB1. Contiguous full-length amplicons from 5'UTR through 3'UTR regions were generated using one long-range PCR reaction per locus for each of the six loci from 96 individuals of different ethnicities. The six amplicons from each donor were pooled, enzymatically fragmented and given a donor-specific index. This approach enabled sequencing of 576 loci from 96 individuals in a single MiSeq run. Donor-specific sequence reads were demultiplexed, and allele calls were generated from FASTQ files using commercially available software. Comparison to HLA genotypes generated from Sanger sequence-based typing (SBT) identified no discordances among any of the alleles analyzed in this study. Importantly, this method was able to resolve 22 DPB1 and 20 DQB1 alleles that were ambiguous with the SBT method. Furthermore, a novel allele in each of these two loci was identified, with the DQB1*05:01:24 allele having a frequency of greater than five percent. This method was subsequently validated against a blinded panel of 22 samples from the 17th International HLA and Immunogenetics Workshop. The flexibility of the method is further highlighted by successful genotyping of eight loci comprising all classical HLA loci for a subset of the samples. We now present a high-throughput, high-resolution, scalable NGS HLA typing method to accurately and efficiently genotype all classical HLA class I and II loci. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  17. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    to the correct source once sequencing anomalies are accounted for (miss-assignment ratebias in the distribution of the differently tagged...... primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution...

  18. Metagenomic analysis of taxa associated with Lutzomyia longipalpis, vector of visceral leishmaniasis, using an unbiased high-throughput approach.

    Directory of Open Access Journals (Sweden)

    Christina B McCarthy

    2011-09-01

    Full Text Available BACKGROUND: Leishmaniasis is one of the most diverse and complex of all vector-borne diseases worldwide. It is caused by parasites of the genus Leishmania, obligate intramacrophage protists characterised by diversity and complexity. Its most severe form is visceral leishmaniasis (VL, a systemic disease that is fatal if left untreated. In Latin America VL is caused by Leishmania infantum chagasi and transmitted by Lutzomyia longipalpis. This phlebotomine sandfly is only found in the New World, from Mexico to Argentina. In South America, migration and urbanisation have largely contributed to the increase of VL as a public health problem. Moreover, the first VL outbreak was recently reported in Argentina, which has already caused 7 deaths and 83 reported cases. METHODOLOGY/PRINCIPAL FINDINGS: An inventory of the microbiota associated with insect vectors, especially of wild specimens, would aid in the development of novel strategies for controlling insect vectors. Given the recent VL outbreak in Argentina and the compelling need to develop appropriate control strategies, this study focused on wild male and female Lu. longipalpis from an Argentine endemic (Posadas, Misiones and a Brazilian non-endemic (Lapinha Cave, Minas Gerais VL location. Previous studies on wild and laboratory reared female Lu. longipalpis have described gut bacteria using standard bacteriological methods. In this study, total RNA was extracted from the insects and submitted to high-throughput pyrosequencing. The analysis revealed the presence of sequences from bacteria, fungi, protist parasites, plants and metazoans. CONCLUSIONS/SIGNIFICANCE: This is the first time an unbiased and comprehensive metagenomic approach has been used to survey taxa associated with an infectious disease vector. The identification of gregarines suggested they are a possible efficient control method under natural conditions. Ongoing studies are determining the significance of the associated taxa found

  19. Scaling up: A guide to high throughput genomic approaches for biodiversity analysis.

    Science.gov (United States)

    Porter, Teresita M; Hajibabaei, Mehrdad

    2018-01-02

    The purpose of this review is to present the most common and emerging DNA-based methods used to generate data for biodiversity and biomonitoring studies. Since environmental assessment and monitoring programs may require biodiversity information at multiple levels, we pay particular attention to the DNA metabarcoding method and discuss a number of bioinformatic tools and considerations for producing DNA-based indicators using operational taxonomic units (OTUs), taxa at a variety of ranks, and community composition. By developing the capacity to harness the advantages provided by the newest technologies, investigators can 'scale-up' by increasing the number of samples and replicates processed, the frequency of sampling over time and space, and even the depth of sampling such as by sequencing more reads per sample or more markers per sample. The ability to scale-up is made possible by the reduced hands-on time and cost per sample provided by the newest kits, platforms, and software tools. Results gleaned from broad-scale monitoring will provide an opportunity to address key scientific questions linked to biodiversity and its dynamics across time and space as well as being more relevant for policy makers, enabling science-based decision making, and provide a greater socio-economic impact. Since genomic approaches are continually evolving, we provide this guide to methods used in biodiversity genomics. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  20. Identification of microRNAs in the Toxigenic Dinoflagellate Alexandrium catenella by High-Throughput Illumina Sequencing and Bioinformatic Analysis.

    Directory of Open Access Journals (Sweden)

    Huili Geng

    Full Text Available Micro-ribonucleic acids (miRNAs are a large group of endogenous, tiny, non-coding RNAs consisting of 19-25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs. To date, identification and function analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR was performed to confirm the expression of two differentially expressed miRNAs that were one up-regulated novel miRNA (aca-miR-3p-456915, and one down-regulated conserved miRNA (tae-miR159a. The expression trend of the qPCR assay was generally consistent with the deep sequencing result. Target predictions of the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO analysis and the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG annotations revealed that some miRNAs were associated with growth and developmental processes of the alga. These results provide insights into the roles that miRNAs play in

  1. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 3; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-12-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS or Next-Generation Sequencing (NGS technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS, solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power. Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  2. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data [version 2; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Damien Correia

    2016-08-01

    Full Text Available The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS or Next-Generation Sequencing (NGS technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS, solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power. Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration

  3. Assessment of Bifidobacterium Species Using groEL Gene on the Basis of Illumina MiSeq High-Throughput Sequencing

    Science.gov (United States)

    Hu, Lujun; Lu, Wenwei; Wang, Linlin; Pan, Mingluo; Zhang, Hao; Zhao, Jianxin; Chen, Wei

    2017-01-01

    The next-generation high-throughput sequencing techniques have introduced a new way to assess the gut’s microbial diversity on the basis of 16S rRNA gene-based microbiota analysis. However, the precise appraisal of the biodiversity of Bifidobacterium species within the gut remains a challenging task because of the limited resolving power of the 16S rRNA gene in different species. The groEL gene, a protein-coding gene, evolves quickly and thus is useful for differentiating bifidobacteria. Here, we designed a Bifidobacterium-specific primer pair which targets a hypervariable sequence region within the groEL gene that is suitable for precise taxonomic identification and detection of all recognized species of the genus Bifidobacterium so far. The results showed that the novel designed primer set can specifically differentiate Bifidobacterium species from non-bifidobacteria, and as low as 104 cells of Bifidobacterium species can be detected using the novel designed primer set on the basis of Illumina Miseq high-throughput sequencing. We also developed a novel protocol to assess the diversity of Bifidobacterium species in both human and rat feces through high-throughput sequencing technologies using groEL gene as a discriminative marker. PMID:29160815

  4. Deciphering the ubiquitin proteome: Limits and advantages of high throughput global affinity purification-mass spectrometry approaches.

    Science.gov (United States)

    Polge, Cécile; Uttenweiler-Joseph, Sandrine; Leulmi, Roza; Heng, Anne-Elisabeth; Burlet-Schiltz, Odile; Attaix, Didier; Taillandier, Daniel

    2013-10-01

    Ubiquitination is a posttranslational modification of proteins that involves the covalent attachment of ubiquitin, either as a single moiety or as polymers. This process controls almost every cellular metabolic pathway through a variety of combinations of linkages. Mass spectrometry now allows high throughput approaches for the identification of the thousands of ubiquitinated proteins and of their ubiquitination sites. Despite major technological improvements in mass spectrometry in terms of sensitivity, resolution and acquisition speed, the use of efficient purification methods of ubiquitinated proteins prior to mass spectrometry analysis is critical to achieve an efficient characterization of the ubiquitome. This critical step is achieved using different approaches that possess advantages and pitfalls. Here, we discuss the limits that can be encountered when deciphering the ubiquitome. This article is part of a Directed Issue entitled: Molecular basis of muscle wasting. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.

  6. A novel library-independent approach based on high-throughput cultivation in Bioscreen and fingerprinting by FTIR spectroscopy for microbial source tracking in food industry.

    Science.gov (United States)

    Shapaval, V; Møretrø, T; Wold Åsli, A; Suso, H P; Schmitt, J; Lillehaug, D; Kohler, A

    2017-05-01

    Microbiological source tracking (MST) for food industry is a rapid growing area of research and technology development. In this paper, a new library-independent approach for MST is presented. It is based on a high-throughput liquid microcultivation and FTIR spectroscopy. In this approach, FTIR spectra obtained from micro-organisms isolated along the production line and a product are compared to each other. We tested and evaluated the new source tracking approach by simulating a source tracking situation. In this simulation study, a selection of 20 spoilage mould strains from a total of six genera (Alternaria, Aspergillus, Mucor, Paecilomyces, Peyronellaea and Phoma) was used. The simulation of the source tracking situation showed that 80-100% of the sources could be correctly identified with respect to genus/species level. When performing source tracking simulations, the FTIR identification diverged for Phoma glomerata strain in the reference collection. When reidentifying the strain by sequencing, it turned out that the strain was a Peyronellaea arachidicola. The obtained results demonstrated that the proposed approach is a versatile tool for identifying sources of microbial contamination. Thus, it has a high potential for routine control in the food industry due to low costs and analysis time. The source tracking of fungal contamination in the food industry is an important aspect of food safety. Currently, all available methods are time consuming and require the use of a reference library that may limit the accuracy of the identification. In this study, we report for the first time, a library-independent FTIR spectroscopic approach for MST of fungal contamination along the food production line. It combines high-throughput microcultivation and FTIR spectroscopy and is specific on the genus and species level. Therefore, such an approach possesses great importance for food safety control in food industry. © 2016 The Society for Applied Microbiology.

  7. A complementary role of multiparameter flow cytometry and high-throughput sequencing for minimal residual disease detection in chronic lymphocytic leukemia: an European Research Initiative on CLL study.

    LENUS (Irish Health Repository)

    Rawstron, A C

    2016-04-01

    In chronic lymphocytic leukemia (CLL) the level of minimal residual disease (MRD) after therapy is an independent predictor of outcome. Given the increasing number of new agents being explored for CLL therapy, using MRD as a surrogate could greatly reduce the time necessary to assess their efficacy. In this European Research Initiative on CLL (ERIC) project we have identified and validated a flow-cytometric approach to reliably quantitate CLL cells to the level of 0.0010% (10(-5)). The assay comprises a core panel of six markers (i.e. CD19, CD20, CD5, CD43, CD79b and CD81) with a component specification independent of instrument and reagents, which can be locally re-validated using normal peripheral blood. This method is directly comparable to previous ERIC-designed assays and also provides a backbone for investigation of new markers. A parallel analysis of high-throughput sequencing using the ClonoSEQ assay showed good concordance with flow cytometry results at the 0.010% (10(-4)) level, the MRD threshold defined in the 2008 International Workshop on CLL guidelines, but it also provides good linearity to a detection limit of 1 in a million (10(-6)). The combination of both technologies would permit a highly sensitive approach to MRD detection while providing a reproducible and broadly accessible method to quantify residual disease and optimize treatment in CLL.

  8. A complementary role of multiparameter flow cytometry and high-throughput sequencing for minimal residual disease detection in chronic lymphocytic leukemia: an European Research Initiative on CLL study

    Science.gov (United States)

    Rawstron, A C; Fazi, C; Agathangelidis, A; Villamor, N; Letestu, R; Nomdedeu, J; Palacio, C; Stehlikova, O; Kreuzer, K-A; Liptrot, S; O'Brien, D; de Tute, R M; Marinov, I; Hauwel, M; Spacek, M; Dobber, J; Kater, A P; Gambell, P; Soosapilla, A; Lozanski, G; Brachtl, G; Lin, K; Boysen, J; Hanson, C; Jorgensen, J L; Stetler-Stevenson, M; Yuan, C; Broome, H E; Rassenti, L; Craig, F; Delgado, J; Moreno, C; Bosch, F; Egle, A; Doubek, M; Pospisilova, S; Mulligan, S; Westerman, D; Sanders, C M; Emerson, R; Robins, H S; Kirsch, I; Shanafelt, T; Pettitt, A; Kipps, T J; Wierda, W G; Cymbalista, F; Hallek, M; Hillmen, P; Montserrat, E; Ghia, P

    2016-01-01

    In chronic lymphocytic leukemia (CLL) the level of minimal residual disease (MRD) after therapy is an independent predictor of outcome. Given the increasing number of new agents being explored for CLL therapy, using MRD as a surrogate could greatly reduce the time necessary to assess their efficacy. In this European Research Initiative on CLL (ERIC) project we have identified and validated a flow-cytometric approach to reliably quantitate CLL cells to the level of 0.0010% (10−5). The assay comprises a core panel of six markers (i.e. CD19, CD20, CD5, CD43, CD79b and CD81) with a component specification independent of instrument and reagents, which can be locally re-validated using normal peripheral blood. This method is directly comparable to previous ERIC-designed assays and also provides a backbone for investigation of new markers. A parallel analysis of high-throughput sequencing using the ClonoSEQ assay showed good concordance with flow cytometry results at the 0.010% (10−4) level, the MRD threshold defined in the 2008 International Workshop on CLL guidelines, but it also provides good linearity to a detection limit of 1 in a million (10−6). The combination of both technologies would permit a highly sensitive approach to MRD detection while providing a reproducible and broadly accessible method to quantify residual disease and optimize treatment in CLL. PMID:26639181

  9. Development of a Rapid Microbore Metabolic Profiling Ultraperformance Liquid Chromatography-Mass Spectrometry Approach for High-Throughput Phenotyping Studies.

    Science.gov (United States)

    Gray, Nicola; Adesina-Georgiadis, Kyrillos; Chekmeneva, Elena; Plumb, Robert S; Wilson, Ian D; Nicholson, Jeremy K

    2016-06-07

    A rapid gradient microbore ultraperformance liquid chromatography-mass spectrometry (UPLC-MS) method has been developed to provide a high-throughput analytical platform for the metabolic phenotyping of urine from large sample cohorts. The rapid microbore metabolic profiling (RAMMP) approach was based on scaling a conventional reversed-phase UPLC-MS method for urinary profiling from 2.1 mm × 100 mm columns to 1 mm × 50 mm columns, increasing the linear velocity of the solvent, and decreasing the gradient time to provide an analysis time of 2.5 min/sample. Comparison showed that conventional UPLC-MS and rapid gradient approaches provided peak capacities of 150 and 50, respectively, with the conventional method detecting approximately 19 000 features compared to the ∼6 000 found using the rapid gradient method. Similar levels of repeatability were seen for both methods. Despite the reduced peak capacity and the reduction in ions detected, the RAMMP method was able to achieve similar levels of group discrimination as conventional UPLC-MS when applied to rat urine samples obtained from investigative studies on the effects of acute 2-bromophenol and chronic acetaminophen administration. When compared to a direct infusion MS method of similar analysis time the RAMMP method provided superior selectivity. The RAMMP approach provides a robust and sensitive method that is well suited to high-throughput metabonomic analysis of complex mixtures such as urine combined with a 5-fold reduction in analysis time compared with the conventional UPLC-MS method.

  10. High-Throughput Sequencing of the Expressed Torafugu (Takifugu rubripes Antibody Sequences Distinguishes IgM and IgT Repertoires and Reveals Evidence of Convergent Evolution

    Directory of Open Access Journals (Sweden)

    Xi Fu

    2018-02-01

    Full Text Available B-cell antigen receptor (BCR or antibody diversity arises from somatic recombination of immunoglobulin (Ig gene segments and is concentrated within the Ig heavy (H chain complementarity-determining region 3 (CDR-H3. We performed high-throughput sequencing of the expressed antibody heavy-chain repertoire from adult torafugu. We found that torafugu use between 70 and 82% of all possible V (variable, D (diversity, and J (joining gene segment combinations and that they share a similar frequency distribution of these VDJ combinations. The CDR-H3 sequence repertoire observed in individuals is biased with the preferential use of a small number of VDJ, dominated by sequences containing inserted nucleotides. We uncovered the common CDR-H3 amino-acid (aa sequences shared by individuals. Common CDR-H3 sequences feature highly convergent nucleic-acid recombination compared with private ones. Finally, we observed differences in repertoires between IgM and IgT, including the unequal usage frequencies of V gene segment and the biased number of nucleotide insertion/deletion at VDJ junction regions that leads to distinct distributions of CDR-H3 lengths.

  11. Biphasic Study to Characterize Agricultural Biogas Plants by High-Throughput 16S rRNA Gene Amplicon Sequencing and Microscopic Analysis.

    Science.gov (United States)

    Maus, Irena; Kim, Yong Sung; Wibberg, Daniel; Stolze, Yvonne; Off, Sandra; Antonczyk, Sebastian; Pühler, Alfred; Scherer, Paul; Schlüter, Andreas

    2017-02-28

    Process surveillance within agricultural biogas plants (BGPs) was concurrently studied by high-throughput 16S rRNA gene amplicon sequencing and an optimized quantitative microscopic fingerprinting (QMF) technique. In contrast to 16S rRNA gene amplicons, digitalized microscopy is a rapid and cost-effective method that facilitates enumeration and morphological differentiation of the most significant groups of methanogens regarding their shape and characteristic autofluorescent factor 420. Moreover, the fluorescence signal mirrors cell vitality. In this study, four different BGPs were investigated. The results indicated stable process performance in the mesophilic BGPs and in the thermophilic reactor. Bacterial subcommunity characterization revealed significant differences between the four BGPs. Most remarkably, the genera Defluviitoga and Halocella dominated the thermophilic bacterial subcommunity, whereas members of another taxon, Syntrophaceticus, were found to be abundant in the mesophilic BGP. The domain Archaea was dominated by the genus Methanoculleus in all four BGPs, followed by Methanosaeta in BGP1 and BGP3. In contrast, Methanothermobacter members were highly abundant in the thermophilic BGP4. Furthermore, a high consistency between the sequencing approach and the QMF method was shown, especially for the thermophilic BGP. The differences elucidated that using this biphasic approach for mesophilic BGPs provided novel insights regarding disaggregated single cells of Methanosarcina and Methanosaeta species. Both dominated the archaeal subcommunity and replaced coccoid Methanoculleus members belonging to the same group of Methanomicrobiales that have been frequently observed in similar BGPs. This work demonstrates that combining QMF and 16S rRNA gene amplicon sequencing is a complementary strategy to describe archaeal community structures within biogas processes.

  12. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    Full Text Available For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles, attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp. We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  13. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  14. Genome-wide identification of cold-responsive and new microRNAs in Populus tomentosa by high-throughput sequencing.

    Science.gov (United States)

    Chen, Lei; Zhang, Yiyun; Ren, Yuanyuan; Xu, Jichen; Zhang, Zhiyi; Wang, Yanwei

    2012-01-13

    MicroRNAs (miRNAs) are small, non-coding RNAs that regulate the expression of target mRNAs in plant growth, development, abiotic stress responses, and pathogen responses. Cold stress is one of the most common abiotic factors affecting plants, and it adversely affects plant growth, development, and spatial distribution. To understand the roles of miRNAs under cold stress in Populus tomentosa, we constructed two small RNA libraries from plantlets treated or not with cold conditions (4°C for 8 h). High-throughput sequencing of the two libraries identified 144 conserved miRNAs belonging to 33 miRNA families and 29 new miRNAs (as well as their corresponding miRNA(∗)s) belonging to 23 miRNA families. Differential expression analysis showed that 21 miRNAs were down-regulated and nine miRNAs were up-regulated in response to cold stress. Among them, 19 cold-responsive miRNAs, two new miRNAs and their corresponding miRNA(∗)s were validated by qRT-PCR. A total of 101 target genes of the new miRNAs were predicted using a bioinformatics approach. These target genes are involved in growth and resistance to various stresses. The results demonstrated that Populus miRNAs play critical roles in the cold stress response. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. A Scalable Epitope Tagging Approach for High Throughput ChIP-Seq Analysis.

    Science.gov (United States)

    Xiong, Xiong; Zhang, Yanxiao; Yan, Jian; Jain, Surbhi; Chee, Sora; Ren, Bing; Zhao, Huimin

    2017-06-16

    Eukaryotic transcriptional factors (TFs) typically recognize short genomic sequences alone or together with other proteins to modulate gene expression. Mapping of TF-DNA interactions in the genome is crucial for understanding the gene regulatory programs in cells. While chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is commonly used for this purpose, its application is severely limited by the availability of suitable antibodies for TFs. To overcome this limitation, we developed an efficient and scalable strategy named cmChIP-Seq that combines the clustered regularly interspaced short palindromic repeats (CRISPR) technology with microhomology mediated end joining (MMEJ) to genetically engineer a TF with an epitope tag. We demonstrated the utility of this tool by applying it to four TFs in a human colorectal cancer cell line. The highly scalable procedure makes this strategy ideal for ChIP-Seq analysis of TFs in diverse species and cell types.

  16. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data

    Directory of Open Access Journals (Sweden)

    Junttila Sini

    2012-10-01

    Full Text Available Abstract Background Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Results Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Conclusions Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first

  17. Comprehensive processing of high-throughput small RNA sequencing data including quality checking, normalization, and differential expression analysis using the UEA sRNA Workbench.

    Science.gov (United States)

    Beckers, Matthew; Mohorianu, Irina; Stocks, Matthew; Applegate, Christopher; Dalmay, Tamas; Moulton, Vincent

    2017-06-01

    Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on a H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this. © 2017 Beckers et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  18. In silico prediction and screening of modular crystal structures via a high-throughput genomic approach

    National Research Council Canada - National Science Library

    Li, Yi; Li, Xu; Liu, Jiancong; Duan, Fangzheng; Yu, Jihong

    2015-01-01

    .... Here we demonstrate the application of a new genomic approach to ABC-6 zeolites, a family of industrially important catalysts whose structures are built from the stacking of modular six-ring layers...

  19. Evaluation of a High-Throughput Repetitive-Sequence-Based PCR System for DNA Fingerprinting of Mycobacterium tuberculosis and Mycobacterium avium Complex Strains

    Science.gov (United States)

    Cangelosi, Gerard A.; Freeman, Robert J.; Lewis, Kaeryn N.; Livingston-Rosanoff, Devon; Shah, Ketan S.; Milan, Sparrow Joy; Goldberg, Stefan V.

    2004-01-01

    Repetitive-sequence-based PCR (rep-PCR) is useful for generating DNA fingerprints of diverse bacterial and fungal species. Rep-PCR amplicon fingerprints represent genomic segments lying between repetitive sequences. A commercial system that electrophoretically separates rep-PCR amplicons on microfluidic chips, and provides computer-generated readouts of results has been adapted for use with Mycobacterium species. The ability of this system to type M. tuberculosis and M. avium complex (MAC) isolates was evaluated. M. tuberculosis strains (n = 56) were typed by spoligotyping with rep-PCR as a high-resolution adjunct. Results were compared with those generated by a standard approach of spoligotyping with IS6110-targeted restriction fragment length polymorphism (IS6110-RFLP) as the high-resolution adjunct. The sample included 11 epidemiologically and genotypically linked outbreak isolates and a population-based sample of 45 isolates from recent immigrants to Seattle, Wash., from the African Horn countries of Somalia, Eritrea, and Ethiopia. Twenty isolates exhibited unique spoligotypes and were not analyzed further. Of the 36 outbreak and African Horn isolates with nonunique spoligotypes, 23 fell into four clusters identified by IS6110-RFLP and rep-PCR, with 97% concordance observed between the two methods. Both approaches revealed extensive strain heterogeneity within the African Horn sample, consistent with a predominant pattern of reactivation of latent infections in this immigrant population. Rep-PCR exhibited 89% concordance with IS1245-RFLP typing of 28 M. avium subspecies avium strains. For M. tuberculosis as well as M. avium subspecies avium, the discriminative power of rep-PCR equaled or exceeded that of RFLP. Rep-PCR also generated DNA fingerprints from M. intracellulare (n = 8) and MACx (n = 2) strains. It shows promise as a fast, unified method for high-throughput genotypic fingerprinting of multiple Mycobacterium species. PMID:15184453

  20. An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah Auburn

    Full Text Available Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (70% samples and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P

  1. Low Complexity Approach for High Throughput Belief-Propagation based Decoding of LDPC Codes

    Directory of Open Access Journals (Sweden)

    BOT, A.

    2013-11-01

    Full Text Available The paper proposes a low complexity belief propagation (BP based decoding algorithm for LDPC codes. In spite of the iterative nature of the decoding process, the proposed algorithm provides both reduced complexity and increased BER performances as compared with the classic min-sum (MS algorithm, generally used for hardware implementations. Linear approximations of check-nodes update function are used in order to reduce the complexity of the BP algorithm. Considering this decoding approach, an FPGA based hardware architecture is proposed for implementing the decoding algorithm, aiming to increase the decoder throughput. FPGA technology was chosen for the LDPC decoder implementation, due to its parallel computation and reconfiguration capabilities. The obtained results show improvements regarding decoding throughput and BER performances compared with state-of-the-art approaches.

  2. A multivariate approach for high throughput pectin profiling by combining glycan microarrays with monoclonal antibodies.

    Science.gov (United States)

    Sousa, António G; Ahl, Louise I; Pedersen, Henriette L; Fangel, Jonatan U; Sørensen, Susanne O; Willats, William G T

    2015-05-29

    Pectin-one of the most complex biomacromolecules in nature has been extensively studied using various techniques. This has been done so in an attempt to understand the chemical composition and conformation of pectin, whilst discovering and optimising new industrial applications of the polymer. For the last decade the emergence of glycan microarray technology has led to a growing capacity of acquiring simultaneous measurements related to various carbohydrate characteristics while generating large collections of data. Here we used a multivariate analysis approach in order to analyse a set of 359 pectin samples probed with 14 different monoclonal antibodies (mAbs). Principal component analysis (PCA) and partial least squares (PLS) regression were utilised to obtain the most optimal qualitative and quantitative information from the spotted microarrays. The potential use of microarray technology combined with chemometrics for the accurate determination of degree of methyl-esterification (DM) and degree of blockiness (DB) was assessed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection

    Directory of Open Access Journals (Sweden)

    Michael Cullen

    2015-12-01

    Full Text Available For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current “gold standard” of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%.Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3–222, P=0.002, and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis. Keywords: HPV16, HPV epidemiology, HPV genomics

  4. Transcriptome-Wide Analysis of Botrytis elliptica Responsive microRNAs and Their Targets in Lilium Regale Wilson by High-Throughput Sequencing and Degradome Analysis

    Directory of Open Access Journals (Sweden)

    Xue Gao

    2017-05-01

    Full Text Available MicroRNAs, as master regulators of gene expression, have been widely identified and play crucial roles in plant-pathogen interactions. A fatal pathogen, Botrytis elliptica, causes the serious folia disease of lily, which reduces production because of the high susceptibility of most cultivated species. However, the miRNAs related to Botrytis infection of lily, and the miRNA-mediated gene regulatory networks providing resistance to B. elliptica in lily remain largely unexplored. To systematically dissect B. elliptica-responsive miRNAs and their target genes, three small RNA libraries were constructed from the leaves of Lilium regale, a promising Chinese wild Lilium species, which had been subjected to mock B. elliptica treatment or B. elliptica infection for 6 and 24 h. By high-throughput sequencing, 71 known miRNAs belonging to 47 conserved families and 24 novel miRNA were identified, of which 18 miRNAs were downreguleted and 13 were upregulated in response to B. elliptica. Moreover, based on the lily mRNA transcriptome, 22 targets for 9 known and 1 novel miRNAs were identified by the degradome sequencing approach. Most target genes for elliptica-responsive miRNAs were involved in metabolic processes, few encoding different transcription factors, including ELONGATION FACTOR 1 ALPHA (EF1a and TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR 2 (TCP2. Furthermore, the expression patterns of a set of elliptica-responsive miRNAs and their targets were validated by quantitative real-time PCR. This study represents the first transcriptome-based analysis of miRNAs responsive to B. elliptica and their targets in lily. The results reveal the possible regulatory roles of miRNAs and their targets in B. elliptica interaction, which will extend our understanding of the mechanisms of this disease in lily.

  5. A cost-effective high-throughput metabarcoding approach powerful enough to genotype ~44 000 year-old rodent remains from Northern Africa.

    Science.gov (United States)

    Guimaraes, S; Pruvost, M; Daligault, J; Stoetzel, E; Bennett, E A; Côté, N M-L; Nicolas, V; Lalis, A; Denys, C; Geigl, E-M; Grange, T

    2017-05-01

    We present a cost-effective metabarcoding approach, aMPlex Torrent, which relies on an improved multiplex PCR adapted to highly degraded DNA, combining barcoding and next-generation sequencing to simultaneously analyse many heterogeneous samples. We demonstrate the strength of these improvements by generating a phylochronology through the genotyping of ancient rodent remains from a Moroccan cave whose stratigraphy covers the last 120 000 years. Rodents are important for epidemiology, agronomy and ecological investigations and can act as bioindicators for human- and/or climate-induced environmental changes. Efficient and reliable genotyping of ancient rodent remains has the potential to deliver valuable phylogenetic and paleoecological information. The analysis of multiple ancient skeletal remains of very small size with poor DNA preservation, however, requires a sensitive high-throughput method to generate sufficient data. We show this approach to be particularly adapted at accessing this otherwise difficult taxonomic and genetic resource. As a highly scalable, lower cost and less labour-intensive alternative to targeted sequence capture approaches, we propose the aMPlex Torrent strategy to be a useful tool for the genetic analysis of multiple degraded samples in studies involving ecology, archaeology, conservation and evolutionary biology. © 2016 John Wiley & Sons Ltd.

  6. High-throughput, cell-free, liposome-based approach for assessing in vitro activity of lipid kinases.

    Science.gov (United States)

    Demian, Douglas J; Clugston, Susan L; Foster, Meta M; Rameh, Lucia; Sarkes, Deborah; Townson, Sharon A; Yang, Lily; Zhang, Melvin; Charlton, Maura E

    2009-08-01

    Lipid kinases are central players in lipid signaling pathways involved in inflammation, tumorigenesis, and metabolic syndrome. A number of these kinase targets have proven difficult to investigate in higher throughput cell-free assay systems. This challenge is partially due to specific substrate interaction requirements for several of the lipid kinase family members and the resulting incompatibility of these substrates with most established, homogeneous assay formats. Traditional, cell-free in vitro investigational methods for members of the lipid kinase family typically involve substrate incorporation of [gamma-32P] and resolution of signal by thin-layer chromatography (TLC) and autoradiograph densitometry. This approach, although highly sensitive, does not lend itself to high-throughput testing of large numbers of small molecules (100 s to 1 MM+). The authors present the development and implementation of a fully synthetic, liposome-based assay for assessing in vitro activity of phosphatidylinositol-5-phosphate-4-kinase isoforms (PIP4KIIbeta and alpha) in 2 commonly used homogeneous technologies. They have validated these assays through compound testing in both traditional TLC and radioactive filterplate approaches as well as binding validation using isothermic calorimetry. A directed library representing known kinase pharmacophores was screened against type IIbeta phosphatidylinositol-phosphate kinase (PIPK) to identify small-molecule inhibitors. This assay system can be applied to other types and isoforms of PIPKs as well as a variety of other lipid kinase targets.

  7. A New Statistical Approach to Characterize Chemical-Elicited Behavioral Effects in High-Throughput Studies Using Zebrafish.

    Directory of Open Access Journals (Sweden)

    Guozhu Zhang

    Full Text Available Zebrafish have become an important alternative model for characterizing chemical bioactivity, partly due to the efficiency at which systematic, high-dimensional data can be generated. However, these new data present analytical challenges associated with scale and diversity. We developed a novel, robust statistical approach to characterize chemical-elicited effects in behavioral data from high-throughput screening (HTS of all 1,060 Toxicity Forecaster (ToxCast™ chemicals across 5 concentrations at 120 hours post-fertilization (hpf. Taking advantage of the immense scale of data for a global view, we show that this new approach reduces bias introduced by extreme values yet allows for diverse response patterns that confound the application of traditional statistics. We have also shown that, as a summary measure of response for local tests of chemical-associated behavioral effects, it achieves a significant reduction in coefficient of variation compared to many traditional statistical modeling methods. This effective increase in signal-to-noise ratio augments statistical power and is observed across experimental periods (light/dark conditions that display varied distributional response patterns. Finally, we integrated results with data from concomitant developmental endpoint measurements to show that appropriate statistical handling of HTS behavioral data can add important biological context that informs mechanistic hypotheses.

  8. Acute effects of TiO2 nanomaterials on the viability and taxonomic composition of aquatic bacterial communities assessed via high-throughput screening and next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Chu Thi Thanh Binh

    Full Text Available The nanotechnology industry is growing rapidly, leading to concerns about the potential ecological consequences of the release of engineered nanomaterials (ENMs to the environment. One challenge of assessing the ecological risks of ENMs is the incredible diversity of ENMs currently available and the rapid pace at which new ENMs are being developed. High-throughput screening (HTS is a popular approach to assessing ENM cytotoxicity that offers the opportunity to rapidly test in parallel a wide range of ENMs at multiple concentrations. However, current HTS approaches generally test one cell type at a time, which limits their ability to predict responses of complex microbial communities. In this study toxicity screening via a HTS platform was used in combination with next generation sequencing (NGS to assess responses of bacterial communities from two aquatic habitats, Lake Michigan (LM and the Chicago River (CR, to short-term exposure in their native waters to several commercial TiO2 nanomaterials under simulated solar irradiation. Results demonstrate that bacterial communities from LM and CR differed in their sensitivity to nano-TiO2, with the community from CR being more resistant. NGS analysis revealed that the composition of the bacterial communities from LM and CR were significantly altered by exposure to nano-TiO2, including decreases in overall bacterial diversity, decreases in the relative abundance of Actinomycetales, Sphingobacteriales, Limnohabitans, and Flavobacterium, and a significant increase in Limnobacter. These results suggest that the release of nano-TiO2 to the environment has the potential to alter the composition of aquatic bacterial communities, which could have implications for the stability and function of aquatic ecosystems. The novel combination of HTS and NGS described in this study represents a major advance over current methods for assessing ENM ecotoxicity because the relative toxicities of multiple ENMs to thousands

  9. Next-generation sequencing in veterinary medicine: how can the massive amount of information arising from high-throughput technologies improve diagnosis, control, and management of infectious diseases?

    Science.gov (United States)

    Van Borm, Steven; Belák, Sándor; Freimanis, Graham; Fusaro, Alice; Granberg, Fredrik; Höper, Dirk; King, Donald P; Monne, Isabella; Orton, Richard; Rosseel, Toon

    2015-01-01

    The development of high-throughput molecular technologies and associated bioinformatics has dramatically changed the capacities of scientists to produce, handle, and analyze large amounts of genomic, transcriptomic, and proteomic data. A clear example of this step-change is represented by the amount of DNA sequence data that can be now produced using next-generation sequencing (NGS) platforms. Similarly, recent improvements in protein and peptide separation efficiencies and highly accurate mass spectrometry have promoted the identification and quantification of proteins in a given sample. These advancements in biotechnology have increasingly been applied to the study of animal infectious diseases and are beginning to revolutionize the way that biological and evolutionary processes can be studied at the molecular level. Studies have demonstrated the value of NGS technologies for molecular characterization, ranging from metagenomic characterization of unknown pathogens or microbial communities to molecular epidemiology and evolution of viral quasispecies. Moreover, high-throughput technologies now allow detailed studies of host-pathogen interactions at the level of their genomes (genomics), transcriptomes (transcriptomics), or proteomes (proteomics). Ultimately, the interaction between pathogen and host biological networks can be questioned by analytically integrating these levels (integrative OMICS and systems biology). The application of high-throughput biotechnology platforms in these fields and their typical low-cost per information content has revolutionized the resolution with which these processes can now be studied. The aim of this chapter is to provide a current and prospective view on the opportunities and challenges associated with the application of massive parallel sequencing technologies to veterinary medicine, with particular focus on applications that have a potential impact on disease control and management.

  10. Development and Evaluation of Quality Metrics for Bioinformatics Analysis of Viral Insertion Site Data Generated Using High Throughput Sequencing.

    Science.gov (United States)

    Gao, Hongyu; Hawkins, Troy; Jasti, Aparna; Chen, Yu-Hsiang; Mockaitis, Keithanne; Dinauer, Mary; Cornetta, Kenneth

    2014-05-06

    Integration of viral vectors into a host genome is associated with insertional mutagenesis and subjects in clinical gene therapy trials must be monitored for this adverse event. Several PCR based methods such as ligase-mediated (LM) PCR, linear-amplification-mediated (LAM) PCR and non-restrictive (nr) LAM PCR were developed to identify sites of vector integration. Coupling the power of next-generation sequencing technologies with various PCR approaches will provide a comprehensive and genome-wide profiling of insertion sites and increase throughput. In this bioinformatics study, we aimed to develop and apply quality metrics to viral insertion data obtained using next-generation sequencing. We developed five simple metrics for assessing next-generation sequencing data from different PCR products and showed how the metrics can be used to objectively compare runs performed with the same methodology as well as data generated using different PCR techniques. The results will help researchers troubleshoot complex methodologies, understand the quality of sequencing data, and provide a starting point for developing standardization of vector insertion site data analysis.

  11. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms.

  12. High Throughput Transcriptomics @ USEPA (Toxicology ...

    Science.gov (United States)

    The ideal chemical testing approach will provide complete coverage of all relevant toxicological responses. It should be sensitive and specific It should identify the mechanism/mode-of-action (with dose-dependence). It should identify responses relevant to the species of interest. Responses should ideally be translated into tissue-, organ-, and organism-level effects. It must be economical and scalable. Using a High Throughput Transcriptomics platform within US EPA provides broader coverage of biological activity space and toxicological MOAs and helps fill the toxicological data gap. Slide presentation at the 2016 ToxForum on using High Throughput Transcriptomics at US EPA for broader coverage biological activity space and toxicological MOAs.

  13. Effective Optimization of Antibody Affinity by Phage Display Integrated with High-Throughput DNA Synthesis and Sequencing Technologies.

    Directory of Open Access Journals (Sweden)

    Dongmei Hu

    Full Text Available Phage display technology has been widely used for antibody affinity maturation for decades. The limited library sequence diversity together with excessive redundancy and labour-consuming procedure for candidate identification are two major obstacles to widespread adoption of this technology. We hereby describe a novel library generation and screening approach to address the problems. The approach started with the targeted diversification of multiple complementarity determining regions (CDRs of a humanized anti-ErbB2 antibody, HuA21, with a small perturbation mutagenesis strategy. A combination of three degenerate codons, NWG, NWC, and NSG, were chosen for amino acid saturation mutagenesis without introducing cysteine and stop residues. In total, 7,749 degenerate oligonucleotides were synthesized on two microchips and released to construct five single-chain antibody fragment (scFv gene libraries with 4 x 10(6 DNA sequences. Deep sequencing of the unselected and selected phage libraries using the Illumina platform allowed for an in-depth evaluation of the enrichment landscapes in CDR sequences and amino acid substitutions. Potent candidates were identified according to their high frequencies using NGS analysis, by-passing the need for the primary screening of target-binding clones. Furthermore, a subsequent library by recombination of the 10 most abundant variants from four CDRs was constructed and screened, and a mutant with 158-fold increased affinity (Kd = 25.5 pM was obtained. These results suggest the potential application of the developed methodology for optimizing the binding properties of other antibodies and biomolecules.

  14. Probe molecules (PrM) approach in adverse outcome pathway (AOP) based high throughput screening (HTS): in vivo discovery for developing in vitro target methods

    Science.gov (United States)

    Efficient and accurate adverse outcome pathway (AOP) based high-throughput screening (HTS) methods use a systems biology based approach to computationally model in vitro cellular and molecular data for rapid chemical prioritization; however, not all HTS assays are grounded by rel...

  15. Identification of DNA sequence variation in Campylobacter jejuni strains associated with the Guillain-Barré syndrome by high-throughput AFLP analysis

    Directory of Open Access Journals (Sweden)

    Endtz Hubert P

    2006-04-01

    Full Text Available Abstract Background Campylobacter jejuni is the predominant cause of antecedent infection in post-infectious neuropathies such as the Guillain-Barré (GBS and Miller Fisher syndromes (MFS. GBS and MFS are probably induced by molecular mimicry between human gangliosides and bacterial lipo-oligosaccharides (LOS. This study describes a new C. jejuni-specific high-throughput AFLP (htAFLP approach for detection and identification of DNA polymorphism, in general, and of putative GBS/MFS-markers, in particular. Results We compared 6 different isolates of the "genome strain" NCTC 11168 obtained from different laboratories. HtAFLP analysis generated approximately 3000 markers per stain, 19 of which were polymorphic. The DNA polymorphisms could not be confirmed by PCR-RFLP analysis, suggesting a baseline level of 0.6% AFLP artefacts. Comparison of NCTC 11168 with 4 GBS-associated strains revealed 23 potentially GBS-specific markers, 17 of which were identified by DNA sequencing. A collection of 27 GBS/MFS-associated and 17 enteritis control strains was analyzed with PCR-RFLP tests based on 11 of these markers. We identified 3 markers, located in the LOS biosynthesis genes cj1136, cj1138 and cj1139c, that were significantly associated with GBS (P = 0.024, P = 0.047 and P Conclusion This study shows that bacterial GBS markers are limited in number and located in the LOS biosynthesis genes, which corroborates the current consensus that LOS mimicry may be the prime etiologic determinant of GBS. Furthermore, our results demonstrate that htAFLP, with its high reproducibility and resolution, is an effective technique for the detection and subsequent identification of putative bacterial disease markers.

  16. Evaluation of MC1R high-throughput nucleotide sequencing data generated by the 1000 Genomes Project.

    Science.gov (United States)

    Marano, Leonardo Arduino; Marcorin, Letícia; Castelli, Erick da Cruz; Mendes-Junior, Celso Teixeira

    2017-01-01

    The advent of next-generation sequencing allows simultaneous processing of several genomic regions/individuals, increasing the availability and accuracy of whole-genome data. However, these new approaches may present some errors and bias due to alignment, genotype calling, and imputation methods. Despite these flaws, data obtained by next-generation sequencing can be valuable for population and evolutionary studies of specific genes, such as genes related to how pigmentation evolved among populations, one of the main topics in human evolutionary biology. Melanocortin-1 receptor (MC1R) is one of the most studied genes involved in pigmentation variation. As MC1R has already been suggested to affect melanogenesis and increase risk of developing melanoma, it constitutes one of the best models to understand how natural selection acts on pigmentation. Here we employed a locally developed pipeline to obtain genotype and haplotype data for MC1R from the raw sequencing data provided by the 1000 Genomes FTP site. We also compared such genotype data to Phase 3 VCF to evaluate its quality and discover any polymorphic sites that may have been overlooked. In conclusion, either the VCF file or one of the presently described pipelines could be used to obtain reliable and accurate genotype calling from the 1000 Genomes Phase 3 data.

  17. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low...... biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material....

  18. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Directory of Open Access Journals (Sweden)

    Amit Kawalia

    Full Text Available Next generation sequencing (NGS has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  19. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    Science.gov (United States)

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  20. Combination of amplified rDNA restriction analysis and high-throughput sequencing revealed the negative effect of colistin sulfate on the diversity of soil microorganisms.

    Science.gov (United States)

    Fan, Tingli; Sun, Yongxue; Peng, Jinju; Wu, Qun; Ma, Yi; Zhou, Xiaohui

    2018-01-01

    Colistin sulfate is widely used in both human and veterinary medicine. However, its effect on the microbial ecologyis unknown. In this study, we determined the effect of colistin sulfate on the diversity of soil microorganisms by amplified rDNA restriction analysis (ARDRA) and high-throughput sequencing.ARDRAshowed that the diversity of DNA from soil microorganisms was reduced after soil was treated with colistin sulfate, with the most dramatic reductionobserved after 35days of treatment. High-throughput sequencing showed that the Chao1 and abundance-based coverage estimators (ACE) were reduced in the soils treated with colistin sulfate for 35 dayscompared to those treated with colistin sulfate for 7days. Furthermore, Chao1 and ACE tended to be lower when higher concentration of colistin sulfate was used, suggesting that the microbial abundance is reduced by colistin sulfate in a dose-dependent manner. Shannon index showed that the diversity of soil microorganism was reduced upon treatment with colistin sulfate compared to the untreated control group. Following 7days of treatment, Bacillus, Clostridiumand Sphingomonas were sensitive to all the concentration of colistin sulfate used in this study. Following 35days of treatment, the abundance of Choroplast, Haliangium, Pseudomonas, Lactococcus, and Clostridium was significantly decreased. Our results demonstrated that colistin sulfate especially at high concentration (≥5mg/kg) could alter the population structure of microorganisms and consequently the microbial community function in soil. Copyright © 2017 Elsevier GmbH. All rights reserved.

  1. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Science.gov (United States)

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  2. Characterization of the fecal microbiota using high-throughput sequencing reveals a stable microbial community during storage.

    Directory of Open Access Journals (Sweden)

    Ian M Carroll

    Full Text Available The handling and treatment of biological samples is critical when characterizing the composition of the intestinal microbiota between different ecological niches or diseases. Specifically, exposure of fecal samples to room temperature or long term storage in deep freezing conditions may alter the composition of the microbiota. Thus, we stored fecal samples at room temperature and monitored the stability of the microbiota over twenty four hours. We also investigated the stability of the microbiota in fecal samples during a six month storage period at -80°C. As the stability of the fecal microbiota may be affected by intestinal disease, we analyzed two healthy controls and two patients with irritable bowel syndrome (IBS. We used high-throughput pyrosequencing of the 16S rRNA gene to characterize the microbiota in fecal samples stored at room temperature or -80°C at six and seven time points, respectively. The composition of microbial communities in IBS patients and healthy controls were determined and compared using the Quantitative Insights Into Microbial Ecology (QIIME pipeline. The composition of the microbiota in fecal samples stored for different lengths of time at room temperature or -80°C clustered strongly based on the host each sample originated from. Our data demonstrates that fecal samples exposed to room or deep freezing temperatures for up to twenty four hours and six months, respectively, exhibit a microbial composition and diversity that shares more identity with its host of origin than any other sample.

  3. A comprehensive analysis of in vitro and in vivo genetic fitness of Pseudomonas aeruginosa using high-throughput sequencing of transposon libraries.

    Directory of Open Access Journals (Sweden)

    David Skurnik

    Full Text Available High-throughput sequencing of transposon (Tn libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200-1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian

  4. Diet-induced hyperinsulinemia differentially affects glucose and protein metabolism: a high-throughput metabolomic approach in rats.

    Science.gov (United States)

    Etxeberria, U; de la Garza, A L; Martínez, J A; Milagro, F I

    2013-09-01

    Metabolomics is a high-throughput tool that quantifies and identifies the complete set of biofluid metabolites. This "omics" science is playing an increasing role in understanding the mechanisms involved in disease progression. The aim of this study was to determine whether a nontargeted metabolomic approach could be applied to investigate metabolic differences between obese rats fed a high-fat sucrose (HFS) diet for 9 weeks and control diet-fed rats. Animals fed with the HFS diet became obese, hyperleptinemic, hyperglycemic, hyperinsulinemic, and resistant to insulin. Serum samples of overnight-fasted animals were analyzed by (1)H NMR technique, and 49 metabolites were identified and quantified. The biochemical changes observed suggest that major metabolic processes like carbohydrate metabolism, β-oxidation, tricarboxylic acid cycle, Kennedy pathway, and folate-mediated one-carbon metabolism were altered in obese rats. The circulating levels of most amino acids were lower in obese animals. Serum levels of docosahexaenoic acid, linoleic acid, unsaturated n-6 fatty acids, and total polyunsaturated fatty acids also decreased in HFS-fed rats. The circulating levels of urea, six water-soluble metabolites (creatine, creatinine, choline, acetyl carnitine, formate, and allantoin), and two lipid compounds (phosphatidylcholines and sphingomyelin) were also significantly reduced by the HFS diet intake. This study offers further insight of the possible mechanisms implicated in the development of diet-induced obesity. It suggests that the HFS diet-induced hyperinsulinemia is responsible for the decrease in the circulating levels of urea, creatinine, and many amino acids, despite an increase in serum glucose levels.

  5. Electrofusion of cells on chip - From a static, parallel approach to a high-throughput, serial, microdroplet platform

    NARCIS (Netherlands)

    Kemna, Evelien

    2013-01-01

    In this thesis, the results of the development of two microfluidic devices (static and high-throughput) for efficient electrofucion of cells were described and evaulated. Electrofusion of cells is an important tool for generating antibody producing hybridomas. The hybridomas are the result of the

  6. The challenges of using high-throughput sequencing to track multiple bipartite mycoviruses of wild orchid-fungus partnerships over consecutive years.

    Science.gov (United States)

    Ong, Jamie W L; Li, Hua; Sivasithamparam, Krishnapillai; Dixon, Kingsley W; Jones, Michael G K; Wylie, Stephen J

    2017-10-01

    The bipartite alpha- and betapartitiviruses are recorded from a wide range of fungi and plants. Using a combination of dsRNA-enrichment, high-throughput shotgun sequencing and informatics, we report the occurrence of multiple new partitiviruses associated with mycorrhizal Ceratobasidium fungi, themselves symbiotically associated with a small wild population of Pterostylis sanguinea orchids in Australia, over two consecutive years. Twenty-one partial or near-complete sequences representing 16 definitive alpha- and betapartitivirus species, and further possible species, were detected from two fungal isolates. The majority of partitiviruses occurred in fungal isolates from both years. Two of the partitiviruses represent phylogenetically divergent forms of Alphapartitivirus, suggesting that they may have evolved under long geographical isolation there. We address the challenge of pairing the two genomic segments of partitiviruses to identify species when multiple partitiviruses co-infect a single host. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Culture-Independent Metagenomic Surveillance of Commercially Available Probiotics with High-Throughput Next-Generation Sequencing.

    Science.gov (United States)

    Patro, Jennifer N; Ramachandran, Padmini; Barnaba, Tammy; Mammel, Mark K; Lewis, Jada L; Elkins, Christopher A

    2016-01-01

    be accurate and true. Those products containing live microbials report both identity and viability on most product labels. This study used next-generation sequencing technology as an analytical tool in conjunction with classic culture methods to examine the validity of the labels on supplement products containing live microbials found in the United States marketplace. Our results show the importance of testing these products for identity, viability, and potential contaminants, as well as introduce a new culture-independent diagnostic approach for testing these products. Podcast: A podcast concerning this article is available.

  8. Characterization of bacteria in biopsies of colon and stools by high throughput sequencing of the V2 region of bacterial 16S rRNA gene in human.

    Directory of Open Access Journals (Sweden)

    Yukihide Momozawa

    Full Text Available BACKGROUND: The characterization of the human intestinal microflora and their interactions with the host have been identified as key components in the study of intestinal disorders such as inflammatory bowel diseases. High-throughput sequencing has enabled culture-independent studies to deeply analyze bacteria in the gut. It is possible with this technology to systematically analyze links between microbes and the genetic constitution of the host, such as DNA polymorphisms and methylation, and gene expression. METHODS AND FINDINGS: In this study the V2 region of the bacterial 16S ribosomal RNA (rRNA gene using 454 pyrosequencing from seven anatomic regions of human colon and two types of stool specimens were analyzed. The study examined the number of reads needed to ascertain differences between samples, the effect of DNA extraction procedures and PCR reproducibility, and differences between biopsies and stools in order to design a large scale systematic analysis of gut microbes. It was shown (1 that sequence coverage lower than 1,000 reads influenced quantitative and qualitative differences between samples measured by UniFrac distances. Distances between samples became stable after 1,000 reads. (2 Difference of extracted bacteria was observed between the two DNA extraction methods. In particular, Firmicutes Bacilli were not extracted well by one method. (3 Quantitative and qualitative difference in bacteria from ileum to rectum colon were not observed, but there was a significant positive trend between distances within colon and quantitative differences. Between sample type, biopsies or stools, quantitative and qualitative differences were observed. CONCLUSIONS: Results of human colonic bacteria analyzed using high-throughput sequencing were highly dependent on the experimental design, especially the number of sequence reads, DNA extraction method, and sample type.

  9. High-throughput 16S rRNA gene sequencing reveals alterations of intestinal microbiota in myalgic encephalomyelitis/chronic fatigue syndrome patients.

    Science.gov (United States)

    Frémont, Marc; Coomans, Danny; Massart, Sebastien; De Meirleir, Kenny

    2013-08-01

    Human intestinal microbiota plays an important role in the maintenance of host health by providing energy, nutrients, and immunological protection. Intestinal dysfunction is a frequent complaint in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) patients, and previous reports suggest that dysbiosis, i.e. the overgrowth of abnormal populations of bacteria in the gut, is linked to the pathogenesis of the disease. We used high-throughput 16S rRNA gene sequencing to investigate the presence of specific alterations in the gut microbiota of ME/CFS patients from Belgium and Norway. 43 ME/CFS patients and 36 healthy controls were included in the study. Bacterial DNA was extracted from stool samples, PCR amplification was performed on 16S rRNA gene regions, and PCR amplicons were sequenced using Roche FLX 454 sequencer. The composition of the gut microbiota was found to differ between Belgian controls and Norwegian controls: Norwegians showed higher percentages of specific Firmicutes populations (Roseburia, Holdemania) and lower proportions of most Bacteroidetes genera. A highly significant separation could be achieved between Norwegian controls and Norwegian patients: patients presented increased proportions of Lactonifactor and Alistipes, as well as a decrease in several Firmicutes populations. In Belgian subjects the patient/control separation was less pronounced, however some abnormalities observed in Norwegian patients were also found in Belgian patients. These results show that intestinal microbiota is altered in ME/CFS. High-throughput sequencing is a useful tool to diagnose dysbiosis in patients and could help designing treatments based on gut microbiota modulation (antibiotics, pre and probiotics supplementation). Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  10. Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey.

    Science.gov (United States)

    Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R

    2015-01-01

    Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.

  11. Uncovering leaf rust responsive miRNAs in wheat (Triticum aestivum L.) using high-throughput sequencing and prediction of their targets through degradome analysis.

    Science.gov (United States)

    Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal

    2017-01-01

    Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 targets genes. QRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30 % of the world population. Major threat to wheat production is the rust epidemics. This study was targeted towards identification and functional characterizations of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in retort to leaf rust using mock and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs compared with the pathogen infected susceptible plants. The miRNA counts increased in susceptible isoline due to leaf rust, conversely, the counts decreased in the resistant isoline in response to pathogenesis illustrating precise spatial tuning of miRNAs during compatible and incompatible interaction. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in retort to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interaction. Our findings contribute to increase knowledge on roles of microRNA in wheat-leaf rust interactions and could help in rust

  12. SLAF-seq: an efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaowen Sun

    Full Text Available Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq. SLAF-seq technology has several distinguishing characteristics: i deep sequencing to ensure genotyping accuracy; ii reduced representation strategy to reduce sequencing costs; iii pre-designed reduced representation scheme to optimize marker efficiency; and iv double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.

  13. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate

    NARCIS (Netherlands)

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E.; Bystrykh, Leonid V.

    2014-01-01

    Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e. g., with PacBio SMRT), the position of the barcode and

  14. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data.

    Science.gov (United States)

    Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-11-02

    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Automation in the high-throughput selection of random combinatorial libraries--different approaches for select applications.

    Science.gov (United States)

    Glökler, Jörn; Schütze, Tatjana; Konthur, Zoltán

    2010-04-08

    Automation in combination with high throughput screening methods has revolutionised molecular biology in the last two decades. Today, many combinatorial libraries as well as several systems for automation are available. Depending on scope, budget and time, a different combination of library and experimental handling might be most effective. In this review we will discuss several concepts of combinatorial libraries and provide information as what to expect from these depending on the given context.

  16. Comparison of two high-throughput semiconductor chip sequencing platforms in noninvasive prenatal testing for Down syndrome in early pregnancy.

    Science.gov (United States)

    Kim, Sunshin; Jung, HeeJung; Han, Sung Hee; Lee, SeungJae; Kwon, JeongSub; Kim, Min Gyun; Chu, Hyungsik; Chen, Hongliang; Han, Kyudong; Kwak, Hwanjong; Park, Sunghoon; Joo, Hee Jae; Kim, Byung Chul; Bhak, Jong

    2016-04-30

    Noninvasive prenatal testing (NIPT) to detect fetal aneuploidy using next-generation sequencing on ion semiconductor platforms has become common. There are several sequencers that can generate sufficient DNA reads for NIPT. However, the approval criteria vary among platforms and countries. This can delay the introduction of such devices and systems to clinics. A comparison of the sensitivity and specificity of two different platforms using the same sequencing chemistry could be useful in NIPT for fetal chromosomal aneuploidies. This would improve healthcare authorities' confidence in decision-making on sequencing-based tests. One hundred and one pregnant women who were predicted at high risk of fetal defects using conventional prenatal screening tests, and who underwent definitive diagnosis by full karyotyping, were enrolled from three hospitals in Korea. Most of the pregnant women (69.79 %) received NIPT during weeks 11-13 of gestation and 30.21 % during weeks 14-18. We used Ion Torrent PGM and Proton semi-conductor-based sequencers with 0.3× sequencing coverage depth. The average total reads of 101 samples were approximately 4.5 and 7.6 M for PGM and Proton, respectively. A Burrows-Wheeler Aligner (BWA) algorithm was used for the alignment, and a z-score was used to decide fetal trisomy 21. Interactive dot diagrams from the sequencing data showed minimal z-score values of 2.07 and 2.10 to discriminate negative versus positive cases of fetal trisomy 21 for the two different sequencing systems. Our z-score-based discrimination method resulted in 100 % positive and negative prediction values for both ion semiconductor PGM and Proton sequencers, regardless of their sequencing chip and chemistry differences. Both platforms performed well at an early stage (11-13 weeks of gestation) compared with previous studies. These results suggested that, using two different sequencers, NIPT to detect fetal trisomy 21 in early pregnancy is accurate and platform

  17. Identification and characterization of cold-responsive microRNAs in tea plant (Camellia sinensis) and their targets using high-throughput sequencing and degradome analysis.

    Science.gov (United States)

    Zhang, Yue; Zhu, Xujun; Chen, Xuan; Song, Changnian; Zou, Zhongwei; Wang, Yuhua; Wang, Mingle; Fang, Wanping; Li, Xinghui

    2014-10-21

    MicroRNAs (miRNAs) are approximately 19 ~ 21 nucleotide noncoding RNAs produced by Dicer-catalyzed excision from stem-loop precursors. Many plant miRNAs have critical functions in development, nutrient homeostasis, abiotic stress responses, and pathogen responses via interaction with specific target mRNAs. Camellia sinensis is one of the most important commercial beverage crops in the world. However, miRNAs associated with cold stress tolerance in C. sinensis remains unexplored. The use of high-throughput sequencing can provide a much deeper understanding of miRNAs. To obtain more insight into the function of miRNAs in cold stress tolerance, Illumina sequencing of C. sinensis sRNA was conducted. Solexa sequencing technology was used for high-throughput sequencing of the small RNA library from the cold treatment of tea leaves. To align the sequencing data with known plant miRNAs, we characterized 106 conserved C. sinensis miRNAs. In addition, 215 potential candidate miRNAs were found, among, which 98 candidates with star sequences were chosen as novel miRNAs. Both congruously and differentially regulated miRNAs were obtained, and cultivar-specific miRNAs were identified by microarray-based hybridization in response to cold stress. The results were also confirmed by quantitative real-time polymerase chain reaction. To confirm the targets of miRNAs, two degradome libraries from two treatments were constructed. According to degradome sequencing, 455 and 591 genes were identified as cleavage targets of miRNAs from cold treatments and control libraries, respectively, and 283 targets were present in both libraries. Functional analysis of these miRNA targets indicated their involvement in important activities, such as development, regulation of transcription, and stress response. We discovered 31 up-regulated miRNAs and 43 down-regulated miRNAs in 'Yingshuang', and 46 up-regulated miRNA and 45 down-regulated miRNAs in 'Baiye 1' in response to cold stress, respectively. A

  18. Identification of Treponema pedis as the predominant Treponema species in porcine skin ulcers by fluorescence in situ hybridization and high-throughput sequencing

    DEFF Research Database (Denmark)

    Karlsson, Frida; Schou, Kirstine Klitgaard; Jensen, Tim Kåre

    2014-01-01

    Skin lesions often seen in pig production are of great animal welfare concern. To study the potential role of Treponema bacteria in porcine skin ulcers, we investigated the presence and distribution of these organisms in decubital shoulder ulcers (n=51) and ear necroses (n=54) by fluorescence....... The results from this study point toward an important role of T. pedis as a secondary bacterial infection in porcine skin ulcers, especially in severe and chronic lesions....... in situ hybridization (FISH) and high-throughput sequencing. In addition, two cases of facial ulcers and five cases of other skin ulcers were included in the study. Samples from all 112 skin lesions and intact skin from pigs without skin ulcers (n=14) were screened by FISH. Three different oligonucleotide...

  19. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

    Science.gov (United States)

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V

    2014-08-07

    DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our

  20. High throughput protein production screening

    Science.gov (United States)

    Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA

    2009-09-08

    Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and WT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.

  1. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.

    Directory of Open Access Journals (Sweden)

    Gu Minghong

    2010-11-01

    Full Text Available Abstract Background Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs, for the analysis of complex traits in plant molecular genetics. Results In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, was 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. Conclusions The present results demonstrated that high

  2. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.).

    Science.gov (United States)

    Xu, Jianjun; Zhao, Qiang; Du, Peina; Xu, Chenwu; Wang, Baohe; Feng, Qi; Liu, Qiaoquan; Tang, Shuzhu; Gu, Minghong; Han, Bin; Liang, Guohua

    2010-11-24

    Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs), for the analysis of complex traits in plant molecular genetics. In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, was 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. The present results demonstrated that high throughput genotyped CSSLs combine the advantages of an ultrahigh

  3. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer...

  4. Two-stage clustering (TSC: a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Full Text Available Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.

  5. High-throughput sequencing of partially edited trypanosome mRNAs reveals barriers to editing progression and evidence for alternative editing

    Science.gov (United States)

    Simpson, Rachel M.; Bruno, Andrew E.; Bard, Jonathan E.; Buck, Michael J.

    2016-01-01

    Uridine insertion/deletion RNA editing in kinetoplastids entails the addition and deletion of uridine residues throughout the length of mitochondrial transcripts to generate translatable mRNAs. This complex process requires the coordinated use of several multiprotein complexes as well as the sequential use of noncoding template RNAs called guide RNAs. The majority of steady-state mitochondrial mRNAs are partially edited and often contain regions of mis-editing, termed junctions, whose role is unclear. Here, we report a novel method for sequencing entire populations of pre-edited partially edited, and fully edited RNAs and analyzing editing characteristics across populations using a new bioinformatics tool, the Trypanosome RNA Editing Alignment Tool (TREAT). Using TREAT, we examined populations of two transcripts, RPS12 and ND7-5′, in wild-type Trypanosoma brucei. We provide evidence that the majority of partially edited sequences contain junctions, that intrinsic pause sites arise during the progression of editing, and that the mechanisms that mediate pausing in the generation of canonical fully edited sequences are distinct from those that mediate the ends of junction regions. Furthermore, we identify alternatively edited sequences that constitute plausible alternative open reading frames and identify substantial variability in the 5′ UTRs of both canonical and alternatively edited sequences. This work is the first to use high-throughput sequencing to examine full-length sequences of whole populations of partially edited transcripts. Our method is highly applicable to current questions in the RNA editing field, including defining mechanisms of action for editing factors and identifying potential alternatively edited sequences. PMID:26908922

  6. High-throughput sequencing of partially edited trypanosome mRNAs reveals barriers to editing progression and evidence for alternative editing.

    Science.gov (United States)

    Simpson, Rachel M; Bruno, Andrew E; Bard, Jonathan E; Buck, Michael J; Read, Laurie K

    2016-05-01

    Uridine insertion/deletion RNA editing in kinetoplastids entails the addition and deletion of uridine residues throughout the length of mitochondrial transcripts to generate translatable mRNAs. This complex process requires the coordinated use of several multiprotein complexes as well as the sequential use of noncoding template RNAs called guide RNAs. The majority of steady-state mitochondrial mRNAs are partially edited and often contain regions of mis-editing, termed junctions, whose role is unclear. Here, we report a novel method for sequencing entire populations of pre-edited partially edited, and fully edited RNAs and analyzing editing characteristics across populations using a new bioinformatics tool, the Trypanosome RNA Editing Alignment Tool (TREAT). Using TREAT, we examined populations of two transcripts, RPS12 and ND7-5', in wild-typeTrypanosoma brucei We provide evidence that the majority of partially edited sequences contain junctions, that intrinsic pause sites arise during the progression of editing, and that the mechanisms that mediate pausing in the generation of canonical fully edited sequences are distinct from those that mediate the ends of junction regions. Furthermore, we identify alternatively edited sequences that constitute plausible alternative open reading frames and identify substantial variability in the 5' UTRs of both canonical and alternatively edited sequences. This work is the first to use high-throughput sequencing to examine full-length sequences of whole populations of partially edited transcripts. Our method is highly applicable to current questions in the RNA editing field, including defining mechanisms of action for editing factors and identifying potential alternatively edited sequences. © 2016 Simpson et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  7. High-throughput Transcriptome Sequencing Reveals the Role of Anthocyanin Metabolism in Begonia semperflorens Under High Light Stress.

    Science.gov (United States)

    Wang, Jiawan; Guo, Meili; Li, Yonghua; Wu, Ronghua; Zhang, Kaiming

    2017-07-26

    Begonia semperflorens is an ornamental perennial herb. The leaves of B. semperflorens turn red under increased light, which increases the ornamental value of the plant. The color of the leaves is determined by anthocyanin metabolism. In B. semperflorens leaves, anthocyanin metabolism is sensitive to external environmental conditions such as temperature, light and hormone levels. To explore this process in detail and to assess gene expression under high light stress, transcriptome analysis was performed by RNA sequencing using the sequencing-by-synthesis method. A total of 83 699 unigenes were isolated, and 51 754 unigenes were annotated using the NR, Swiss-Prot, KEGG, COG, KOG, GO and Pfam databases. Furthermore, many of the differentially expressed genes were related to factors associated with anthocyanin metabolism, which influences the expression of leaf color. © 2017 The American Society of Photobiology.

  8. A new perspective on studying burial environment before archaeological excavation: analyzing bacterial community distribution by high-throughput sequencing

    OpenAIRE

    Jinjin Xu; Yanfei Wei; Hanqing Jia; Lin Xiao; Decai Gong

    2017-01-01

    Burial conditions play a crucial role in archaeological heritage preservation. Especially, the microorganisms were considered as the leading causes which incurred degradation and vanishment of historic materials. In this article, we analyzed bacterial diversity and community structure from M1 of Wangshanqiao using 16?S rRNA gene amplicon sequencing. The results indicated that microbial communities in burial conditions were diverse among four different samples. The samples from the robber hole...

  9. Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis.

    Science.gov (United States)

    Haile, Simon; Pandoh, Pawan; McDonald, Helen; Corbett, Richard D; Tsao, Philip; Kirk, Heather; MacLeod, Tina; Jones, Martin; Bilobram, Steve; Brooks, Denise; Smailus, Duane; Steidl, Christian; Scott, David W; Bala, Miruna; Hirst, Martin; Miller, Diane; Moore, Richard A; Mungall, Andrew J; Coope, Robin J; Ma, Yussanne; Zhao, Yongjun; Holt, Rob A; Jones, Steven J; Marra, Marco A

    2017-01-01

    Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95-100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

  10. High-throughput amplicon sequencing and stream benthic bacteria: identifying the best taxonomic level for multiple-stressor research

    Science.gov (United States)

    Salis, R. K.; Bruder, A.; Piggott, J. J.; Summerfield, T. C.; Matthaei, C. D.

    2017-03-01

    Disentangling the individual and interactive effects of multiple stressors on microbial communities is a key challenge to our understanding and management of ecosystems. Advances in molecular techniques allow studying microbial communities in situ and with high taxonomic resolution. However, the taxonomic level which provides the best trade-off between our ability to detect multiple-stressor effects versus the goal of studying entire communities remains unknown. We used outdoor mesocosms simulating small streams to investigate the effects of four agricultural stressors (nutrient enrichment, the nitrification inhibitor dicyandiamide (DCD), fine sediment and flow velocity reduction) on stream bacteria (phyla, orders, genera, and species represented by Operational Taxonomic Units with 97% sequence similarity). Community composition was assessed using amplicon sequencing (16S rRNA gene, V3-V4 region). DCD was the most pervasive stressor, affecting evenness and most abundant taxa, followed by sediment and flow velocity. Stressor pervasiveness was similar across taxonomic levels and lower levels did not perform better in detecting stressor effects. Community coverage decreased from 96% of all sequences for abundant phyla to 28% for species. Order-level responses were generally representative of responses of corresponding genera and species, suggesting that this level may represent the best compromise between stressor sensitivity and coverage of bacterial communities.

  11. Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

    KAUST Repository

    Kobayashi, Masaaki

    2017-04-20

    Recent availability of large-scale genomic resources enables us to conduct so called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends on not only their mathematical models, but the quality and quantity of variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with a low coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at the both ends of reads or containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).

  12. High-Throughput Analysis With 96-Capillary Array Electrophoresis and Integrated Sample Preparation for DNA Sequencing Based on Laser Induced Fluorescence Detection

    Energy Technology Data Exchange (ETDEWEB)

    Xue, Gang [Iowa State Univ., Ames, IA (United States)

    2001-01-01

    The purpose of this research was to improve the fluorescence detection for the multiplexed capillary array electrophoresis, extend its use beyond the genomic analysis, and to develop an integrated micro-sample preparation system for high-throughput DNA sequencing. The authors first demonstrated multiplexed capillary zone electrophoresis (CZE) and micellar electrokinetic chromatography (MEKC) separations in a 96-capillary array system with laser-induced fluorescence detection. Migration times of four kinds of fluoresceins and six polyaromatic hydrocarbons (PAHs) are normalized to one of the capillaries using two internal standards. The relative standard deviations (RSD) after normalization are 0.6-1.4% for the fluoresceins and 0.1-1.5% for the PAHs. Quantitative calibration of the separations based on peak areas is also performed, again with substantial improvement over the raw data. This opens up the possibility of performing massively parallel separations for high-throughput chemical analysis for process monitoring, combinatorial synthesis, and clinical diagnosis. The authors further improved the fluorescence detection by step laser scanning. A computer-controlled galvanometer scanner is adapted for scanning a focused laser beam across a 96-capillary array for laser-induced fluorescence detection. The signal at a single photomultiplier tube is temporally sorted to distinguish among the capillaries. The limit of detection for fluorescein is 3 x 10-11 M (S/N = 3) for 5-mW of total laser power scanned at 4 Hz. The observed cross-talk among capillaries is 0.2%. Advantages include the efficient utilization of light due to the high duty-cycle of step scan, good detection performance due to the reduction of stray light, ruggedness due to the small mass of the galvanometer mirror, low cost due to the simplicity of components, and flexibility due to the independent paths for excitation and emission.

  13. The Utilization of Formalin Fixed-Paraffin-Embedded Specimens in High Throughput Genomic Studies

    Directory of Open Access Journals (Sweden)

    Pan Zhang

    2017-01-01

    Full Text Available High throughput genomic assays empower us to study the entire human genome in short time with reasonable cost. Formalin fixed-paraffin-embedded (FFPE tissue processing remains the most economical approach for longitudinal tissue specimen storage. Therefore, the ability to apply high throughput genomic applications to FFPE specimens can expand clinical assays and discovery. Many studies have measured the accuracy and repeatability of data generated from FFPE specimens using high throughput genomic assays. Together, these studies demonstrate feasibility and provide crucial guidance for future studies using FFPE specimens. Here, we summarize the findings of these studies and discuss the limitations of high throughput data generated from FFPE specimens across several platforms that include microarray, high throughput sequencing, and NanoString.

  14. High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae.

    Directory of Open Access Journals (Sweden)

    Yun-Jie Zhang

    Full Text Available BACKGROUND: Bambusoideae is the only subfamily that contains woody members in the grass family, Poaceae. In phylogenetic analyses, Bambusoideae, Pooideae and Ehrhartoideae formed the BEP clade, yet the internal relationships of this clade are controversial. The distinctive life history (infrequent flowering and predominance of asexual reproduction of woody bamboos makes them an interesting but taxonomically difficult group. Phylogenetic analyses based on large DNA fragments could only provide a moderate resolution of woody bamboo relationships, although a robust phylogenetic tree is needed to elucidate their evolutionary history. Phylogenomics is an alternative choice for resolving difficult phylogenies. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the complete nucleotide sequences of six woody bamboo chloroplast (cp genomes using Illumina sequencing. These genomes are similar to those of other grasses and rather conservative in evolution. We constructed a phylogeny of Poaceae from 24 complete cp genomes including 21 grass species. Within the BEP clade, we found strong support for a sister relationship between Bambusoideae and Pooideae. In a substantial improvement over prior studies, all six nodes within Bambusoideae were supported with ≥0.95 posterior probability from Bayesian inference and 5/6 nodes resolved with 100% bootstrap support in maximum parsimony and maximum likelihood analyses. We found that repeats in the cp genome could provide phylogenetic information, while caution is needed when using indels in phylogenetic analyses based on few selected genes. We also identified relatively rapidly evolving cp genome regions that have the potential to be used for further phylogenetic study in Bambusoideae. CONCLUSIONS/SIGNIFICANCE: The cp genome of Bambusoideae evolved slowly, and phylogenomics based on whole cp genome could be used to resolve major relationships within the subfamily. The difficulty in resolving the diversification among

  15. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Full Text Available Abstract Background Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small scale (SNPs or large scale (CGH array, FISH methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. Halobacterium species NRC-1 DNA and Campylobacter jejuni were amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias, and produced significantly higher yields of amplified DNA.

  16. Identification and characterization of Wilt and salt stress-responsive microRNAs in chickpea through high-throughput sequencing.

    Science.gov (United States)

    Kohli, Deshika; Joshi, Gopal; Deokar, Amit Atmaram; Bhardwaj, Ankur R; Agarwal, Manu; Katiyar-Agarwal, Surekha; Srinivasan, Ramamurthy; Jain, Pradeep Kumar

    2014-01-01

    Chickpea (Cicer arietinum) is the second most widely grown legume worldwide and is the most important pulse crop in the Indian subcontinent. Chickpea productivity is adversely affected by a large number of biotic and abiotic stresses. MicroRNAs (miRNAs) have been implicated in the regulation of plant responses to several biotic and abiotic stresses. This study is the first attempt to identify chickpea miRNAs that are associated with biotic and abiotic stresses. The wilt infection that is caused by the fungus Fusarium oxysporum f.sp. ciceris is one of the major diseases severely affecting chickpea yields. Of late, increasing soil salinization has become a major problem in realizing these potential yields. Three chickpea libraries using fungal-infected, salt-treated and untreated seedlings were constructed and sequenced using next-generation sequencing technology. A total of 12,135,571 unique reads were obtained. In addition to 122 conserved miRNAs belonging to 25 different families, 59 novel miRNAs along with their star sequences were identified. Four legume-specific miRNAs, including miR5213, miR5232, miR2111 and miR2118, were found in all of the libraries. Poly(A)-based qRT-PCR (Quantitative real-time PCR) was used to validate eleven conserved and five novel miRNAs. miR530 was highly up regulated in response to fungal infection, which targets genes encoding zinc knuckle- and microtubule-associated proteins. Many miRNAs responded in a similar fashion under both biotic and abiotic stresses, indicating the existence of cross talk between the pathways that are involved in regulating these stresses. The potential target genes for the conserved and novel miRNAs were predicted based on sequence homologies. miR166 targets a HD-ZIPIII transcription factor and was validated by 5' RLM-RACE. This study has identified several conserved and novel miRNAs in the chickpea that are associated with gene regulation following exposure to wilt and salt stress.

  17. Identification and characterization of Wilt and salt stress-responsive microRNAs in chickpea through high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Deshika Kohli

    Full Text Available Chickpea (Cicer arietinum is the second most widely grown legume worldwide and is the most important pulse crop in the Indian subcontinent. Chickpea productivity is adversely affected by a large number of biotic and abiotic stresses. MicroRNAs (miRNAs have been implicated in the regulation of plant responses to several biotic and abiotic stresses. This study is the first attempt to identify chickpea miRNAs that are associated with biotic and abiotic stresses. The wilt infection that is caused by the fungus Fusarium oxysporum f.sp. ciceris is one of the major diseases severely affecting chickpea yields. Of late, increasing soil salinization has become a major problem in realizing these potential yields. Three chickpea libraries using fungal-infected, salt-treated and untreated seedlings were constructed and sequenced using next-generation sequencing technology. A total of 12,135,571 unique reads were obtained. In addition to 122 conserved miRNAs belonging to 25 different families, 59 novel miRNAs along with their star sequences were identified. Four legume-specific miRNAs, including miR5213, miR5232, miR2111 and miR2118, were found in all of the libraries. Poly(A-based qRT-PCR (Quantitative real-time PCR was used to validate eleven conserved and five novel miRNAs. miR530 was highly up regulated in response to fungal infection, which targets genes encoding zinc knuckle- and microtubule-associated proteins. Many miRNAs responded in a similar fashion under both biotic and abiotic stresses, indicating the existence of cross talk between the pathways that are involved in regulating these stresses. The potential target genes for the conserved and novel miRNAs were predicted based on sequence homologies. miR166 targets a HD-ZIPIII transcription factor and was validated by 5' RLM-RACE. This study has identified several conserved and novel miRNAs in the chickpea that are associated with gene regulation following exposure to wilt and salt stress.

  18. High-throughput sequencing identification of genes involved with Varroa destructor resistance in the eastern honeybee, Apis cerana.

    Science.gov (United States)

    Ji, T; Yin, L; Liu, Z; Shen, F; Shen, J

    2014-10-31

    Varroa destructor is the greatest threat to the honeybee Apis mellifera worldwide, while it rarely causes serious harm to its native host, the Eastern honeybee Apis cerana. The genetic mechanisms underlying the resistance of A. cerana to Varroa remain unclear. Thus, understanding the molecular mechanism of resistance to Varroa may provide useful insights for reducing this disease in other organisms. In this study, the transcriptomes of two A. cerana colonies were sequenced using the Illumina Solexa sequencing method. One colony was highly affected by mites, whereas the other colony displayed strong resistance to V. destructor. We determined differences in gene expression in the two colonies after challenging the colonies with V. destructor. After de novo transcriptome assembly, we obtained 91,172 unigenes for A. cerana and found that 288 differentially expressed genes varied by more than 15-fold. A total of 277 unigenes were present at higher levels in the non-affected colony. Genes involved in resistance to Varroa included unigenes related to skeletal muscle movement, olfactory sensitivity, and transcription factors. This suggests that hygienic behavior and grooming behavior may play important roles in the resistance to Varroa.

  19. High-throughput sequencing of fecal DNA to identify insects consumed by wild Weddell's saddleback tamarins (Saguinus weddelli, Cebidae, Primates) in Bolivia.

    Science.gov (United States)

    Mallott, E K; Malhi, R S; Garber, P A

    2015-03-01

    The genus Saguinus represents a successful radiation of over 20 species of small-bodied New World monkeys. Studies of the tamarin diet indicate that insects and small vertebrates account for ∼16-45% of total feeding and foraging time, and represent an important source of lipids, protein, and metabolizable energy. Although tamarins are reported to commonly consume large-bodied insects such as grasshoppers and walking sticks (Orthoptera), little is known concerning the degree to which smaller or less easily identifiable arthropod prey comprises an important component of their diet. To better understand tamarin arthropod feeding behavior, fecal samples from 20 wild Bolivian saddleback tamarins (members of five groups) were collected over a 3 week period in June 2012, and analyzed for the presence of arthropod DNA. DNA was extracted using a Qiagen stool extraction kit, and universal insect primers were created and used to amplify a ∼280 bp section of the COI mitochondrial gene. Amplicons were sequenced on the Roche 454 sequencing platform using high-throughput sequencing techniques. An analysis of these samples indicated the presence of 43 taxa of arthropods including 10 orders, 15 families, and 12 identified genera. Many of these taxa had not been previously identified in the tamarin diet. These results highlight molecular analysis of fecal DNA as an important research tool for identifying anthropod feeding patterns in primates, and reveal broad diversity in the taxa, foraging microhabitats, and size of arthropods consumed by tamarin monkeys. © 2014 Wiley Periodicals, Inc.

  20. CRISPR-Cas9-Edited Site Sequencing (CRES-Seq): An Efficient and High-Throughput Method for the Selection of CRISPR-Cas9-Edited Clones.

    Science.gov (United States)

    Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel

    2018-01-16

    The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plants cells). Here we described a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  1. A High-Throughput and Low-Complexity H.264/AVC Intra 16×16 Prediction Architecture for HD Video Sequences

    Directory of Open Access Journals (Sweden)

    M. Orlandić

    2014-11-01

    Full Text Available H.264/AVC compression standard provides tools and solutions for an efficient coding of video sequences of various resolutions. Spatial redundancy in a video frame is removed by use of intra prediction algorithm. There are three block-wise types of intra prediction: 4×4, 8×8 and 16×16. This paper proposes an efficient, low-complexity architecture for intra 16×16 prediction that provides real-time processing of HD video sequences. All four prediction (V, H, DC, Plane modes are supported in the implementation. The high-complexity plane mode computes a number of intermediate parameters required for creating prediction pixels. The local memory buffers are used for storing intermediate reconstructed data used as reference pixels in intra prediction process. The high throughput is achieved by 16-pixel parallelism and the proposed prediction process takes 48 cycles for processing one macroblock. The proposed architecture is synthesized and implemented on Kintex 705 -XC7K325T board and requires 94 MHz to encode a video sequence of HD 4k×2k (3840×2160 resolution at 60 fps in real time. This represents a significant improvement compared to the state of the art.

  2. Mining genes involved in the stratification of Paris Polyphylla seeds using high-throughput embryo Transcriptome sequencing

    Science.gov (United States)

    2013-01-01

    Background Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant. Results In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process. Conclusions Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signal, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these related genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphlla var. yunnanensis. PMID:23718911

  3. Development of novel microsatellite markers for the BBCC Oryza genome (Poaceae) using high-throughput sequencing technology.

    Science.gov (United States)

    Wang, Caihong; Liu, Xiaojiao; Peng, Suotang; Xu, Qun; Yuan, Xiaoping; Feng, Yue; Yu, Hanyong; Wang, Yiping; Wei, Xinghua

    2014-01-01

    Wild species of Oryza are extremely valuable sources of genetic material that can be used to broaden the genetic background of cultivated rice, and to increase its resistance to abiotic and biotic stresses. Until recently, there was no sequence information for the BBCC Oryza genome; therefore, no special markers had been developed for this genome type. The lack of suitable markers made it difficult to search for valuable genes in the BBCC genome. The aim of this study was to develop microsatellite markers for the BBCC genome. We obtained 13,991 SSR-containing sequences and designed 14,508 primer pairs. The most abundant was hexanuclelotide (31.39%), followed by trinucleotide (27.67%) and dinucleotide (19.04%). 600 markers were selected for validation in 23 accessions of Oryza species with the BBCC genome. A set of 495 markers produced clear amplified fragments of the expected sizes. The average number of alleles per locus (Na) was 2.5, ranging from 1 to 9. The genetic diversity per locus (He) ranged from 0 to 0.844 with a mean of 0.333. The mean polymorphism information content (PIC) was 0.290, and ranged from 0 to 0.825. Of the 495 markers, 12 were only found in the BB genome, 173 were unique to the CC genome, and 198 were also present in the AA genome. These microsatellite markers could be used to evaluate the phylogenetic relationships among different Oryza genomes, and to construct a genetic linkage map for locating and identifying valuable genes in the BBCC genome, and would also for marker-assisted breeding programs that included accessions with the AA genome, especially Oryza sativa.

  4. Assessing the impact of water treatment on bacterial biofilms in drinking water distribution systems using high-throughput DNA sequencing.

    Science.gov (United States)

    Shaw, Jennifer L A; Monis, Paul; Fabris, Rolando; Ho, Lionel; Braun, Kalan; Drikas, Mary; Cooper, Alan

    2014-12-01

    Biofilm control in drinking water distribution systems (DWDSs) is crucial, as biofilms are known to reduce flow efficiency, impair taste and quality of drinking water and have been implicated in the transmission of harmful pathogens. Microorganisms within biofilm communities are more resistant to disinfection compared to planktonic microorganisms, making them difficult to manage in DWDSs. This study evaluates the impact of four unique drinking water treatments on biofilm community structure using metagenomic DNA sequencing. Four experimental DWDSs were subjected to the following treatments: (1) conventional coagulation, (2) magnetic ion exchange contact (MIEX) plus conventional coagulation, (3) MIEX plus conventional coagulation plus granular activated carbon, and (4) membrane filtration (MF). Bacterial biofilms located inside the pipes of each system were sampled under sterile conditions both (a) immediately after treatment application ('inlet') and (b) at a 1 km distance from the treatment application ('outlet'). Bacterial 16S rRNA gene sequencing revealed that the outlet biofilms were more diverse than those sampled at the inlet for all treatments. The lowest number of unique operational taxonomic units (OTUs) and lowest diversity was observed in the MF inlet. However, the MF system revealed the greatest increase in diversity and OTU count from inlet to outlet. Further, the biofilm communities at the outlet of each system were more similar to one another than to their respective inlet, suggesting that biofilm communities converge towards a common established equilibrium as distance from treatment application increases. Based on the results, MF treatment is most effective at inhibiting biofilm growth, but a highly efficient post-treatment disinfection regime is also critical in order to prevent the high rates of post-treatment regrowth. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum Using High Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Muhammad Farooq

    2017-06-01

    Full Text Available MicroRNAs (miRNAs are small 20–24nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms, hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods but due to complex regulatory mechanisms or false positive in silico predictions, not all of them express in reality and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets has been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format.

  6. Identification of miRNAs Responsive to Botrytis cinerea in Herbaceous Peony (Paeonia lactiflora Pall. by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Daqiu Zhao

    2015-09-01

    Full Text Available Herbaceous peony (Paeonia lactiflora Pall., one of the world’s most important ornamental plants, is highly susceptible to Botrytis cinerea, and improving resistance to this pathogenic fungus is a problem yet to be solved. MicroRNAs (miRNAs play an essential role in resistance to B. cinerea, but until now, no studies have been reported concerning miRNAs induction in P. lactiflora. Here, we constructed and sequenced two small RNA (sRNA libraries from two B. cinerea-infected P. lactiflora cultivars (“Zifengyu” and “Dafugui” with significantly different levels of resistance to B. cinerea, using the Illumina HiSeq 2000 platform. From the raw reads generated, 4,592,881 and 5,809,796 sRNAs were obtained, and 280 and 306 miRNAs were identified from “Zifengyu” and “Dafugui”, respectively. A total of 237 conserved and 7 novel sequences of miRNAs were differentially expressed between the two cultivars, and we predicted and annotated their potential target genes. Subsequently, 7 differentially expressed candidate miRNAs were screened according to their target genes annotated in KEGG pathways, and the expression patterns of miRNAs and corresponding target genes were elucidated. We found that miR5254, miR165a-3p, miR3897-3p and miR6450a might be involved in the P. lactiflora response to B. cinerea infection. These results provide insight into the molecular mechanisms responsible for resistance to B. cinerea in P. lactiflora.

  7. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae, by High-Throughput Illumina Sequencing.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available The Chinese citrus fly, Bactrocera minax (Enderlein, is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr, Swiss-Prot, Clusters of Orthologous Groups (COG, Gene Ontology (GO, and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps genes, 26 glutathione S-transferases (GSTs genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR and ultraspiracle (USP, were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  8. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Science.gov (United States)

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  9. Exploring the transcriptome of non-model oleaginous microalga Dunaliella tertiolecta through high-throughput sequencing and high performance computing.

    Science.gov (United States)

    Yao, Lina; Tan, Kenneth Wei Min; Tan, Tin Wee; Lee, Yuan Kun

    2017-02-22

    RNA-Seq technology has received a lot of attention in recent years for microalgal global transcriptomic profiling. It is widely used in transcriptome-wide analysis of gene expression., particularly for microalgal strains with potential as biofuel sources. However, insufficient genomic or transcriptomic information of non-model microalgae has limited the understanding of their regulatory mechanisms and hampered genetic manipulation to enhance biofuel production. As such, an optimal microalgal transcriptomic database construction is a subject of urgent investigation. Dunaliella tertiolecta, a non-model oleaginous microalgal species, was sequenced via Illumina MISEQ and HISEQ 4000 in RNA-Seq studies. The high quality high-throughout sequencing data were explored using high performance computing (HPC) in a petascale data center and subjected to de novo assembly and parallelized mpiBLASTX search with multiple species. As a result, a transcriptome database of 17,845 was constructed (~95% completeness). This enlarged database constructed fueled the RNA-Seq data analysis, which was validated by a nitrogen deprivation (ND) study that induces triacylglycerol (TAG) production. The new paralleled assembly and annotation method under HPC presented here allows the solution of large-scale data processing problems in acceptable computation time. There is significant increase in the number of transcriptomic data achieved and observable heterogeneity in the performance to identify differentially expressed genes in the ND treatment paradigm. The results provide new insights as to how response to ND treatment in microalgae is regulated. ND analyses highlight the advantages of this database generated in this study that could also serve as a useful resource for future gene manipulation and transcriptome-wide analysis. We thus demonstrate the usefulness of exploring the transcriptome as an informative platform for functional studies and genetic manipulations in similar species.

  10. Next Gen Pop Gen: implementing a high-throughput approach to population genetics in boarfish (Capros aper).

    Science.gov (United States)

    Farrell, Edward D; Carlsson, Jeanette E L; Carlsson, Jens

    2016-12-01

    The recently developed approach for microsatellite genotyping by sequencing (GBS) using individual combinatorial barcoding was further improved and used to assess the genetic population structure of boarfish (Capros aper) across the species' range. Microsatellite loci were developed de novo and genotyped by next-generation sequencing. Genetic analyses of the samples indicated that boarfish can be subdivided into at least seven biological units (populations) across the species' range. Furthermore, the recent apparent increase in abundance in the northeast Atlantic is better explained by demographic changes within this area than by influx from southern or insular populations. This study clearly shows that the microsatellite GBS approach is a generic, cost-effective, rapid and powerful method suitable for full-scale population genetic studies-a crucial element for assessment, sustainable management and conservation of valuable biological resources.

  11. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    Energy Technology Data Exchange (ETDEWEB)

    Harding, Louisa B. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Schultz, Irvin R. [Battelle, Marine Sciences Laboratory – Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382 (United States); Goetz, Giles W. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Luckenbach, J. Adam [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Young, Graham [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Goetz, Frederick W. [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Manchester Research Station, P.O. Box 130, Manchester, WA 98353 (United States); Swanson, Penny, E-mail: penny.swanson@noaa.gov [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States)

    2013-10-15

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland are still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina{sup ®} sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs) respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  12. A new sieving matrix for DNA sequencing, genotyping and mutation detection and high-throughput genotyping with a 96-capillary array system

    Energy Technology Data Exchange (ETDEWEB)

    Gao, David [Iowa State Univ., Ames, IA (United States)

    1999-11-08

    Capillary electrophoresis has been widely accepted as a fast separation technique in DNA analysis. In this dissertation, a new sieving matrix is described for DNA analysis, especially DNA sequencing, genetic typing and mutation detection. A high-throughput 96 capillary array electrophoresis system was also demonstrated for simultaneous multiple genotyping. The authors first evaluated the influence of different capillary coatings on the performance of DNA sequencing. A bare capillary was compared with a DB-wax, an FC-coated and a polyvinylpyrrolidone dynamically coated capillary with PEO as sieving matrix. It was found that covalently-coated capillaries had no better performance than bare capillaries while PVP coating provided excellent and reproducible results. The authors also developed a new sieving Matrix for DNA separation based on commercially available poly(vinylpyrrolidone) (PVP). This sieving matrix has a very low viscosity and an excellent self-coating effect. Successful separations were achieved in uncoated capillaries. Sequencing of M13mp18 showed good resolution up to 500 bases in treated PVP solution. Temperature gradient capillary electrophoresis and PVP solution was applied to mutation detection. A heteroduplex sample and a homoduplex reference were injected during a pair of continuous runs. A temperature gradient of 10 C with a ramp of 0.7 C/min was swept throughout the capillary. Detection was accomplished by laser induced fluorescence detection. Mutation detection was performed by comparing the pattern changes between the homoduplex and the heteroduplex samples. High throughput, high detection rate and easy operation were achieved in this system. They further demonstrated fast and reliable genotyping based on CTTv STR system by multiple-capillary array electrophoresis. The PCR products from individuals were mixed with pooled allelic ladder as an absolute standard and coinjected with a 96-vial tray. Simultaneous one-color laser-induced fluorescence

  13. Identification and characterization of microRNAs related to salt stress in broccoli, using high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Tian, Yunhong; Tian, Yunming; Luo, Xiaojun; Zhou, Tao; Huang, Zuoping; Liu, Ying; Qiu, Yihan; Hou, Bing; Sun, Dan; Deng, Hongyu; Qian, Shen; Yao, Kaitai

    2014-09-03

    MicroRNAs (miRNAs) are a new class of endogenous regulators of a broad range of physiological processes, which act by regulating gene expression post-transcriptionally. The brassica vegetable, broccoli (Brassica oleracea var. italica), is very popular with a wide range of consumers, but environmental stresses such as salinity are a problem worldwide in restricting its growth and yield. Little is known about the role of miRNAs in the response of broccoli to salt stress. In this study, broccoli subjected to salt stress and broccoli grown under control conditions were analyzed by high-throughput sequencing. Differential miRNA expression was confirmed by real-time reverse transcription polymerase chain reaction (RT-PCR). The prediction of miRNA targets was undertaken using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) database and Gene Ontology (GO)-enrichment analyses. Two libraries of small (or short) RNAs (sRNAs) were constructed and sequenced by high-throughput Solexa sequencing. A total of 24,511,963 and 21,034,728 clean reads, representing 9,861,236 (40.23%) and 8,574,665 (40.76%) unique reads, were obtained for control and salt-stressed broccoli, respectively. Furthermore, 42 putative known and 39 putative candidate miRNAs that were differentially expressed between control and salt-stressed broccoli were revealed by their read counts and confirmed by the use of stem-loop real-time RT-PCR. Amongst these, the putative conserved miRNAs, miR393 and miR855, and two putative candidate miRNAs, miR3 and miR34, were the most strongly down-regulated when broccoli was salt-stressed, whereas the putative conserved miRNA, miR396a, and the putative candidate miRNA, miR37, were the most up-regulated. Finally, analysis of the predicted gene targets of miRNAs using the GO and KO databases indicated that a range of metabolic and other cellular functions known to be associated with salt stress were up-regulated in broccoli treated with salt. A comprehensive

  14. A simple, high throughput method to locate single copy sequences from Bacterial Artificial Chromosome (BAC libraries using High Resolution Melt analysis

    Directory of Open Access Journals (Sweden)

    Caligari Peter DS

    2010-05-01

    Full Text Available Abstract Background The high-throughput anchoring of genetic markers into contigs is required for many ongoing physical mapping projects. Multidimentional BAC pooling strategies for PCR-based screening of large insert libraries is a widely used alternative to high density filter hybridisation of bacterial colonies. To date, concerns over reliability have led most if not all groups engaged in high throughput physical mapping projects to favour BAC DNA isolation prior to amplification by conventional PCR. Results Here, we report the first combined use of Multiplex Tandem PCR (MT-PCR and High Resolution Melt (HRM analysis on bacterial stocks of BAC library superpools as a means of rapidly anchoring markers to BAC colonies and thereby to integrate genetic and physical maps. We exemplify the approach using a BAC library of the model plant Arabidopsis thaliana. Super pools of twenty five 384-well plates and two-dimension matrix pools of the BAC library were prepared for marker screening. The entire procedure only requires around 3 h to anchor one marker. Conclusions A pre-amplification step during MT-PCR allows high multiplexing and increases the sensitivity and reliability of subsequent HRM discrimination. This simple gel-free protocol is more reliable, faster and far less costly than conventional PCR screening. The option to screen in parallel 3 genetic markers in one MT-PCR-HRM reaction using templates from directly pooled bacterial stocks of BAC-containing bacteria further reduces time for anchoring markers in physical maps of species with large genomes.

  15. High-Throughput Sequencing of MicroRNAs in Adenovirus Type 3 Infected Human Laryngeal Epithelial Cells

    Directory of Open Access Journals (Sweden)

    Yuhua Qi

    2010-01-01

    Full Text Available Adenovirus infection can cause various illnesses depending on the infecting serotype, such as gastroenteritis, conjunctivitis, cystitis, and rash illness, but the infection mechanism is still unknown. MicroRNAs (miRNA have been reported to play essential roles in cell proliferation, cell differentiation, and pathogenesis of human diseases including viral infections. We analyzed the miRNA expression profiles from adenovirus type 3 (AD3 infected Human laryngeal epithelial (Hep2 cells using a SOLiD deep sequencing. 492 precursor miRNAs were identified in the AD3 infected Hep2 cells, and 540 precursor miRNAs were identified in the control. A total of 44 miRNAs demonstrated high expression and 36 miRNAs showed lower expression in the AD3 infected cells than control. The biogenesis of miRNAs has been analyzed, and some of the SOLiD results were confirmed by Quantitative PCR analysis. The present studies may provide a useful clue for the biological function research into AD3 infection.

  16. A new perspective on studying burial environment before archaeological excavation: analyzing bacterial community distribution by high-throughput sequencing.

    Science.gov (United States)

    Xu, Jinjin; Wei, Yanfei; Jia, Hanqing; Xiao, Lin; Gong, Decai

    2017-02-07

    Burial conditions play a crucial role in archaeological heritage preservation. Especially, the microorganisms were considered as the leading causes which incurred degradation and vanishment of historic materials. In this article, we analyzed bacterial diversity and community structure from M1 of Wangshanqiao using 16 S rRNA gene amplicon sequencing. The results indicated that microbial communities in burial conditions were diverse among four different samples. The samples from the robber hole varied most obviously in community structure both in Alpha and Beta diversity. In addition, the dominant phylum in different samples were Proteobacteria, Actinobacteria and Bacteroidetes, respectively. Moreover, the study implied that historical materials preservation conditions had connections with bacterial community distribution. At the genus level, Acinetobacter might possess high ability in degrading organic culture heritage in burial conditions, while Bacteroides were associated closely with favorable preservation conditions. This method contributes to fetch information which would never recover after excavation, and it will help to explore microbial degradation on precious organic culture heritage and further our understanding of archaeological burial environment. The study also indicates that robbery has a serious negative impact on burial remains.

  17. Unearthing microbial diversity of Taxus rhizosphere via MiSeq high-throughput amplicon sequencing and isolate characterization

    Science.gov (United States)

    Hao, Da Cheng; Song, Si Meng; Mu, Jun; Hu, Wen Li; Xiao, Pei Gen

    2016-04-01

    The species variability and potential environmental functions of Taxus rhizosphere microbial community were studied by comparative analyses of 15 16S rRNA and 15 ITS MiSeq sequencing libraries from Taxus rhizospheres in subtropical and temperate regions of China, as well as by isolating laccase-producing strains and polycyclic aromatic hydrocarbon (PAH)-degrading strains. Total reads could be assigned to 2,141 Operational Taxonomic Units (OTUs) belonging to 31 bacteria phyla and 2,904 OTUs of at least seven fungi phyla. The abundance of Planctomycetes, Actinobacteria, and Chloroflexi was higher in T. cuspidata var. nana and T. × media rhizospheres than in T. mairei rhizosphere (NF), while Acidobacteria, Proteobacteria, Nitrospirae, and unclassified bacteria were more abundant in the latter. Ascomycota and Zygomycota were predominant in NF, while two temperate Taxus rhizospheres had more unclassified fungi, Basidiomycota, and Chytridiomycota. The bacterial/fungal community richness and diversity were lower in NF than in other two. Three dye decolorizing fungal isolates were shown to be highly efficient in removing three classes of reactive dye, while two PAH-degrading fungi were able to degrade recalcitrant benzo[a]pyrene. The present studies extend the knowledge pedigree of the microbial diversity populating rhizospheres, and exemplify the method shift in research and development of resource plant rhizosphere.

  18. Computer Animation Technology in Behavioral Sciences: A Sequential, Automatic, and High-Throughput Approach to Quantifying Personality in Zebrafish (Danio rerio).

    Science.gov (United States)

    Fangmeier, Melissa L; Noble, Daniel W A; O'Dea, Rose E; Usui, Takuji; Lagisz, Malgorzata; Hesselson, Daniel; Nakagawa, Shinichi

    2018-01-30

    An emergent field of animal personality necessitates a method for repeated high-throughput quantification of behavioral traits across contexts. In this study, we have developed an automated video stimulus approach to sequentially present different contexts relevant to five "personality" traits (exploration, boldness, neophobia, aggression, and sociability), successfully quantifying repeatable trait measurements in multiple individuals simultaneously. Although our method is designed to quantify personality traits in zebrafish, our approach can accommodate the quantification of other behaviors, and could be customized for other species. All digital materials and detailed protocols are publicly available online for researchers to freely use and modify.

  19. Characterization of P5CS gene in Calotropis procera plant from the de novo assembled transcriptome contigs of the high-throughput sequencing dataset.

    Science.gov (United States)

    Ramadan, Ahmed M; Hassanein, Sameh E

    2014-12-01

    The wild plant known as Calotropis procera is important in medicine, industry and ornamental fields. Due to spread in areas that suffer from environmental stress, it has a large number of tolerance genes to environmental stress such as drought and salinity. Proline is one of the most compatible solutes that accumulate widely in plants to tolerate unfavorable environmental conditions. Plant proline synthesis depends on Δ-pyrroline-5-carboxylate synthase (P5CS) gene. But information about this gene in C. procera is unavailable. In this study, we uncovered and characterized P5CS (P5CS, NCBI accession no. KJ020750) gene in this medicinal plant from the de novo assembled transcriptome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for P5CS sequences were blasted with the recovered de novo assembled contigs. Homology modeling of the deduced amino acids (NCBI accession No. AHM25913) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera P5CS-like full sequence model on Homo sapiens (P5CS_HUMAN, UniProt protein accession no. P54886) was constructed using RasMol and Deep-View programs. The functional domains of the novel P5CS amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). Copyright © 2014 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  20. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset.

    Science.gov (United States)

    Shokry, Ahmed M; Al-Karim, Saleh; Ramadan, Ahmed; Gadallah, Nour; Al Attas, Sanaa G; Sabir, Jamal S M; Hassan, Sabah M; Madkour, Magdy A; Bressan, Ray; Mahfouz, Magdy; Bahieldin, Ahmed

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). Copyright © 2014 Académie des sciences. All rights reserved.

  1. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset

    KAUST Repository

    Shokry, Ahmed M.

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput